Big data as a tool for detecting (and punishing?) bullies

We already know how powerful techniques such as machine learning and sentiment analysis can be when it comes to deciphering consumer behavior online, and now it seems they can identify bullies, as well. A group of University of Wisconsin researchers has developed a machine learning algorithm that's identifying more than 15,000 tweets per day relating to bullying — complete with loads of associated sociological insights — which raises the question of how to act on that data. How do you govern a social web that can be simultaneously a communication platform, a research lab full of unknowing subjects and a boiling-over pot of criminal evidence?

How the model works and what it found

In order to train their model, the researchers fed it two sets of tweets — one they had determined to be about bullying activity and another that was not. Once the model had learned the language identifiers of tweets relating to bullying, it was time to turn it loose on real-world tweets. Not only did the system start identifying a great number of tweets, but it also discovered time patterns (they occur most frequently during the school week) and was able to pick out who played what role in the bullying.
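The press release doesn't describe the researchers' exact classifier, but the general approach — learn word probabilities from labeled examples, then score new tweets — can be sketched with a simple naive Bayes text classifier. Everything here, including the toy example tweets, is invented for illustration:

```python
import math
from collections import Counter

def tokenize(text):
    """Crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def train(labeled_tweets):
    """Count words per class from (text, label) pairs."""
    word_counts = {"bullying": Counter(), "other": Counter()}
    class_counts = Counter()
    for text, label in labeled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Pick the class with the highest log-probability,
    using add-one (Laplace) smoothing for unseen words."""
    vocab = set(word_counts["bullying"]) | set(word_counts["other"])
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data standing in for the two hand-labeled tweet sets
data = [
    ("everyone keeps picking on me at school", "bullying"),
    ("they pushed him and called him names again", "bullying"),
    ("great game last night with friends", "other"),
    ("loving this sunny weather today", "other"),
]
wc, cc = train(data)
print(classify("kids at school called me names", wc, cc))  # bullying
```

A real system would need far richer features (n-grams, user metadata, timing) and vastly more labeled data, but the training/scoring loop is the same shape.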

In terms of demonstrating the power of big data, the latter might be the most interesting part because it actually uncovered an entirely new insight into the sociology of bullying. Not only are there the long-known roles of bully, victim, accuser and defender, but the researchers found that on social media, at least, there’s also the reporter. That’s someone who witnessed or heard about a bullying incident, but wasn’t involved, yet still commented on it.

Going forward, the team wants to add a sentiment analysis capability to its model so it can determine how individuals’ feelings are actually affected by bullying. It also wants to track bullies and victims over time, something not possible in traditional social science surveys that typically involve one-off interviews with children who know they’re being listened to. Just like with companies trying to uncover the causes of certain consumer trends, the researchers can follow groups of children over time via Twitter, trying to detect how and why they do what they do, and perhaps how relationships evolve.
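The team hasn't said how its sentiment component will work; the simplest possible version is a lexicon lookup that sums hand-assigned word weights, where a strongly negative score flags a tweet for attention. The word list and weights below are entirely hypothetical:

```python
# Hypothetical sentiment lexicon: word -> weight.
# Negative weights signal distress; positive weights signal well-being.
LEXICON = {
    "angry": -2, "sad": -1, "hate": -2, "alone": -1, "hurt": -2,
    "happy": 2, "fine": 1, "okay": 1, "better": 1,
}

def sentiment_score(text):
    """Sum lexicon weights over the tweet's words;
    unknown words contribute zero."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

print(sentiment_score("i hate feeling so alone and hurt"))  # -5
```

Production sentiment models are learned rather than hand-built, but even this sketch shows why the approach worries people: a few weighted keywords stand between a child's tweet and an "immediate attention" flag.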

Is Twitter a research lab, or an always-on wiretap?

The bigger policy and, really, ethical question that comes out of this type of research is how we act upon it. The easy (or at least non-controversial) course of action — and something the researchers already suggest — is giving policymakers better data on which to base legislation or other efforts to prevent and punish bullying. Hopefully, they'll listen. As I explained last week, the results of studies such as this one can provide valuable insights on which to base public policy, not just back up someone's predetermined stance on an issue.

A more-controversial result of this research — especially if it goes on for an extended period of time — is how and when to actually intervene. As researcher Jerry Zhu says about the addition of sentiment analysis in the press release highlighting the study, “The idea is that if someone is powerfully affected by the event, if they are feeling extreme anger or sadness, that’s when they could be a danger to themselves or others. Those are the ones that would need immediate attention.”

Morally, this seems like the right thing to do, but it opens up a whole can of worms legally. How accurately can a machine learning algorithm actually determine feelings that might lead to physical harm? What level of intervention is actually allowable or advisable, and who should do it — the researchers, parents, the police? Where’s the line between what’s worth intervening on and what’s not, especially when we’re talking about mandatory reporters and potential harm to children? Will Twitter be forced to turn over user data without a search warrant?

Given the power of such algorithms and the relatively low cost of computing, it wouldn't be a stretch to see schools or law-enforcement agencies start monitoring social media sites themselves to detect incidents of bullying, or maybe even child abuse. And although most tweets are publicly visible, a great deal of Facebook (s fb) data happens behind that platform's guarded walls. Could lawmakers demand companies like Facebook actively monitor their sites for these types of messages, or at least give agencies access to the social activity of minors?

Big data techniques are already used to fight crime, but this is a lot different than predicting where criminals will strike or which convicts are likely to reoffend. I'd certainly want to know if my daughter were being bullied, but I'm not sure I'd want to find out from some strange researchers, or Twitter or the police. If my daughter were accused of being a bully, I'd fight tooth and nail to discredit the accuracy of the "evidence" and the legality of the monitoring. And I'm not sure anyone could live with themselves if they knew something was happening but didn't act until it was too late.

I don’t know the answers to any of these questions, and I don’t think anyone really does right now. But with more data than ever available about what’s going on in people’s lives, combined with cutting-edge analytic techniques and technologies, and our society’s seeming determination to raise our children in a sterile world with bumpered walls, we might have to answer these questions sooner rather than later.

Feature image courtesy of Shutterstock user Stocksnapper; bully image courtesy of Shutterstock user MANDY GODBEHEAR.