How Emergent founder Craig Silverman is using data to hunt down online hoaxes

If anyone can claim to be an expert in online rumors, falsehoods and fakes, it would have to be Craig Silverman, who has written both a book and a blog called Regret The Error and is now a fellow at Columbia University’s Tow Center for Digital Journalism. But Silverman doesn’t just want to write about online fakery; he wants to help stamp it out, and to that end he has launched a data-driven tracker called Emergent, which follows and debunks online hoaxes of various kinds.

Silverman is a journalist and former managing editor of PBS MediaShift, as well as a founder of OpenFile, a pioneering Canadian effort at crowdsourced local news (in the interests of disclosure, he is also a friend). I was interested in what he was up to with Emergent, so I called him up and asked him why he started it and how it works.

As part of his fellowship with the Tow Center, Silverman said he was planning a research paper about the state of online verification and some of the difficulties inherent in stamping out rumors, something he has written about for more than a decade now. But he said he wanted to do more than just write about the problem — he wanted to be able to give journalists better tools that they could use to determine what was true and what wasn’t.

I could have just looked at some of the people doing this kind of work, and some of the research into the psychology of rumors — there’s definitely enough there for an interesting paper. But I thought I would just be treading the same path that many have already, and I wanted to be able to say here’s something that works, and that requires data… so I decided to build this project and get some.

Trying to gauge truthiness

Although it stemmed from his work with the Tow Center, Silverman said he is funding the development of Emergent himself — both the back-end database and programming work, for which he hired journalist and developer Adam Hooper, and the front-end website design and development, which was done by a Toronto-based firm called Normative. The site is still in beta, he said, but new features are being added regularly and he is looking for feedback.


So how does Emergent work? Silverman and a research assistant comb through social media and news websites using a variety of feeds, alerts and filters, and then enter claims that need debunking into the database and assign what Silverman calls a “truthiness” rating that marks each report as supporting the claim (i.e. stating it to be true), debunking it in some way or simply repeating it.
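The claims-plus-ratings setup Silverman describes can be pictured as a simple data model: each claim collects the articles that mention it, and each article carries one of the three stances. This is a hypothetical sketch for illustration — the names, labels and structure here are my own, not Emergent’s actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class Stance(Enum):
    # Hypothetical labels mirroring the three-way "truthiness" rating
    SUPPORTING = "supporting"   # article states the claim is true
    DEBUNKING = "debunking"     # article disputes the claim
    REPEATING = "repeating"     # article repeats the claim without judging it

@dataclass
class Article:
    url: str
    stance: Stance

@dataclass
class Claim:
    text: str
    status: str = "unverified"               # changed only by a human editor
    articles: list = field(default_factory=list)

    def add_article(self, url: str, stance: Stance) -> None:
        """Record one report about this claim along with its rating."""
        self.articles.append(Article(url, stance))

# Example entry of the kind Silverman and his assistant might log by hand
claim = Claim("A woman had a third breast implanted")
claim.add_article("https://example.com/story", Stance.REPEATING)
```

The key design point from the article is that the overall `status` of a claim stays under human control; the per-article stances are just evidence feeding that decision.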

At that point an algorithm takes over and watches the URLs of the stories or posts that Silverman and his assistant entered into the database to see whether the content has been changed — that is, updated with a correction or new evidence that suggests the claim is true or false. If there is enough evidence, the status of the claim is changed, but that decision is always made by a human.
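A common way to implement this kind of change-watching is to periodically re-fetch each tracked URL, hash the extracted text, and compare it against the fingerprint from the previous crawl; any URL whose content changed gets flagged for a human to re-examine. The sketch below shows that general technique under my own assumptions — it is not Emergent’s actual implementation:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of an article's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def check_for_updates(stored: dict, fetched: dict) -> list:
    """Return URLs whose content changed since the last crawl.

    stored:  url -> fingerprint recorded on the previous crawl
    fetched: url -> article text fetched on this crawl
    """
    changed = []
    for url, text in fetched.items():
        new_fp = fingerprint(text)
        if stored.get(url) != new_fp:
            changed.append(url)      # flag for a human editor to review
            stored[url] = new_fp     # remember the latest version
    return changed

# First crawl: every story is new, so everything is recorded
stored = {}
check_for_updates(stored, {"https://example.com/story": "Original report."})

# Second crawl: the story gained a correction, so it is flagged again
updates = check_for_updates(
    stored,
    {"https://example.com/story": "Original report. CORRECTION: this was false."},
)
```

Hashing the text rather than storing full copies keeps the watch list cheap to maintain, while still catching any edit, however small — which matches the article’s point that the algorithm only surfaces changes, and a human decides what they mean.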

Tracking the spread of rumors

The software also tracks and displays how much each story is shared on various social-media platforms such as Twitter and Facebook, and in the vast majority of cases the stories that tend to support the claim are far more widely shared than the ones debunking it. This is just the nature of human behavior, Silverman says — a report that a woman has had a third breast implanted is always going to be much more interesting than a story saying those reports are false.


Silverman said that there’s no question media sites are publishing more unverified or questionable reports (recent examples on Emergent included stories about North Korean dictator Kim Jong-Un breaking his ankles because he is too fat). Although some of this is driven by a desire for traffic, he said, it’s also driven by the fact that these kinds of stories are being shared a lot on social media, and so some news sites feel as though they might as well write about them, even if they are untrue.

My hypothesis is that because of how everyone is connected now, information that might have only had a small audience before can now get a much larger one. The problem existed before, but the distribution and velocity of that kind of information is much greater than it was before — and media organizations are choosing to publish a lot more of it than they would have before because it’s already out there.

Giving journalists better tools

This tension was highlighted at Gawker Media last year, during a discussion between Neetzan Zimmerman — who at the time was one of the largest drivers of traffic in Gawker history, thanks to his coverage of “viral” stories — and founder Nick Denton about whether Gawker should worry about how true a report is before it writes about it. As Zimmerman pointed out, many people don’t care whether it’s true or not, and are happy to share it with their friends regardless.

But Silverman believes that if there are better tools for tracking such reports, journalists and others might share them less. There have always been sites like Snopes — founded by a husband-and-wife team in 1995 — and more recent efforts such as Gawker’s Factually blog. But there hasn’t been much data about where and when such hoaxes are being shared.

There are two things the site doesn’t have that he wants to add in the future, Silverman said: more details about the factors that caused him to rank a report as supporting or debunking a claim, and some way for readers to submit their own links to articles that can help prove or disprove a report — in other words, a crowd-powered approach like the one Grasswire uses for its fact-checking. There may be more such tools now, Silverman said, but there are also more hoaxes and rumors than ever, and more outlets willing to print them.

Featured image by Brian A Jackson/Shutterstock