Lessons in how to crowdsource journalism from ProPublica

The idea of “crowdsourcing” has become more or less mainstream by now — thanks in part to the rise of social apps and services like Mechanical Turk and Kickstarter — and we’ve already seen how journalists can use Twitter and other social networks to crowdsource breaking news during events such as the Arab Spring uprisings in Egypt and elsewhere. But there haven’t been that many large-scale, organized efforts that qualify as crowdsourced journalism, which is what makes the “Free The Files” project from ProPublica so fascinating. Not only does the project have a tangible and arguably important social goal, but it would literally not have been feasible for any news organization to undertake without crowdsourcing.

In a nutshell, the project is designed to aggregate information about spending by non-profit groups and super-PACs (which in many cases are set up for that exact purpose) on political TV ads in various regional markets across the United States in the run-up to the federal election. This so-called “dark money” spending is an important element in the behind-the-scenes lobbying that goes on in any election, ProPublica argues, and so it is arguably in society’s interest to make as much of it public as possible — a classic case of an investigative journalism project with a public-spirited goal.

The “Free The Files” effort began in March, after TV industry executives pushed back against a proposed rule that would force stations to file reports on ad spending with the Federal Communications Commission. In the initial version of the project, ProPublica asked volunteers to physically visit the offices of network affiliates to request the documents, and it did some of its own digging to kick things off, using students from the Medill School of Journalism at Northwestern University.

Marshalling an army of journalistic volunteers

In April, the FCC voted to require TV stations to file spending reports online, but with a crucial caveat: stations don’t have to submit them in a machine-readable format, and in some cases the files are PDFs, which makes it difficult to extract key pieces of information. So the “Free The Files” project took on a new mission: to get volunteers to look at as many files as possible and pull out the key data points (ad buyer’s name, amount spent, etc.). ProPublica created a Facebook app that makes it easy to do this, and now it has partnered with The Huffington Post to try to reach even more volunteers.

On a related note, the Sunlight Foundation and the non-profit group Free Press are working on a similar project called “Political Ad Sleuth,” which asks volunteers to visit stations that are not required to file their documents online (the online-filing rule applies only to broadcasters in the top 50 markets in the country) and copy the information from the paper documents in person.

According to an update from ProPublica engagement editor Amanda Zamora, since the ad-spending files started appearing online in August, more than 400 volunteers have unlocked information from TV stations in 33 crucial swing markets. The project has found evidence of “dark money” spending by several non-profits, including one in Ohio that told the Internal Revenue Service it wasn’t going to spend any of its money on political causes and then spent more than $1 million on ads attacking a Senate candidate. In total, the “Free The Files” effort has turned up evidence of more than $200 million in ad spending.

More than anything else, the ProPublica effort reminds me of what is still one of the most successful journalistic crowdsourcing projects ever: the “MP Expenses” project launched by The Guardian in 2009, which asked volunteers to comb through more than 200,000 expense reports filed by British members of Parliament, looking for evidence of fraud or otherwise questionable behavior. In just a few weeks, more than 20,000 volunteers had gone through documents and found information, with an almost unheard-of participation rate of over 50 percent in the initial stages (many crowdsourced projects get 10 percent or less of the potential user base to participate).

Make it easy, and gamify it if possible

As Simon Willison suggested in a post-mortem on the effort, a number of features made the MP Expenses project much more likely to succeed, and a crucial one was the nature of the investigation: not only was the goal public-spirited (i.e., exposing political corruption), but it also appealed to every constituent who may have suspected that their representative was doing something shady. In a similar vein, the ProPublica project has a clear social goal, and the benefit is obvious. Not only that, but “Free The Files” also fits one of the other guidelines for successful crowdsourcing: it is very simple to do.

The Guardian has found with some of its other crowdsourcing or “open journalism” experiments, such as the opening up of story schedules for public input, that if a project is too open-ended or vague, people are less likely to participate. ProPublica has found the same thing with some of its previous crowdsourcing experiments. While people may be willing to help, they are more likely to do so if the task is brief and obvious. As Zamora described it to the Nieman Journalism Lab:

“The beauty of this is its simplicity. We’re asking for very specific data points… We want to give people incentive but also don’t want them to feel, ‘This is a Sisyphean task, we’ll never make it.’”

In the case of “Free The Files,” taking down a name and number is all that’s required. The only thing that would make it easier is if Google added the PDF screen captures to its “reCAPTCHA” anti-spam login feature (which Google cleverly uses to help it figure out hard-to-decipher words that come up via its book-scanning project). There is even an aspect of gamification to the ProPublica effort (another key lesson from the Guardian project), with a leaderboard that tracks the most prolific contributors: the current leader, KB, has “freed” information from a staggering 4,360 files, and the runner-up has done over 1,500.

Although not every large-scale journalistic effort will fit a crowdsourcing approach, something like “Free The Files” literally could not have been accomplished before now without thousands of hours of work by journalists, which is a luxury that few traditional or even digital-first media entities can afford. It’s encouraging to see a crowdsourced strategy working in practice, and I hope more media outlets will take a cue from ProPublica and experiment with it.

Post and thumbnail images courtesy of Flickr user Christian Scholz