Open journalism also means opening up your data, so others can use and improve it

The term “open journalism,” which has become a core principle of digital media for forward-thinking outlets such as The Guardian newspaper in Britain, is often used to mean journalism that engages with its audience (or “the people formerly known as the audience” as Jay Rosen calls them) and allows them to contribute to the process. But the idea of opening up journalism is also about what media companies can do beyond just producing articles or facts for people to consume — and a big part of that is opening up the data behind their stories.

There are a couple of good examples of this at the Source blog, which is part of the Knight-Mozilla OpenNews project devoted to opening up journalism data. In the first, the New York Times posted to GitHub all of the data associated with a recent story about the militarization of U.S. police forces, an offshoot of the news story about the shooting of 18-year-old Michael Brown in Ferguson, Mo.

Open-source code, open-source journalism

GitHub is a kind of open community combined with a back-end hosting service that allows developers to post the source code for software they are working on, or want to give others access to. It makes it easy for multiple people to take the same code and do a number of different things with it, including checking it for errors, improving it and using it to build new things. And for those interested in open journalism, it provides the same kinds of access for data that emerges from reporting projects like the New York Times story.

NYT data map

As the Source post notes, the Times got the data for its map and associated story from the Pentagon, based on a request for information about all the transfers of equipment that the U.S. military had made to domestic police forces since 2006, and it posted the entire data repository to GitHub for others to use. A member of the NYT editorial team said that after publishing the map, the newspaper had gotten a number of requests from other media outlets looking for access to the numbers behind it, and GitHub seemed like the easiest way to provide it. A number of newspapers have already used the data for their own stories on the phenomenon.

Team member Tom Giratikanon noted that another major benefit of opening data up in this way is that it can be checked for errors (which someone started doing not long after it was posted), but the flip side is that errors can also be introduced accidentally:

“GitHub makes it easy for others to contribute to your data and code, which is powerful. A few people took the time to consolidate the spreadsheets and turn it into a CSV. One person began factchecking the data to see if the outliers made sense. But including their requests requires verification. For instance, one of the people who made a CSV version initially left off 20,000 rows by accident, which would have been easy to miss.”
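The kind of verification Giratikanon describes can be partly automated. The following is only a minimal sketch of one possible check, not the Times' actual process: it compares an original data set against a contributed CSV and reports any rows that went missing. The function names (`read_rows`, `missing_rows`) and the toy data are hypothetical and used purely for illustration.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text):
    """Parse CSV text into a list of row tuples, skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # drop the header row
    return [tuple(row) for row in reader]


def missing_rows(original, contributed):
    """Return rows that appear in `original` more often than in `contributed`."""
    # Counter subtraction keeps only rows with a positive remaining count,
    # so duplicated rows that were dropped once are still detected.
    diff = Counter(original) - Counter(contributed)
    return sorted(diff.elements())


# Hypothetical toy data; the real data set has far more rows and columns.
original_csv = (
    "state,county,item\n"
    "MO,St. Louis,rifle\n"
    "MO,Boone,truck\n"
    "TX,Travis,rifle\n"
)
contributed_csv = (
    "state,county,item\n"
    "MO,St. Louis,rifle\n"
    "TX,Travis,rifle\n"
)

original = read_rows(original_csv)
contributed = read_rows(contributed_csv)
dropped = missing_rows(original, contributed)

# Report the row counts and any rows the contributed file left out.
print(len(original), len(contributed), dropped)
```

A simple row-count comparison would catch the 20,000-row omission mentioned in the quote, while comparing the rows themselves would also flag cases where the counts match but the contents differ.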

Readers can check the data themselves

The Knight-Mozilla blog also describes how BuzzFeed took the data behind a story it did on racial segregation in Missouri and posted it to GitHub, along with the code that the outlet used to generate the numbers from census data and other publicly available information. Jeremy Singer-Vine of BuzzFeed told the Source blog that he thinks being as transparent as possible about such projects is important because it allows for a number of related benefits, including:

Verifiability: Readers or other media outlets should be able to check sources and code or math for any obvious mistakes.

Reproducibility: Readers and/or news entities should be able to conduct the same analysis and see if they get the same results.

Reusability: Readers or media should be able to run the same analysis on updated or different data or run a new analysis on the same data.

As with the code behind software programs (the original use for things like GitHub), there are a host of benefits to opening up the data that provides the foundation for news stories, including the fact that more eyeballs on the data means a greater likelihood of finding errors and/or misinterpretations of that data. And that’s good not only for the outlet that produced the original story but for journalism as a whole.

Post and thumbnail images courtesy of Shutterstock / Carlos Castilla