How and why you should do data journalism

One of the big areas of focus for technology companies over the past year has been “big data” — in other words, the idea that there can be a lot of value in finding patterns in the massive quantities of user data and other information that a business generates. This has a corollary in journalism too: namely, the growing realization that there is a lot of value in finding patterns in news-related information. This weekend saw the launch of an e-book that could be a useful resource for anyone planning to explore that field: The Data Journalism Handbook.

Released at the 2012 International Journalism Festival in Italy, the handbook is a collection of testimonials, tips and in-depth case studies about data-oriented journalism. Fittingly enough, the information was crowdsourced from dozens of leading practitioners of the craft, from the BBC and the Financial Times to the Chicago Tribune and the New York Times (both of which have teams of developer-journalists who work on data-related projects). The book is available free of charge online under a Creative Commons license, and a printed version is in the works from O’Reilly Media.

What makes data journalism different from regular journalism?

In an introduction to the book, Birmingham City University journalism professor Paul Bradshaw explains why media outlets and journalists of all kinds should be interested in data journalism. He also describes a few early efforts in the field, such as Adrian Holovaty’s EveryBlock, which grew out of an experiment called Chicago Crime: a dynamically updated map that showed all the official crime reports in the Chicago area. (Be sure to see Holovaty’s hilarious response to the question “Is data journalism?”) As Bradshaw puts it:

What makes data journalism different to the rest of journalism? Perhaps it is the new possibilities that open up when you combine the traditional ‘nose for news’ and ability to tell a compelling story, with the sheer scale and range of digital information now available.

As a side note, Holovaty (who was trained as a journalist, and is also the developer of the Django web-application framework for Python) has been talking and writing about data-oriented journalism for years, including a kind of manifesto he wrote in 2006. As he put it then, in order to remain essential sources of information for their communities, newspapers need to “stop the story-centric worldview” and spend more time accumulating information that can then be sorted or filtered or displayed in any number of ways.

The Data Journalism Handbook describes dozens of other data-oriented projects that newspapers and other media outlets have put together. These include a Las Vegas Sun feature that looked at injury and infection rates at all the area’s hospitals and categorized them in a graphic called “Do No Harm,” and a Texas Tribune project that collected and charted the salaries of more than 60,000 government employees. Data journalist Jonathan Stray of The Associated Press produced another of the highlighted examples: a visualization of the keywords used in the more than 390,000 Iraq war logs and diplomatic cables leaked by WikiLeaks.

How data journalism could help save the media

The book also contains useful profiles of the data-journalism teams and practitioners at various newspapers, including the Chicago Tribune and Die Zeit in Germany, as well as tips on “How to Hire a Hacker” and on the benefits of holding public “hackathons.” It also explains the methods news outlets can use to acquire useful data, such as “scraping” websites, and discusses the value of crowdsourcing the analysis of large data sets. The Guardian showed how spectacularly successful that approach can be with its MP Expenses project, in which more than 20,000 people helped read through hundreds of thousands of official documents.

But is there a larger point to this kind of journalism, beyond building cool interactive features or fancy presentations? The handbook argues that there is. Although no one is yet earning substantial revenue from data journalism, the book’s authors make the case that finding patterns in the information flowing past us every day is one of the few areas where the media can add value. As Mirko Lorenz of Germany’s Deutsche Welle puts it:

Today news stories are flowing in as they happen, from multiple sources, eye-witnesses, blogs and what has happened is filtered through a vast network of social connections, being ranked, commented and more often than not: ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value.

As the handbook notes, some media outlets such as The Economist already derive substantial revenue from analytical reports on news-related topics, and both Reuters and Bloomberg have shown that data that helps businesses and investors make decisions can be very valuable indeed. Not every data project will turn into that kind of bonanza, but the skills developed along the way are bound to be useful. That is part of the reason institutions like Columbia University are investing in research into best practices for all kinds of digital journalism, as Emily Bell, director of the Tow Center for Digital Journalism, announced on Monday.