When big data meets journalism

The Knight Foundation, a non-profit entity that is one of the biggest funders of media-related projects in the United States (including the new MIT Center for Civic Media, which we wrote about earlier), announced the winners of its annual $4.7-million News Challenge on Wednesday. There’s a theme running through most of the winners: namely, data as journalism. Just as tech companies of all kinds are focusing on what we at GigaOM call “Big Data” as a tool for new services, the media industry is (hopefully) starting to understand that data can be useful for its purposes as well.

The Knight Foundation noted in a blog post announcing the 16 winners that data and its use in journalism were a big theme among this year’s contestants. When the Knight competition first started five years ago, the idea of a “hacker/journalist” who developed applications and journalistic tools around data was an unfamiliar one, but the foundation noted that this is now an established position at some media outlets.

Among the newspapers and media entities that have been at the forefront of this data-journalism wave is the New York Times, where Aron Pilhofer and a team of developers and programmers have created a number of groundbreaking news features. Not surprisingly, perhaps, Pilhofer is also involved in one of the winning entries in the Knight News Challenge: DocumentCloud, which allows media outlets and journalists to upload and share (and annotate or collaborate on) a variety of documents. The project won $320,000 and will use the funds to add the ability for anyone to edit or contribute to documents.

The other data-related projects that got Knight funding include:

  • SwiftRiver. SwiftRiver, which got $250,000 from the news challenge, was developed by the founders of Ushahidi, an information network designed to allow rescue workers and other volunteers to find and share information during a crisis or disaster like the recent earthquake in Japan. SwiftRiver is a series of tools that allow anyone trying to make sense of that information — including journalists — to filter and determine the accuracy of those real-time reports.
  • Overview. Developed by a team of journalists at The Associated Press including Jonathan Stray, this project got $475,000 to develop visualization tools that will help journalists explore large data sets. In one early prototype of what the project hopes to do, Stray created a visualization of all the text in the Iraqi war logs.
  • PANDA. Developed by Brian Boyer of the Chicago Tribune (another prototypical “hacker/journalist”) along with a team of other journalists from Chicago and The Spokesman-Review in Spokane, Wash., the PANDA project plans to use the $150,000 it won to create easy-to-use web-based tools that even journalists at smaller newspapers and media outlets can use to analyze and organize data.
  • ScraperWiki. Based in England, this project allows users to create their own custom “scrapers” that go out and automatically aggregate data from websites and web-based services, based on whatever parameters the user defines. The $280,000 grant from the Knight challenge will be used to build a “data on demand” feature that will allow journalists to create their own profiles and be alerted when data related to a specific search or topic changes somewhere online.
  • OpenBlock Rural. Developed by the University of North Carolina at Chapel Hill, this project, which received $275,000, is designed to help rural news outlets aggregate and make sense of local information from government and public records. This approach is very similar to that developed by Chicago-based startup EveryBlock, which was funded by a Knight Foundation grant and later acquired by MSNBC.

In the end, data and the tools to manipulate it are the modern equivalent of the microfiche libraries and envelopes full of newspaper clippings that used to make up the research arm of most media outlets. They are just tools, but as some of the winners of the Knight News Challenge have already shown, these new tools can produce information that might never have been found before through traditional means. We hope some mainstream media players are paying attention, and/or getting ideas of their own.

Post and thumbnail photos courtesy of Flickr user David Reece