10 ways big data changes everything

How Twitter data-tracked cholera in Haiti

By Mathew Ingram
Sifting through the massive amounts of information that flow through the Twitter network is no easy task, since more than 250 million tweets are posted every day, according to a recent estimate from the company. But within that stream are some valuable pieces of information: data that could be used to track the spread of disease, for example, and more accurately identify its victims. A recent study by medical researchers at Harvard showed that Twitter was substantially faster than traditional diagnostic methods at tracking the spread of cholera in Haiti following the earthquake in 2010.
In fact, the study, authored by Dr. Rumi Chunara, a research fellow at Harvard Medical School who also works with the online data-oriented HealthMap project, showed that by using information from Twitter, researchers were able to pinpoint outbreaks of the deadly disease more than two weeks before they were identified by traditional methods. The study was released in January, on the second anniversary of the Haitian earthquake.
In an interview with GigaOM, Chunara said she and the other researchers got the idea for the study after noticing a lot of cholera-related messages flowing through Twitter in the aftermath of the quake. These tweets were highlighted by the social media–tracking service Ushahidi, a platform used by aid agencies, government workers and others during events like the Haiti quake to track victims and incidents. Other studies have shown that Ushahidi and similar tools can be a much more effective means of communication in such circumstances than official channels.
“We noticed that the volume of tweets over time correlated quite strongly with the official reports of cholera in Haiti, so we decided to take a closer look,” Chunara said. The researchers collected and scanned 4,697 reports via the HealthMap service, along with almost 200,000 individual tweets, which she said was “quite a large data set compared with similar studies of this kind.” The point of the research was to show that information from real-time sources like Twitter and other social tools could be an effective supplement to official methods of locating and diagnosing outbreaks of diseases like cholera.
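To make the idea of tweet volume "correlating" with official reports concrete, here is a minimal sketch of one way such a comparison could be run. It is not the researchers' method, which the article does not describe in detail. The function names `pearson` and `best_lag` are hypothetical, and the input is assumed to be two equal-length lists of daily counts. The sketch computes a Pearson correlation and then checks whether tweet volume lines up better with official counts reported some days later.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def best_lag(tweets, cases, max_lag):
    """Return (lag, r): the delay in days, from 0 to max_lag, at which
    tweets on day t correlate most strongly with cases on day t + lag."""
    best = None
    for lag in range(max_lag + 1):
        shifted_tweets = tweets[: len(tweets) - lag]
        shifted_cases = cases[lag:]
        r = pearson(shifted_tweets, shifted_cases)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Made-up illustrative counts, NOT data from the study: the "official"
# series is simply the tweet series delayed by two days.
daily_tweets = [1, 3, 2, 5, 8, 4, 2, 1, 1, 1]
official_cases = [0, 0, 1, 3, 2, 5, 8, 4, 2, 1]
lag, r = best_lag(daily_tweets, official_cases, max_lag=4)
print(f"best lag: {lag} days, correlation: {r:.2f}")
```

A positive best lag would suggest that the informal signal leads the official one, which is the kind of head start the study reports. Real data would be far noisier than this toy example.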
“Official case reports have to get verified by hospitals, so it often takes a couple of weeks for that information to be posted and available to health workers,” said Chunara. “Informal sources like Twitter are obviously much more real-time.” The study concluded that by analyzing Twitter data, researchers would not only have been able to pinpoint the location of cholera cases but could also have determined the reproductive rate of the outbreak more quickly, something that can be a crucial element for health workers in stemming the spread of an infectious disease.
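For readers curious how a reproductive rate might be read off a stream of daily counts, here is a rough sketch. It is an assumption-laden illustration, not the approach used in the study. It fits an exponential growth rate r to the early counts with a least-squares line through their logarithms, then applies the textbook approximation R ≈ 1 + r × T, where T is the mean generation time; that approximation itself assumes an exponentially distributed generation interval. The function names and the five-day generation time are hypothetical placeholders.

```python
from math import exp, log


def growth_rate(counts):
    """Least-squares slope of ln(count) against day index, i.e. the
    exponential growth rate per day. Days with zero counts are skipped."""
    points = [(day, log(c)) for day, c in enumerate(counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two days with nonzero counts")
    n = len(points)
    mean_day = sum(day for day, _ in points) / n
    mean_log = sum(value for _, value in points) / n
    numerator = sum((day - mean_day) * (value - mean_log) for day, value in points)
    denominator = sum((day - mean_day) ** 2 for day, _ in points)
    return numerator / denominator


def reproductive_number(counts, generation_time_days):
    """Approximate R as 1 + r * T (assumes an exponentially distributed
    generation interval; a simplification, not the study's estimator)."""
    return 1.0 + growth_rate(counts) * generation_time_days


# Synthetic counts growing 20 percent per day (continuous rate 0.2),
# generated for illustration only; the generation time is a placeholder.
synthetic_counts = [10 * exp(0.2 * day) for day in range(10)]
print(round(reproductive_number(synthetic_counts, generation_time_days=5.0), 2))
```

Because the estimate depends only on how fast counts grow, it can be computed as soon as a few days of data exist, whether those counts come from verified case reports or from a faster informal source.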
Armed with data from the study, Chunara said health workers and nongovernmental agencies would theoretically be able to apply these principles in future disasters and outbreaks and get a faster reading of where to focus their efforts. “The use of social media is growing so quickly, and provides so much interesting data, that it behooves us to figure out how to use these kinds of tools for research,” she said.
What the Harvard and HealthMap study shows is that analyzing large data sets, such as the tweets around Haiti, isn’t just good for tracking patterns or seeing connections after an event has occurred; it can actually be of use to researchers on the ground while those events are underway. Other projects aimed at Twitter data have tried to isolate stock-related activity or predict purchasing behavior, but the HealthMap research is one of the first to show that mining social networks could have a real, and real-time, impact on health and social welfare.