University of Pennsylvania researchers have found that the words people use on Twitter can help predict the rate of heart disease deaths in the counties where they live. Counties where people tweet happier language about happier topics have lower rates of heart disease death, as measured against Centers for Disease Control statistics, while counties where people tweet angry language about negative topics have higher rates.
The findings of this study, which was published in the journal Psychological Science, cut across fields such as medicine, psychology, public health and possibly even civil planning. It’s yet another affirmation that Twitter, despite any inherent demographic biases, is a good source of relatively unfiltered data about people’s thoughts and feelings, well beyond the scale and depth of traditional polls or surveys. In this case, the researchers used approximately 148 million geo-tagged tweets posted in 2009 and 2010 across more than 1,300 counties, which together contain 88 percent of the U.S. population.
What’s more, at the county level, the Penn study’s findings about language sentiment turn out to be more predictive of heart disease deaths than any other individual factor, including income, smoking and hypertension. A predictive model combining language with those other factors was the most accurate of all.
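To make the idea of comparing predictors concrete, here is a minimal sketch of one simple way to do it: rank each county-level factor by how strongly it correlates with the mortality rate. This is not the researchers’ method, which the article does not detail. The function names (`pearson_r`, `rank_predictors`), the factor names and every number below are hypothetical, invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def rank_predictors(predictors, outcome):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson_r(values, outcome)) for name, values in predictors.items()]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Invented toy values for five hypothetical counties; NOT data from the study.
toy_predictors = {
    "negative_language_score": [0.1, 0.4, 0.5, 0.7, 0.9],
    "smoking_rate": [0.20, 0.15, 0.30, 0.22, 0.28],
}
toy_mortality = [40.0, 52.0, 55.0, 63.0, 70.0]

for name, r in rank_predictors(toy_predictors, toy_mortality):
    print(f"{name}: r = {r:.2f}")
```

A combined model, like the one the study found most accurate, would instead fit all of the factors together, for example with a multiple regression, and compare its predictions against the same mortality figures.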
That’s a result similar to recent research comparing Google Flu Trends with CDC data. It’s worth noting, though, that Flu Trends is an ongoing project that has already been collecting data for years, and that the search queries it collects are much more directly related to influenza than the Penn study’s tweets are to heart disease.
That’s likely why the Penn researchers suspect their findings will be more relevant to community-scale policies or interventions than anything at an individual level, despite previous research that shows a link between emotional well-being and heart disease in individuals. Penn professor Lyle Ungar is quoted in a press release announcing the study’s publication:
“We believe that we are picking up more long-term characteristics of communities. The language may represent the ‘drying out of the wood’ rather than the ‘spark’ that immediately leads to mortality. We can’t predict the number of heart attacks a county will have in a given timeframe, but the language may reveal places to intervene.”
The researchers’ work is part of the university’s Well-Being Project, which has also used Facebook users’ language to build personality profiles.