There’s been a lot of talk recently about Twitter trending topics, and how they fail to reflect evolving events such as the Occupy Wall Street movement (although some argue that this is mainly the fault of our inflated expectations rather than of Twitter’s algorithms). But despite those kinds of setbacks, there is an emerging industry aimed at using the tweetstreams of millions of people to help predict the future in some way: disease outbreaks, financial markets, elections and even revolutions. According to new research released today by Topsy Labs, which runs one of the few real-time search engines with access to Twitter’s historical data, watching those streams can provide a window into breaking news events. But can it predict what will happen?
The theory behind all of this Twitter-mining is that the network has become such a large-scale, real-time information delivery system (handling more than a quarter of a billion messages every day, according to CEO Dick Costolo at the recent Web 2.0 conference) that it should be possible to analyze those tweets and find patterns that produce some kind of collective intelligence about a topic. It’s the same idea that drives companies to do “data mining” on their customers’ behavior, or compels Google and Facebook to track your browsing activity in the hope that they can generate some aggregate information that will be of value, and predict what you might be interested in.
Predicting markets and the spread of disease
One of the first attempts at doing this with Twitter appeared last year, when a team of researchers published a report that compared the mood of Twitter users, extracted through sentiment analysis, with the movements of the Dow Jones Industrial Average. The study said that its system could predict the daily up-or-down movement of the index with 87-percent accuracy, and within months a hedge fund called Derwent Capital Markets launched a fund that it said would make stock and fund trades based on a similar kind of analysis of Twitter (so far it seems to be doing pretty well).
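An accuracy figure like this is usually a directional hit rate: the share of days on which the predicted up-or-down move matches the actual one. The sketch below shows only that scoring step, not the researchers’ model; the function name and the daily changes are invented for illustration.

```python
def directional_accuracy(predicted, actual):
    """Fraction of days on which the predicted direction of the daily change
    matches the actual direction. A change of exactly 0 counts as "not up"."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty series of equal length")
    # Compare signs day by day; True counts as 1 when summed.
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)

# Invented daily changes in index points: a model's forecasts vs. what happened.
predicted = [1.2, -0.4, 0.3, -1.0, 0.8]
actual = [0.9, -0.2, -0.1, -0.7, 1.1]
print(directional_accuracy(predicted, actual))  # 0.8: right on 4 of 5 days
```

A hit rate like this ignores the size of each move, so on its own it says little about whether trading on the signal would make money.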
Medical researchers have also been trying to use Twitter trends and analysis to predict the outbreak or spread of disease, in much the same way that Google came up with Google Flu Trends, which tracks searches for terms associated with the flu — data that seems to correlate fairly well with actual outbreaks of the flu. Two researchers from Johns Hopkins University recently released a study that looked at more than two billion tweets and analyzed them for medical information, and said that this could be a useful tool for researchers and medical staff.
Could Twitter have predicted revolution in Egypt?
In one of the research reports the company released today, Topsy Labs looked at tweets related to the recent Arab Spring revolutions in Tunisia, Egypt and elsewhere in the Middle East, and tried to correlate the rising and falling trends in hashtags such as #iran, #egypt and #yemen with actual events such as the suicide of Mohammed Bouazizi, the 26-year-old Tunisian food vendor whose death crystallized for many dissidents the problems in their country and the need to take action. Twitter was a key tool for raising awareness of these revolutions, and Topsy’s data shows a high correlation between actual events and Twitter activity around those topics.
Topsy also looked at what it called the “share of voice”, or the influence and reach that one specific Twitter user gained over a short period of time: Sohaib Athar, the Pakistani programmer who live-tweeted the U.S. military raid on Osama bin Laden’s compound without even realizing it. According to Topsy’s data, when he first began posting, Athar had very little exposure: he wasn’t being followed or retweeted by many people, and the people who did follow him didn’t have much reach either (meaning they weren’t followed or retweeted by many people themselves). But that all changed over the next 24 hours:
[A]s his tweets were retweeted and mentioned more than 30,000 times, his exposure grew to a whopping 82.68 million unique tweets within 21 hours. As his tweets became more interesting to the Twittersphere, his exposure and influence grew dramatically. He went from 0 to 20 million in under 10 hours and over 82 million in just under 30 hours.
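Topsy doesn’t spell out how it computes “exposure”, so the sketch below rests on an assumption: that exposure is roughly the combined follower count of the distinct accounts that retweeted or mentioned a user, accumulated over time. The `Share` record, the function name and all numbers are hypothetical. The point is only to show why a handful of retweets from heavily followed accounts can push such a metric from near zero into the millions within hours.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Share:
    hour: float      # hours since the user's first tweet
    account: str     # hypothetical account that retweeted or mentioned the user
    followers: int   # that account's follower count at the time

def cumulative_exposure(shares, checkpoints):
    """Return {checkpoint_hour: exposure}, where exposure is the summed follower
    count of distinct sharing accounts seen up to that hour (assumed definition)."""
    ordered = sorted(shares, key=lambda s: s.hour)
    seen, total, result, i = set(), 0, {}, 0
    for cp in sorted(checkpoints):
        # Absorb every share that happened at or before this checkpoint.
        while i < len(ordered) and ordered[i].hour <= cp:
            share = ordered[i]
            if share.account not in seen:  # count each account's audience once
                seen.add(share.account)
                total += share.followers
            i += 1
        result[cp] = total
    return result

shares = [
    Share(0.5, "a", 40),
    Share(2.0, "b", 300),
    Share(2.5, "a", 40),          # repeat share by "a" adds nothing
    Share(9.0, "c", 2_000_000),   # one heavily followed account
    Share(20.0, "d", 5_000_000),
]
print(cumulative_exposure(shares, [1, 10, 21]))
# {1: 40, 10: 2000340, 21: 7000340}
```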
Topsy’s research certainly shows how quickly a single individual can become hugely influential, and the correlation of Twitter data with events in Egypt and Tunisia is also interesting. But could someone have predicted that Egypt was going to break into open revolution based on the activity Topsy recorded? Perhaps, and that possibility is why the U.S. government’s Intelligence Advanced Research Projects Activity (IARPA) is looking at using data from social media like Twitter and Facebook as part of its intelligence gathering.
The research that Topsy did is far from conclusive, however. In particular, the company didn’t filter its analysis by language or by a Twitter user’s location, which means that many of the tweets could have come from outside Egypt and Tunisia. Nor did it use any influence ranking to map the connections between the people tweeting about the topic (as researcher Kovas Boguta did to produce a fascinating visualization). But it shows what could be done with that kind of data, and it is likely just the start of an ongoing attempt to understand the giant collective consciousness that is Twitter.