Predicting Twitter popularity is all about probability

Tweets have the power to decimate markets, but they also have users and companies seeing dollar signs. Given their huge potential for marketing, politics, and social mobilization, how can you predict which tweets will get more views, and which retweets will go viral? In a new study, researchers developed a statistical model that attempts to estimate the popularity of tweets, and thus how memes spread.

Starting with 52 “root” tweets from users both famous and obscure, the researchers first analyzed the dynamics of retweeting, like the speed and spread of a tweet from a user to followers and then their followers. The researchers, from the University of Washington, MIT, and Penn, used the Twitter API to collect all the retweet information and found that most retweets occurred within one hour of the original tweet. Not surprisingly, they also found that root tweets are retweeted more than the retweets themselves.
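
To make the "within one hour" measurement concrete, here is a minimal sketch of how retweet delays could be computed once the timestamps are in hand. It is an illustration only, not the researchers' code: the function names `retweet_delays` and `share_within` are hypothetical, and the timestamps are made up rather than pulled from the Twitter API.

```python
from datetime import datetime, timedelta

def retweet_delays(root_time, retweet_times):
    """Return the delay of each retweet relative to the root tweet, sorted."""
    return sorted(t - root_time for t in retweet_times)

def share_within(delays, window):
    """Fraction of retweets whose delay is no longer than `window`."""
    if not delays:
        return 0.0
    return sum(d <= window for d in delays) / len(delays)

# Made-up example: one root tweet and four retweets (minutes after posting).
root = datetime(2012, 1, 1, 12, 0)
retweets = [root + timedelta(minutes=m) for m in (2, 5, 30, 90)]

delays = retweet_delays(root, retweets)
print(share_within(delays, timedelta(hours=1)))
```

In this toy data, three of the four retweets land inside the one-hour window.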

They then plugged the important variables (number of followers, retweet speed, retweets of other tweets) into a Bayesian model, a statistical approach that uses prior evidence (the root tweets) to calculate how the retweet graph evolves. They experimented with feeding the model different amounts of prior evidence to see how much was needed to make an accurate prediction. Using only 10 percent of the retweets to guide the model, they were able to predict retweet time and volume with reasonable accuracy, and the error decreased as they included more retweet data. The average retweet time was only 4.4 minutes.
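
The general idea of a Bayesian update can be sketched in a few lines. The example below is a deliberately simplified stand-in, not the model from the study: it assumes each follower who sees a tweet retweets it independently with some unknown probability, puts a Beta prior on that probability, and updates it as retweets are observed. The class name `BetaPosterior`, the function `predict_retweets`, and all the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about the per-follower retweet probability."""
    alpha: float
    beta: float

    def update(self, retweeted: int, exposed: int) -> "BetaPosterior":
        # Conjugate Beta-Binomial update: successes add to alpha,
        # failures (exposed followers who did not retweet) add to beta.
        return BetaPosterior(self.alpha + retweeted,
                             self.beta + exposed - retweeted)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))

def predict_retweets(posterior: BetaPosterior, remaining_followers: int) -> float:
    """Expected number of further retweets among followers not yet observed."""
    return posterior.mean * remaining_followers

# Assumed prior: roughly 1 retweet per 100 exposed followers.
prior = BetaPosterior(alpha=1.0, beta=99.0)

# Early evidence: 5 retweets among the first 100 exposed followers.
early = prior.update(retweeted=5, exposed=100)

# More evidence at the same rate: 50 retweets among 1000 exposed followers.
later = prior.update(retweeted=50, exposed=1000)

print(predict_retweets(early, 1000))
print(early.variance > later.variance)  # uncertainty shrinks with more data
```

The second comparison mirrors the article's point in miniature: the more observed retweet data goes into the update, the tighter the estimate becomes.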


Throwing more information into the prediction engine (like whether a particular follower has a large number of followers of his or her own) could improve the accuracy. Their model was thrown off, it seems, by a few anomalous tweets whose retweets started and stopped very rapidly, a pattern the other tweets didn’t follow. (Though they don’t identify who sent those tweets, my bet is on @KimKardashian, whose followers’ actual and predicted retweet timecourse is pictured above.) The researchers didn’t even consider the time of day a tweet was posted, nor its content; those domains likely hold huge potential for working out what leads to trending, and when.

With the abundance of the Twitterverse open to developers via API, this study represents just the tip of the iceberg in predicting tweeting behavior, something that startups like Blab are busily pursuing. It also shows that robust methods like Bayesian statistics can predict if a tweet has any retweet life left, and thus whether it can gather more eyeballs and clicks, something that is sure to prove very lucrative.