Finding Patterns in Social Data a Big Problem — the Cloud Can Help

The race to find relevance in the reams of social data that flow past us every day is never-ending. To take just two examples, Facebook is busy trying to filter the “likes” of half a billion users and turn the results into a usable search engine, while Twitter (along with a number of third-party services) is attempting to figure out who follows whom so it can make recommendations to its users. The biggest problem for both is that analyzing that much unstructured data is extremely difficult. Now researchers say they have something that might help: software that uses cloud computing to find complex patterns in billions of bits of data in a matter of seconds.

The researchers, two from the University of Maryland and one from the University of Calabria in Italy, reported their results in a paper entitled “COSI: Cloud Oriented Subgraph Identification in Massive Social Networks,” which will be delivered at the Advances in Social Network Analysis and Mining conference to be held in Denmark in August. In the paper, they describe how the explosion of data from social networks has caused problems for services that want to find patterns in it:

A technical obstacle to all of these is the difficulty inherent in being able to find all parts of the social network that match a given query network pattern. This essential first step (called the “subgraph matching” step by computer scientists) is…enormously challenging and has long been known to be computationally very difficult, rising exponentially in complexity with the size of the network.

With that in mind, the three researchers developed an algorithm that splits such a problem into pieces, parcels those pieces out to a cloud computing platform such as Amazon’s EC2, searches each piece for patterns and then pulls the results back together. According to the paper, the team managed to perform “subgraph pattern-matching queries” on real-world social network data with more than 750 million “edges,” or connections between individuals, in less than a second. More recent results have shown this is possible with databases that have more than a billion edges.
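To make the split, search and merge idea concrete, here is a minimal Python sketch. It is not the COSI algorithm, whose details the paper covers; it is a plain backtracking matcher, and every name in it (`find_matches`, `search_partition`, the tiny follower graph) is hypothetical. It also assumes each worker can see the whole graph and that only the starting points are divided up, whereas a real cloud deployment would partition the graph itself across machines.

```python
from concurrent.futures import ThreadPoolExecutor

def extend(graph, pattern, order, assignment, results):
    """Backtracking step: grow a partial mapping of pattern vertex -> graph vertex."""
    if len(assignment) == len(order):
        results.append(dict(assignment))
        return
    pv = order[len(assignment)]          # next pattern vertex to place
    for gv in graph:
        if gv in assignment.values():    # each graph vertex is used at most once
            continue
        assignment[pv] = gv
        # Every pattern edge with both endpoints placed must exist in the graph.
        if all(assignment[b] in graph[assignment[a]]
               for a, b in pattern
               if a in assignment and b in assignment):
            extend(graph, pattern, order, assignment, results)
        del assignment[pv]

def search_partition(graph, pattern, order, seeds):
    """Search only the matches whose first pattern vertex lands on one of `seeds`."""
    results = []
    for seed in seeds:
        extend(graph, pattern, order, {order[0]: seed}, results)
    return results

def find_matches(graph, pattern, workers=3):
    """Split the start vertices into chunks, search each chunk, merge the results."""
    order = sorted({v for edge in pattern for v in edge})
    vertices = sorted(graph)
    chunks = [vertices[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda seeds: search_partition(graph, pattern, order, seeds), chunks))
    return [match for part in parts for match in part]

# Hypothetical "who follows whom" graph: each user maps to the users they follow.
graph = {
    "ann": {"bob"},
    "bob": {"cat"},
    "cat": {"ann", "dan"},
    "dan": set(),
}
# Query pattern: a follow cycle of three users, x -> y -> z -> x.
pattern = [("x", "y"), ("y", "z"), ("z", "x")]
matches = find_matches(graph, pattern)
```

In this toy graph the only three-user cycle is ann, bob and cat, so `matches` holds that cycle once for each of its three rotations. The local thread pool stands in for the cloud platform here; the point is only that the chunks are searched independently and their results are concatenated at the end.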

If you’ve ever looked at the Twitter follower graph of a high-profile user, one with hundreds of thousands or even millions of followers, try to imagine the complexity of that network when you move down a level and look at all the users who follow each one of that super-user’s followers, and then move another level down and look at all those followers, and so on. It’s easy to see how the number of relationships multiplies with every level. But the COSI research suggests it’s possible to get meaningful data out of that giant pool of information, thanks to the cloud.
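A rough back-of-the-envelope calculation shows how quickly those levels add up. The numbers below are purely illustrative assumptions, not figures from the paper or from Twitter: a super-user with one million followers, and an average of 100 followers for every account further down. The function name `edges_per_level` is likewise made up for this sketch, and it ignores overlap between follower lists, so it gives an upper bound.

```python
def edges_per_level(top_followers, avg_followers, levels):
    """Upper bound on follow relationships at each level, ignoring overlap."""
    counts = []
    current = top_followers
    for _ in range(levels):
        counts.append(current)
        current *= avg_followers   # each account brings its own followers
    return counts

# Assumed, illustrative inputs: 1 million top-level followers, 100 followers each below.
levels = edges_per_level(top_followers=1_000_000, avg_followers=100, levels=3)
print(levels)  # [1000000, 100000000, 10000000000]
```

Under these assumptions, just three levels down the count is already in the billions, which is the scale the researchers say they are working at.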

