Social Networking & Dawn of the Zettabyte Era

Earlier today, I stopped by the Social Graph Symposium at Sun Microsystems’ (s JAVA) Menlo Park campus. The event, which attracted some of the best-known experts on social networks and social graphs, was organized to look at the challenges and opportunities presented by the increased socialization of the web.

And there is no opportunity bigger than the one offered by the computational needs of this new social web. As we discussed during our first Structure conference (and will continue to discuss at our upcoming Structure 09 conference on June 25th), the social Internet has made it easy for anyone to create, publish, distribute and consume content. Such content can range from blog posts to YouTube videos to Flickr photos to simple, 140-character tweets.

But small drops of water will lead a bathtub to overflow if the pipes are clogged, and this is the challenge faced by the underpinnings of the web. “In the next 12 months there will be a zettabyte of information on the Internet,” said Dr. James Baty, distinguished engineer, VP and chief technology officer of Sun Microsystems. A zettabyte is the equivalent of 1 billion terabytes — or nearly a billion times the data stored on the various drives in my apartment.

That explains why Sun was interested in hosting the event — after all, online social interactions are key drivers of the massive explosion of data on the Internet. To help you better understand the magnitude of growth, let me share a couple of data points from a previous post about Facebook’s photo service.

  • Facebook users have uploaded more than 15 billion photos to date, making it the biggest photo-sharing site on the web.
  • For each uploaded photo, Facebook generates and stores four images of different sizes, which translates into a total of 60 billion images and 1.5 petabytes of storage.
  • Facebook adds 220 million new photos per week, which requires roughly 25 terabytes of additional storage.
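
For readers who want to check the arithmetic behind these figures, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the decimal (SI) unit definitions and the variable and function names are my own assumptions for illustration, not anything Facebook or Sun has published.

```python
# Figures quoted in the list above.
PHOTOS_UPLOADED = 15_000_000_000   # photos uploaded to date
IMAGES_PER_PHOTO = 4               # sizes generated per uploaded photo
TOTAL_STORAGE_PB = 1.5             # petabytes used by all stored images
WEEKLY_PHOTOS = 220_000_000        # new photos per week
WEEKLY_STORAGE_TB = 25             # terabytes added per week

# Assumption: decimal (SI) units, i.e. 1 KB = 10**3 bytes, 1 TB = 10**12 bytes.
BYTES_PER_KB = 10**3
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15
BYTES_PER_ZB = 10**21


def total_images(photos: int, images_per_photo: int) -> int:
    """Number of stored images when each photo is kept in several sizes."""
    return photos * images_per_photo


def avg_kb_per_image(storage_bytes: float, images: int) -> float:
    """Average size of one stored image, in kilobytes."""
    return storage_bytes / images / BYTES_PER_KB


stored_images = total_images(PHOTOS_UPLOADED, IMAGES_PER_PHOTO)
avg_stored_kb = avg_kb_per_image(TOTAL_STORAGE_PB * BYTES_PER_PB, stored_images)

weekly_images = total_images(WEEKLY_PHOTOS, IMAGES_PER_PHOTO)
avg_weekly_kb = avg_kb_per_image(WEEKLY_STORAGE_TB * BYTES_PER_TB, weekly_images)

# How many terabytes fit in the zettabyte mentioned earlier in the article.
terabytes_per_zettabyte = BYTES_PER_ZB // BYTES_PER_TB

print(f"Stored images: {stored_images:,}")
print(f"Average stored image: {avg_stored_kb:.1f} KB")
print(f"Average newly added image: {avg_weekly_kb:.1f} KB")
print(f"Terabytes per zettabyte: {terabytes_per_zettabyte:,}")
```

Under those assumptions, the totals work out to about 25 KB per stored image and roughly 28 KB per newly added image, so the two sets of figures are broadly consistent with each other.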

Facebook is trying to use a smart-software approach to manage this data deluge. The story is no different at, say, MySpace, Twitter or any big social web company. More and more companies are turning to Hadoop and other software written for the ultra web. In a recent post, Gary Orenstein outlined the various systems that have emerged to capitalize on the data mining renaissance.

Sun wants to understand the computational needs of a web that is driven by real-time social interactions. For the longest time, the world has been OK with batch processing of data that took hours. Not anymore: the web (and the Internet) is becoming a real-time proposition. Analyzing that data as it arrives will take a lot of computing horsepower.

“The computational challenge of looking at the unstructured data and mining that data is immense,” Dr. Baty said. We are at a tipping point, he said, that will see today’s compute clusters bulk up to levels that would put even steroid-enhanced baseball players to shame. “We are going to go from tens of thousands of (processor) cores in [a] cluster to hundreds of thousands of (processor) cores,” he said.