Inside Akamai and the scary future of streaming video

Akamai CEO Paul Sagan

Update: When it comes to consumers watching YouTube (s goog) or even streaming TV through a Boxee, the assumption is that aside from some buffering or pauses while the streams catch up, the experience will be fine. But when we add live content and interactive elements to those video streams, it gets complicated. Thanks to a paper describing Akamai’s (s akam) content delivery network in minute detail, we can see exactly how complicated it is today, and what sort of havoc interactivity might wreak.

In an academic paper written by researchers at Akamai, Harvard and the University of Massachusetts, readers get an in-depth look at how Akamai’s distributed CDN works, how the web itself works, and what the shift from static to interactive content means for content providers and the network itself. Details such as Akamai’s 61,000 servers located across 70 countries and nearly 1,000 networks pale in comparison to the section on how streaming video is going to get much more challenging in the coming years. Updated: Akamai emailed to say this paper was published last year and that it now has more than 90,000 servers in more than 1,800 locations in 1,000 networks in more than 70 countries.

The paper offers up a timeline of big web streaming events, beginning with Steve Jobs’ MacWorld keynote in 2001 that drew 35,500 viewers and required 5.3 Gbps of capacity. President Obama’s inauguration in 2009 drew 7 million simultaneous streams and required nearly 2 Tbps of capacity. Akamai noted it hit a peak record in 2010 of delivering 3.45 Tbps of data. But those numbers don’t keep Akamai engineers up at night. The future does. From the paper:

In the near term (two to five years), it is reasonable to expect that throughput requirements for some single video events will reach roughly 50 to 100 Tbps (the equivalent of distributing a TV quality stream to a large prime time audience). This is an order of magnitude larger than the biggest online events today. The functionality of video events has also been increasing to include such features as DVR-like-functionality (where some clients may pause or rewind), interactivity, advertisement insertion, and mobile device support.

At this scale, it is no longer sufficient to simply have enough server and egress bandwidth resources. One must consider the throughput of the entire path from encoders to servers to end users. The bottleneck is no longer likely to be at just the origin data center. It could be at a peering point, or a network’s backhaul capacity, or an ISP’s upstream connectivity, or it could be due to the network latency between server and end user, as discussed earlier in Section 3. At video scale, a data center’s nominal egress capacity has very little to do with its real throughput to end users.

The paper goes on to say that because even an awesome data center can only provide a few hundred Gbps of real throughput to end users, it’s almost impossible for a service built on a handful of big data centers to reach the 50 to 100 Tbps needed to support video at that scale. And while the paper reads like a highly technical advertisement for Akamai (and for why the way it has built out its CDN is superior to other CDNs), it’s also a pretty detailed look into the complexity of the web.
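For a rough sense of the scale involved, here is a back-of-the-envelope sketch in Python using only the figures quoted above. The function names are made up for illustration, "nearly 2 Tbps" is rounded to 2,000 Gbps, and the 300 Gbps per data center is an assumed stand-in for "a few hundred," not a number from the paper.

```python
import math

def per_stream_kbps(total_gbps, viewers):
    """Average bitrate per viewer, in kbps, for an event's total throughput."""
    return total_gbps * 1_000_000 / viewers

def data_centers_needed(target_tbps, per_dc_gbps):
    """Smallest whole number of data centers whose combined real
    throughput reaches the target."""
    return math.ceil(target_tbps * 1_000 / per_dc_gbps)

# Figures quoted in the article (2 Tbps is a rounding of "nearly 2 Tbps").
jobs_2001 = per_stream_kbps(5.3, 35_500)
obama_2009 = per_stream_kbps(2_000, 7_000_000)

# ASSUMPTION: 300 Gbps stands in for "a few hundred" per data center.
ASSUMED_PER_DC_GBPS = 300
low = data_centers_needed(50, ASSUMED_PER_DC_GBPS)
high = data_centers_needed(100, ASSUMED_PER_DC_GBPS)

print(f"2001 keynote: about {jobs_2001:.0f} kbps per viewer")
print(f"2009 inauguration: about {obama_2009:.0f} kbps per stream")
print(f"Data centers for 50 to 100 Tbps: {low} to {high}")
```

Under those assumptions, the two past events work out to roughly 149 kbps and 286 kbps per stream, and hitting 50 to 100 Tbps would take somewhere between 167 and 334 very well-connected data centers. That is only arithmetic on the quoted numbers, but it hints at why the paper argues for a highly distributed network.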

I know many of us take it for granted that the animated GIFs that once slowed down our GeoCities page load times are now so commonplace we drop them into comment threads, but that’s the beauty of driving ever-faster broadband speeds. Sometimes it’s nice to look behind the curtain and see how our infrastructure is keeping up with the increasingly complicated elements we’re throwing at it.

Hat tip to High Scalability, which featured the paper.