Facebook detailed on Friday a new networking technology dubbed Data Center Fabric that coordinates all of the information flowing throughout its new data center in Altoona, Iowa. The Altoona facility is the first Facebook data center to showcase the new technology and, going forward, all new data centers will have it installed, said Alexey Andreyev, a Facebook network engineer.
While a single query to Facebook might not seem like a data-intensive task, once that query hits a data center it multiplies as it ricochets to and from various Facebook machines. Because of this multiplier effect, Facebook has to account for all that machine-to-machine traffic inside the data center and ensure its system doesn't slow down as it handles billions of user requests.
Data Center Fabric was created to deal with this dilemma. It's essentially the software brains that intelligently and automatically distributes traffic in the new Facebook data center, and makes repairs when something goes kaput.
Facebook used to construct its data center networks from groupings of many machines, typically sized for large amounts of compute, that consisted of "hundreds of server cabinets with top of rack (TOR) switches aggregated on a set of large, high-radix cluster switches," according to the Facebook blog post. These groupings, known as clusters, were the norm for how organizations built networking systems, but Facebook said the method has several limitations, starting with the devices Facebook had to buy to support the clusters.
Only a few vendors actually sell networking gear that meets Facebook's requirements, and that proprietary gear requires Facebook engineers to learn how to operate each box, which differs from vendor to vendor. This vendor-specific approach doesn't let Facebook scale the way it wants.
“My belief is that clusters are in existence because of network limitations,” said Najam Ahmad, Facebook’s director of network engineering. “The equipment dictated that we had to build clusters.”
To help deal with this problem, Facebook adopted a core-and-pod approach in which the company cut each cluster down to roughly 48 server racks. What used to be big clusters of machines are now referred to as pods, which contain fewer devices and connect to the network core that the Data Center Fabric software powers. Data Center Fabric recognizes all of the networking hardware as virtual clusters, so the number of physical machines that used to comprise those clusters can be cut down to smaller physical pods, explained Ahmad.
From the Facebook blog:
[blockquote person=”Facebook” attribution=”Facebook”]There is nothing particularly special about a pod – it’s just like a layer3 micro-cluster. The pod is not defined by any hard physical properties; it is simply a standard “unit of network” on our new fabric. Each pod is served by a set of four devices that we call fabric switches, maintaining the advantages of our current 3+1 four-post architecture for server rack TOR uplinks, and scalable beyond that if needed. Each TOR currently has 4 x 40G uplinks, providing 160G total bandwidth capacity for a rack of 10G-connected servers.[/blockquote]
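To make the arithmetic in that quote concrete: four 40G uplinks per TOR yield the 160G figure, and the oversubscription ratio then depends on how many 10G-connected servers share those uplinks. A quick sketch (the 32-server rack below is an assumed example, not a figure from the post):

```python
# Per-rack uplink capacity from the quoted figures: 4 x 40G uplinks per TOR.
UPLINKS_PER_TOR = 4
UPLINK_SPEED_GBPS = 40

total_uplink_gbps = UPLINKS_PER_TOR * UPLINK_SPEED_GBPS  # 160 Gb/s, as stated

def oversubscription(servers_per_rack, nic_speed_gbps=10):
    """Ratio of server-facing bandwidth to TOR uplink bandwidth."""
    return (servers_per_rack * nic_speed_gbps) / total_uplink_gbps

# For a hypothetical rack of 32 servers with 10G NICs, that is
# 320G of server bandwidth over 160G of uplink, a 2:1 ratio.
```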
Each pod is connected to Data Center Fabric, which is essentially custom networking software that recognizes all the hardware in a virtual plane. Because Facebook buys non-proprietary networking gear, it can hook its software up to the devices and let the software operate the machines, a clear example of software-defined networking.
From the Facebook blog:
[blockquote person=”Facebook” attribution=”Facebook”]We were able to build our fabric using standard BGP4 as the only routing protocol. To keep things simple, we used only the minimum necessary protocol features. This enabled us to leverage the performance and scalability of a distributed control plane for convergence, while offering tight and granular routing propagation management and ensuring compatibility with a broad range of existing systems and software. At the same time, we developed a centralized BGP controller that is able to override any routing paths on the fabric by pure software decisions. We call this flexible hybrid approach “distributed control, centralized override,” or “DCCO.”[/blockquote]
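Facebook's controller internals aren't public, but the "distributed control, centralized override" idea can be sketched in a few lines: routes learned by the distributed BGP control plane are used by default, and a centralized controller can replace any of them by pure software decision. All names here are hypothetical:

```python
# Minimal sketch of "distributed control, centralized override" (not
# Facebook's implementation): BGP-learned routes are the default, and a
# centralized controller's overrides take precedence when present.

class FabricRib:
    def __init__(self):
        self.bgp_routes = {}  # prefix -> next hop learned via BGP
        self.overrides = {}   # prefix -> next hop set by the controller

    def learn(self, prefix, next_hop):
        """Install a route from the distributed BGP control plane."""
        self.bgp_routes[prefix] = next_hop

    def override(self, prefix, next_hop):
        """Controller forces a different path for this prefix."""
        self.overrides[prefix] = next_hop

    def clear_override(self, prefix):
        """Return the prefix to whatever BGP has converged on."""
        self.overrides.pop(prefix, None)

    def next_hop(self, prefix):
        # Centralized override wins; otherwise fall back to BGP.
        return self.overrides.get(prefix, self.bgp_routes.get(prefix))
```

The appeal of the hybrid is visible even in this toy: the distributed protocol keeps converging on its own, while the controller can steer individual prefixes without taking over the whole routing job.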
Facebook also built homegrown configuration-management software, part of Data Center Fabric, that can automatically configure a white box per Facebook's specs without its engineers having to do any tweaking. If Facebook wants to scale out and adds a new device to its data center, the software recognizes the new machine and sets it up with the correct specs, explained Andreyev.
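The post doesn't describe that tooling in detail, but the core idea is that a device's entire config can be derived from its role and position in the fabric, so provisioning needs no hand-tweaking. A hypothetical sketch (the naming and ASN scheme below are assumptions, not Facebook's actual conventions):

```python
# Illustrative sketch (not Facebook's tooling): derive a switch config
# deterministically from its role and position, so a newly racked device
# can be set up with no manual steps. All naming schemes are assumed.

def generate_config(role, pod, index):
    """Build a deterministic config for a fabric device."""
    config = {
        "hostname": f"{role}-{pod:02d}-{index:02d}",
        "asn": 65000 + pod,  # assumed per-pod private ASN scheme
        "routing": "bgp4",   # the fabric's only routing protocol
    }
    if role == "tor":
        # Each TOR uplinks to the pod's four fabric switches.
        config["uplinks"] = [f"fsw-{pod:02d}-{i:02d}" for i in range(1, 5)]
    return config
```

Because the output is a pure function of role and position, rerunning it for the same device always yields the same config, which is what makes hands-off provisioning (and re-provisioning) safe.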
And if something goes wrong within a particular box, Facebook engineers don’t have to manually troubleshoot it, said Ahmad. With the new software, they just wipe the box clean and start afresh, similar to the idea of decommissioning a virtual machine if something goes haywire and spinning up a new one.
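That remediation pattern can be sketched as follows; the Device class and its methods are illustrative stand-ins, not Facebook's actual software:

```python
# Hedged sketch of "wipe and reprovision" remediation: rather than
# debugging a misbehaving switch in place, reset it and reapply a
# known-good config, much like replacing a broken virtual machine.

class Device:
    def __init__(self, name):
        self.name = name
        self.config = None
        self.healthy = True

    def wipe(self):
        """Reset the box to a clean slate."""
        self.config = None
        self.healthy = True

    def apply(self, config):
        self.config = dict(config)

def remediate(device, known_good_config):
    """Skip manual troubleshooting: wipe the box and start afresh."""
    device.wipe()
    device.apply(known_good_config)
    return device
```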
Facebook said the new networking system has cut down on the complexity of the previous one. Engineers no longer have to worry as much about where physical machines are placed throughout the data center, because Data Center Fabric presents a virtual layout of all the machines and can group them together in that format.
From the Facebook blog:
[blockquote person=”Facebook” attribution=”Facebook”]The fabric introduced new flexibility to locate compute resources of “virtual clusters” in different areas of the data center – wherever free pods are available. There is no longer a need to keep clusters confined to specific contiguous physical spaces, and the new cluster sizes can be as large as the whole building or as small as a pod. But we didn’t force an immediate need for large operational changes – we just made it possible to take advantage of these new benefits as needed.[/blockquote]
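A minimal sketch of the placement flexibility described in that quote (names hypothetical): a "virtual cluster" is just any set of free pods, with no requirement that they be physically adjacent.

```python
# Illustrative sketch: virtual clusters are assembled from whichever
# pods are free, anywhere in the building; physical location no longer
# constrains cluster membership.

def allocate_virtual_cluster(free_pods, pods_needed):
    """Assemble a virtual cluster from any available pods."""
    if pods_needed > len(free_pods):
        return None  # not enough capacity anywhere in the building
    # Location doesn't matter, so take whichever pods are free first.
    chosen = free_pods[:pods_needed]
    remaining = free_pods[pods_needed:]
    return chosen, remaining
```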
Facebook doesn’t have plans to open source the new software any time soon, though “as time goes on we will look at [open sourcing] individual pieces,” said Ahmad. Facebook will also update its current data centers with the new system, but that process will take some time, he said.
Images courtesy of Facebook