Facebook’s answer to serving 700TB of graph search data is lots of SSDs

Facebook’s Graph Search feature requires finding and serving the right data fast from a database that currently houses more than a trillion posts and 700 terabytes of data overall. In a Thursday morning blog post, engineer Ashoat Tevosyan dove into some of the challenges of building infrastructure that can handle these demands.

One decision that stood out to me was that Facebook opted for solid-state drives to store most of the data, reserving RAM for only the most frequently accessed data. Keeping indexes in RAM wasn’t a problem until Graph Search began including users’ posts, which drastically increased the size of the indexes it was dealing with. According to the post:

“[S]toring more than 700 terabytes in RAM imposes a large amount of overhead, as it involves maintaining an index that is spread across many racks of machines. The performance cost of having these machines coordinate with each other drove the Unicorn [search infrastructure] team to look into new solutions for serving the posts index.”
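To make the trade-off concrete, here is a minimal sketch of the general idea: a small, fast tier holding only the hottest records in front of a much larger, slower tier holding everything. This is not Facebook’s implementation. The `TieredStore` class, its least-recently-used eviction policy and the counters are all hypothetical, and both tiers are plain Python dictionaries standing in for RAM and flash.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small LRU "RAM" tier in front of a large "SSD" tier.

    Hypothetical illustration only; both tiers are in-memory dicts here.
    """

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot tier, ordered from least to most recently used
        self.ssd = {}             # stand-in for the flash-backed tier
        self.ram_hits = 0
        self.ssd_reads = 0

    def put(self, key, value):
        # Every record lives on the large tier; RAM only caches hot ones.
        self.ssd[key] = value
        if key in self.ram:
            self.ram[key] = value  # keep any cached copy in sync

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as most recently used
            self.ram_hits += 1
            return self.ram[key]
        value = self.ssd[key]          # slower path; raises KeyError if absent
        self.ssd_reads += 1
        self.ram[key] = value          # promote the record to the hot tier
        if len(self.ram) > self.ram_capacity:
            self.ram.popitem(last=False)  # evict the least recently used entry
        return value


store = TieredStore(ram_capacity=2)
for post_id in ("a", "b", "c"):
    store.put(post_id, f"post {post_id}")

for post_id in ("a", "a", "b", "c", "a"):
    store.get(post_id)

print(store.ram_hits, store.ssd_reads)
```

The point of the sketch is that the hot tier can stay small as long as reads to the large tier are fast enough, which is roughly the bet that flash makes possible.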

Facebook's new Dragonstone server.


SSDs have been a key part of Facebook’s growth strategy for a while, as a way to preserve the performance users require while avoiding the high cost of storing data in RAM. In January, it unveiled a new all-flash server called Dragonstone for just such a purpose. In March, the company detailed a system it had built called McDipper, an SSD-based implementation of the popular memcached caching layer, which normally keeps its data in RAM.

However, just because Facebook is using flash and SSDs a lot more often, that doesn’t mean the company is always happy about it. As VP of Engineering Jay Parikh told me during the company’s Open Compute Summit earlier this year, if hard disks are like minivans and current flash drives are like Ferraris, Facebook is looking for the Toyota Prius of storage that delivers the right balance of speed, efficiency and cost.

Check out the rest of Tevosyan’s post for more details on building the Graph Search indexes in HBase, harvesting user data from its MySQL cluster without throttling its performance and using Facebook’s new Wormhole technology to update the Graph Search index as changes happen to the MySQL data.