How Google makes the most of flash storage in its data centers

While flash can offer seriously fast storage performance compared with hard disks, it’s still more expensive per byte than the tried-and-true disk, which is why it’s not yet feasible to store everything on flash. Even a company as wealthy as Google knows this, and perhaps that’s why its engineers have come up with a system for intelligently determining when and how to use flash in order to get the most out of it.
After implementing the system at Google scale for distributed cloud-storage purposes such as Gmail storage, MapReduce jobs and video processing, the engineers reached the conclusion that one can reasonably think of flash as “a cost-effective complement to disks in data centers,” according to a paper presented at the USENIX conference in San Jose earlier this week.
The flash allocation recommendation system, which goes by the name Janus, operates by enforcing policies on when data needs to get kicked out of flash and shunted to disk. In determining whether a workload is worthy of getting written to flash, the system considers how old the data is. That’s because most I/O activity is done on newly created files. Generally speaking, new workloads get special treatment on flash before getting ushered to disk.
In deciding how long data can stay in flash before getting bumped to disk, the system also thinks about how long it’s been between read requests. The data that gets asked for least often probably should be sent on its way down to longer-term but distant and slower disk.
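The two signals described above — how recently a file was created, and how long it has gone without being read — can be combined into a simple placement policy. The sketch below is purely illustrative: the class, field names and thresholds are hypothetical, not taken from the Janus paper, but they capture the idea of admitting new files to flash and demoting files that age out or go cold.

```python
from dataclasses import dataclass

# Illustrative sketch of an age-and-recency flash placement policy in the
# spirit of Janus. All names and thresholds here are assumptions for
# demonstration, not Google's actual implementation.

@dataclass
class FileStats:
    created_at: float   # when the file was written
    last_read_at: float # when the file was last read

class FlashTier:
    def __init__(self, max_age_s: float, idle_evict_s: float):
        self.max_age_s = max_age_s        # flash residency budget for new files
        self.idle_evict_s = idle_evict_s  # demote if no reads within this window
        self.resident: dict[str, FileStats] = {}

    def admit(self, name: str, now: float) -> None:
        # New files go to flash first: most I/O activity hits young data.
        self.resident[name] = FileStats(created_at=now, last_read_at=now)

    def read(self, name: str, now: float) -> bool:
        # Returns True if the read was served from flash, False if from disk.
        stats = self.resident.get(name)
        if stats is None:
            return False
        stats.last_read_at = now
        return True

    def evict_cold(self, now: float) -> list[str]:
        # Demote files that are too old or haven't been read recently;
        # the least-requested data gets sent down to slower disk.
        cold = [n for n, s in self.resident.items()
                if now - s.created_at > self.max_age_s
                or now - s.last_read_at > self.idle_evict_s]
        for n in cold:
            del self.resident[n]
        return cold
```

For example, with a 600-second idle window, a file read 200 seconds ago stays on flash while a file untouched for 700 seconds is demoted to disk. The real system tunes such thresholds per workload rather than using fixed constants.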
Altogether, the system enables seriously efficient usage of flash resources:

Our results show that the recommendations allow 28% of read operations to be served from flash by placing 1% of our data on flash.

Google isn’t the only company thinking about how to make the most of storage resources. Facebook has been spending lots of time thinking up both hardware and software for storing away data that doesn’t need to be accessed very quickly — think of those old photos sitting at the very back of your Facebook photo album, or perhaps analytics data the company needs to keep around to be compliant with regulatory requirements.
Flash could be a key part of the playbook for Facebook — which could help get the old picture out to your screen faster — but even Blu-ray might be on the table to store certain kinds of data as well. Facebook has also developed the McDipper key-value cache server to run in flash storage, so data gets served at high speeds but doesn’t rely on more expensive DRAM.
The common point here is that inside the biggest of data centers — and conceivably smaller ones too — not all data can be treated equally. Treating it all the same would mean mismanaging valuable hardware assets and creating performance bottlenecks for applications that need their data delivered as soon as possible.