Twitter details its Manhattan real-time database

Twitter’s service is nothing if not fast-moving, and on Tuesday night the company published a blog post detailing the database that helps it keep up. Called Manhattan, it’s a distributed, real-time database built to serve multiple teams and applications within the company.
It’s also something of an indictment of existing open source database technologies, at least when it comes to handling the scale and, probably more accurately, the speed of Twitter. The blog post was authored by Twitter software engineer Peter Schuller, who wrote:

“We were spending far too much time firefighting production systems to meet the performance expectations of our various products, and standing up new storage capacity for a use case involved too much manual work and process. Our experience developing and operating production storage at Twitter’s scale made it clear that the situation was simply not sustainable.”

Schuller goes into a fair amount of detail about how Twitter built Manhattan to be reliable, consistent and easy to use, and also describes some of the data formats it’s designed to handle. For now, users interact with Manhattan as a key-value store, but Twitter is looking to add other interfaces, including a graph-based capability. Manhattan consists of three storage engines, designed for read-only Hadoop data, write-heavy data and read-heavy data, respectively. It also has numerous services built in, including ones for importing Hadoop data, ensuring strong consistency and counting time-series data.
Perhaps most importantly for developers and engineers, Manhattan is a storage service meant to be consumed just like any other cloud storage service. “Engineers can provision what their application needs (storage size, queries per second, etc) and start using storage in seconds without having to wait for hardware to be installed or for schemas to be set up,” Schuller wrote. Twitter took great care to ensure that Manhattan’s multitenant design (i.e., it serves many teams and applications simultaneously) doesn’t result in subpar performance when one user hogs too many resources.
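To make the multitenancy point concrete, here is a minimal sketch of one common way a shared service can keep a single tenant from starving the others: a per-tenant token bucket sized from the queries per second each team provisions. This is an illustration only, not Twitter’s code, and nothing here says Manhattan works this way. The names TokenBucket, TenantLimiter, provision and allow are hypothetical and do not come from Schuller’s post.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, now):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so a new tenant can burst
        self.last = now

    def allow(self, now):
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """Hypothetical per-tenant limiter: each tenant gets its own bucket."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behavior can be tested
        self._buckets = {}

    def provision(self, tenant, qps, burst=None):
        # Assumption: burst size defaults to one second's worth of requests.
        self._buckets[tenant] = TokenBucket(qps, burst or qps, self._clock())

    def allow(self, tenant):
        bucket = self._buckets.get(tenant)
        if bucket is None:
            return False  # tenants that never provisioned get no capacity
        return bucket.allow(self._clock())
```

Because each tenant draws from its own bucket, a team that exhausts its provisioned rate is throttled without affecting anyone else’s requests. A production system would also have to account for storage size, fairness across machines and many other concerns that this sketch ignores.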
Twitter plans to release a technical paper at some point detailing even more about how Manhattan is built. Given the company’s penchant for open source, it wouldn’t be surprising if it open sourced Manhattan at some point as well. The company released its MySQL code in 2012, and recently contributed code to Facebook’s WebScaleSQL open source project.
The mere presence of Manhattan speaks to the incredible and often unique needs of large web companies, but it’s fair to wonder how long their present technologies will remain on the cutting edge. For a growing number of applications, companies like Twitter, Google, Facebook and LinkedIn seem to have moved on from the first batch of NoSQL technologies (which are now working their way into large enterprises) and are building new systems, just as they built Cassandra, Voldemort and BigTable in the past. Maybe Manhattan will be tomorrow’s Cassandra, and LinkedIn’s Espresso the new MongoDB, for the next wave of startup developers looking to do something new.