Hortonworks is pitching an object store for Hadoop

Hadoop vendor and Yahoo spinoff Hortonworks published a blog post on Tuesday laying out the case for building a Hadoop object store called Ozone. The addition of such an option to the Hadoop ecosystem would make the platform more compelling as a webscale or even cloud data store.

This is especially important, post authors Jitendra Pandey and Sanjay Radia argue, because the YARN resource-management framework is opening up Hadoop to more consideration as a platform capable of hosting all sorts of big data applications beyond just MapReduce, Spark or other traditional data-processing jobs. As evidence that multi-tenant Hadoop environments are coming, they point to an ongoing effort by Hortonworks (and Altiscale, I’ll add) to integrate YARN with both Docker and Google’s Kubernetes tool for managing Docker environments.

A high-level Ozone architecture. Source: Hortonworks

The authors give the following list of goals for Ozone:

  • Scale to trillions of data objects.

  • Support wide range of object sizes, and optimize for a few kilobytes to tens of megabytes.

  • Guarantee consistency, reliability and availability similar to HDFS.

  • Build on HDFS’s block layer.

  • Provide a REST based API to access and manipulate the data.

  • Support cross data center replication for higher durability and availability.
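
To make the idea of an object store with a REST-style interface concrete, here is a minimal in-memory sketch in Python. It is not Ozone's API, which the post only describes at a high level; the class name, method names and the HTTP verbs noted in the comments are all hypothetical and chosen only to illustrate the bucket-and-key model common to object stores.

```python
class ToyObjectStore:
    """Hypothetical in-memory object store: buckets hold flat key -> bytes maps."""

    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        # Would correspond to something like: PUT /<bucket> (assumed mapping)
        self._buckets.setdefault(bucket, {})

    def put(self, bucket, key, data):
        # Would correspond to something like: PUT /<bucket>/<key> (assumed mapping)
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket, key):
        # Would correspond to something like: GET /<bucket>/<key> (assumed mapping)
        return self._buckets[bucket][key]

    def delete(self, bucket, key):
        # Would correspond to something like: DELETE /<bucket>/<key> (assumed mapping)
        del self._buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        # Keys are flat strings; "directories" are only a naming convention.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


if __name__ == "__main__":
    store = ToyObjectStore()
    store.create_bucket("logs")
    store.put("logs", "2015/01/a.txt", b"hello")
    print(store.get("logs", "2015/01/a.txt"))
    print(store.list_keys("logs", prefix="2015/"))
```

A real system would add the parts the goals above actually call for, such as replication, consistency guarantees and a block layer underneath; the sketch only shows the flat namespace that distinguishes an object store from a hierarchical file system.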

The post goes into more detail about how Ozone would fit into the Hadoop Distributed File System. For now, Ozone is only a proposal, not an official Apache Hadoop project. If built, it will join other alternative data stores such as HBase and Accumulo.

Depending on the environment in which it’s running, Hadoop is already compatible with multiple third-party object stores, including Amazon S3 in the cloud and OpenStack Swift in the data center. A purpose-built object store for Hadoop could be beneficial, though. If developers are going to use Hadoop as the storage layer for the next-generation applications they’re building, they’ll need something that delivers what the cloud object stores can but is built with Hadoop in mind.