Hadoop security gets better, and not a moment too soon

As covered yesterday by Gigaom’s Derrick Harris, major Hadoop distribution provider Hortonworks announced this morning its acquisition of XA Secure, a provider of fine-grained security and policy management for Hadoop. This is an important event in the Hadoop world, maybe even a watershed moment. To understand this acquisition, we need to consider Hadoop’s enterprise ambitions and the deficits that have kept them from being realized.

Bound for the enterprise
Hadoop, and the companies behind it, are working hard to get into the enterprise market. The challenge is that Hadoop was essentially conceived as a laboratory product rather than a corporate one. Various features that enterprise software buyers take so much for granted that they don’t even think to ask about them (including management interfaces and integration with other products in the data center) weren’t part of Hadoop’s design. In other words, Hadoop has some big gaps to cover just to qualify as an enterprise data-management product.

Another huge enterprise credibility deficit for Hadoop has been security, including role-based access control (RBAC). Hadoop was designed for a specific use case: a small group of data specialists working with it and fully controlling its configuration. Under that use case, fine-grained security was less than critical, and so it simply wasn’t part of Hadoop.

The story so far
As Hadoop companies work overtime to get the open source big data framework into enterprise shape, a few security solutions have surfaced:

  • Zettaset, a company dedicated to fortifying Hadoop for enterprise workloads and deployment, has a product, called Orchestrator, which implements cluster-wide security, including RBAC, for various distributions of Hadoop.
  • Cloudera, meanwhile, has been working on an open-source project called Sentry (now an Apache Software Foundation, or ASF, incubator project), which adds RBAC security to Hadoop, though for now only to Hive and to Cloudera’s own SQL-on-Hadoop solution, Impala.
  • Another ASF incubator project, called Knox Gateway, provides “perimeter” security for Hadoop, which is to say it implements cluster-level authentication but does not provide RBAC security services. Knox is included with Hortonworks’ distribution of Hadoop.

The above set of solutions leaves the market with one commercial offering and two open-source solutions, one of which goes bottom-up, providing SQL database security for Hive and Impala, and the other of which goes top-down, providing rather coarse-grained security. Obviously more is needed, which leads us back to Hortonworks’ announced acquisition.

XA Secure and open-source Hadoop
XA Secure is a Fremont, Calif.-based company that provides a management interface and a framework for Hadoop clusters across various distro components and workloads. The goal is to federate and integrate current and future component-level RBAC schemes, allowing them to be managed and audited from one central interface. XA Secure’s solution also provides a policy framework designed to allow synchronizing the semantics of RBAC security for specific users in one Hadoop component to the equivalent permissions in another. In fact, the system is designed so that even non-Hadoop components (for example, data warehouse platforms) can be integrated into its framework, provided the necessary adapters are written by the external component vendors or developers in their communities.
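To make the architecture concrete, here is a minimal sketch of the general idea: one central policy grant, translated by per-component adapters into each component's own permission vocabulary, with every change recorded for auditing. This is not XA Secure's code or API. The class names, the `/data/` path layout, and the permission strings are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One central policy entry (hypothetical model, not XA Secure's)."""
    role: str
    resource: str  # abstract resource name, e.g. "sales"
    action: str    # abstract action: "read" or "write"


class HiveAdapter:
    """Hypothetical adapter: renders a grant as a SQL-style statement."""
    name = "hive"
    _actions = {"read": "SELECT", "write": "INSERT"}

    def translate(self, grant):
        verb = self._actions[grant.action]
        return f"GRANT {verb} ON TABLE {grant.resource} TO ROLE {grant.role}"


class HdfsAdapter:
    """Hypothetical adapter: renders a grant as a file-permission entry."""
    name = "hdfs"
    _actions = {"read": "r-x", "write": "rwx"}

    def translate(self, grant):
        bits = self._actions[grant.action]
        # The /data/ prefix is an assumed layout, for illustration only.
        return f"/data/{grant.resource} group:{grant.role}:{bits}"


class PolicyManager:
    """Central point where grants are defined, translated, and audited."""

    def __init__(self, adapters):
        self.adapters = list(adapters)
        self.audit_log = []

    def grant(self, role, resource, action):
        entry = Grant(role, resource, action)
        # Ask every registered adapter for its component-level equivalent.
        rendered = {a.name: a.translate(entry) for a in self.adapters}
        self.audit_log.append((entry, rendered))
        return rendered


manager = PolicyManager([HiveAdapter(), HdfsAdapter()])
for component, rule in manager.grant("analyst", "sales", "read").items():
    print(component, "->", rule)
```

In a design like this, supporting a non-Hadoop system such as a data warehouse would mean writing one more adapter class, which matches the article's point that outside vendors or their communities would have to supply the adapters.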

With Hortonworks’ announcement that it is acquiring XA Secure comes the revelation that the company will open source the XA Secure Hadoop access management product, presumably through an ASF project. This will take the open source Hadoop community to a new level of security capabilities, well beyond what Knox and Sentry can provide, separately or together.

Not there yet
The Hortonworks-XA Secure announcement is both important and substantive. While we have yet to see if Hortonworks’ competitors will embrace XA Secure’s technology, Hortonworks is, essentially, adding it to the open-source Hadoop platform, at the very least as an optional component. That’s good for the community, as well as for enterprise customers. However, RBAC security mechanisms don’t yet exist for every Hadoop component. These must be added, and XA Secure’s technology must be able to integrate with them, for the full vision of the product to be realized.

The Hadoop community, as well as commercial data warehouse vendors, must also join the ecosystem if the kind of cross-platform integration that XA Secure’s technology was designed for is to come to fruition. There was a time, not long ago, when new components added to the Hadoop stack were almost certain to gain adoption, as long as they added significant value and were of high quality. That kind of adoption is less assured now.

Standard or optional?
Like it or not, the Hadoop world has fragmented, with the open-source approach leading to more of an “open-core” policy. Critical components like Hadoop itself, as well as YARN, Hive, Pig, HBase, and others are all there, but each vendor is also including a selection of extended components, and these differ from distro to distro. Some vendors, like Pivotal, even include proprietary components.

Will the XA Secure technology end up being a component unique to the Hortonworks Data Platform distribution of Hadoop or will it be universally adopted by all Hadoop vendors? Frankly, it’s hard to tell. But something as essential as RBAC security is a core need for Hadoop. A standard implementation of it would benefit the Hadoop world and, for that matter, the tech industry overall.