Hadoop security wars

You may recall that Cloudera and Hortonworks each acquired Hadoop security companies last year. Hortonworks acquired XA Secure and Cloudera acquired Gazzang. I covered the XA Secure acquisition in a post on this blog and Gigaom Research analyst George Anadiotis covered each company’s technology in detail in a Gigaom Research Note. There was quite a lot to this story, and last week the plot thickened.

Previously on security wars…
If you haven’t been following the story, I should point out that Gazzang’s big selling point (literally) was its Hadoop encryption technology. Through the acquisition, Cloudera gained access to that technology, as well as the talent behind its development. That much was clear; less so was exactly what Cloudera would do with it.

Initially, Cloudera integrated the Gazzang technology into its Navigator data governance package, as Navigator Encrypt (for encryption of data being sent across the wire as well as data stored in the file system) and Navigator Key Trustee (for encryption key management). Since Gazzang was a commercial product, it made sense that Cloudera was keeping that company’s technology inside Cloudera Navigator, an “advanced component” of Cloudera Enterprise.

You get encryption at rest, and you get encryption at rest…
But last week, I spotted a Cloudera blog post explaining that Cloudera had just checked in new transparent encryption technology to the main code trunk for the Hadoop Distributed File System (HDFS). In other words, Cloudera has made encryption technology — apparently from some combination of Gazzang’s tech and that of Intel’s open source Project Rhino — open source, and has done so as part of the Hadoop project, suggesting that the technology will become part of every major Hadoop distribution. For more details, see Anadiotis’ own blog post on this matter.

Meanwhile, Hortonworks’ acquisition of XA Secure gave it access to role-based access control (RBAC) technology that works across the Hadoop stack. While XA Secure was also commercial technology, Hortonworks committed to making it open source. And, sure enough, Apache Ranger launched as an incubator project shortly thereafter, and features the XA Secure technology.

Security wars, and peace
Is there a conflict here? Well, yes and no. XA Secure’s technology and that of Gazzang and Project Rhino are actually complementary. So, in theory, the new HDFS encryption capabilities and those of Apache Ranger should coexist nicely. Meanwhile, though, there’s another Apache project (one that happens to be backed by Cloudera), called Sentry, that is also attempting to add RBAC capabilities to Hadoop.

Ranger works in Hadoop itself, Hive, HBase “and other Apache components.” Sentry currently provides RBAC functionality inside Hive and Impala. But it would seem Sentry’s goal is to extend that across all Hadoop components. Both Ranger and Sentry are Apache Incubator projects, so it’s not clear which one will “win,” but it is clear that there’s some risk in using either project’s technology until one is anointed the victor (on a de facto basis, if not de jure).

Apache math
Lest you find it surprising that the Apache Incubator would take on two projects with so much overlap, keep in mind that the Apache Software Foundation (ASF) has a lot going on. The organization just realized that if one counts all sub-projects, then there are 345 active ASF projects and initiatives right now. Prior to realizing this, ASF was saying that it had “200+” active projects and initiatives. The reason for the discrepancy is that the organization thought that there were only “a few handfuls” of active sub-projects, when in fact there are 110.

Sub-projects are important: bear in mind that Hive and HBase were once sub-projects of Hadoop, and YARN still is. So counting sub-projects in the aggregate active projects and initiatives metric is quite reasonable. What may be less so is having incubator projects that compete with each other, especially when their corporate backers are each extremely accomplished at getting their projects into the Apache Incubator and getting them to graduate to top-level project status.