Hadoop vendor Hortonworks, along with customers Target, Merck and Aetna, and software vendor SAS, has started a new group designed to ensure that data stored inside Hadoop systems is used only as intended and seen only by those authorized to see it. The effort, called the Data Governance Initiative, will function as an open source project and will address the concerns of enterprises that want to store more data in Hadoop but fear the system won’t comply with industry regulations or stand up to audits.
The group is similar in spirit to the Open Compute Project, which launched in 2011. Facebook spearheaded that effort and drove a lot of early innovation, but the project has since drawn contributions and involvement from technology companies such as Microsoft and end-user companies such as Goldman Sachs. Tim Hall, Hortonworks’ vice president of project management, said Target, Merck and Aetna will be active contributors to the new Hadoop organization, sharing their business and technical expertise in the markets in which they operate, as well as developing and deploying code.
Among the reasons for creating the Data Governance Initiative were questions about the sustainability of the Hortonworks open source business model, some of which were brought to light by the revenue numbers it published as part of its initial public offering process, Hall acknowledged. The idea is that this group will demonstrate Hortonworks’ commitment to enterprise concerns and work with large companies to solve them. It will also show how Hortonworks can drive Hadoop innovation without abandoning its open source model.
“We want to make sure folks understand it’s not just these software companies we can work with,” Hall said, referencing the initial phases of Hadoop development led by companies such as Yahoo and Facebook.
Hortonworks plans to publish more information about the Data Governance Initiative’s technical roadmap and early work in February. The Apache Falcon and Apache Ranger projects that Hortonworks backs will be key components, and there will be an emphasis on maintaining policies as data moves between Hadoop and other data systems. Code will be contributed back to the Apache Software Foundation.
Hall said any company is welcome to join, including Hadoop rivals such as MapR and Cloudera, the latter of which has its own pet projects around Hadoop security. But, he noted, “It’s up to the other vendors to recognize the value that’s being created here.”
“There’s no reason why Cloudera couldn’t wire up their [Apache] Sentry project to this,” Hall added. “. . . We’d be happy to have them participate in this once it goes into incubator status.”
Of course, Hadoop competition being what it is, he might well suspect that won’t happen anytime soon. Cloudera actually published a well-timed blog post on Wednesday morning touting the security features of its Hadoop distribution.
You can hear all about the Hadoop space at our Structure Data conference in March, where Hortonworks CEO Rob Bearden, Cloudera CEO Tom Reilly and MapR CEO John Schroeder will each share their visions of where the technology is headed.
Update: This post was updated at 12:20 to correct the name of the organization. It is the Data Governance Initiative, not the Data Governance Institute.