MapR, the Hadoop vendor spearheading the Apache Drill project, has integrated an early version of the technology into its big data platform. The company is calling the current version of Drill, version 0.5, a “developer pre-release” that will nonetheless show off what the new type of SQL query engine can do.
Drill was first announced in August 2012, and the market for technologies that allow users to write SQL queries on data stored in Hadoop and, most importantly, receive fast results has evolved a lot since then. Cloudera has released multiple versions of its Impala technology, Hortonworks has led an effort to make the old-school Hive framework interactive, and numerous other startups and open-source projects have popped up, including from the fast-growing Spark community.
However, MapR Chief Marketing Officer Jack Norris explained, Drill was worth the wait because it offers “a superset” of the features found in other SQL-on-Hadoop engines. Drill’s primary feature is that it generates schema on the fly, so users can keep files in their original formats rather than converting them into tables or pre-specified formats before they’re loaded into the database system. Users who want to pre-process their data into certain formats can still do so, though.
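To make the schema-on-the-fly idea concrete, here is a minimal Python sketch of the general technique. It is not Drill's code or its API: the sample records and the helper names `infer_schema` and `select` are hypothetical, chosen only to show how field names and types can be discovered while raw files are being read rather than declared before loading.

```python
import json

# Hypothetical raw records, left in their original JSON-lines form.
RAW_LINES = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "raj", "clicks": 5, "country": "IN"}',
    '{"user": "li", "country": "CN"}',
]


def infer_schema(records):
    """Collect field names and observed value types while reading the data."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def select(records, fields, where=lambda r: True):
    """Project the requested fields; fields a record lacks come back as None."""
    return [{f: r.get(f) for f in fields} for r in records if where(r)]


# Parsing happens at query time; no table definition was written up front.
records = [json.loads(line) for line in RAW_LINES]
schema = infer_schema(records)
rows = select(
    records,
    ["user", "clicks"],
    where=lambda r: r.get("country") is not None,
)

print(schema)
print(rows)
```

The point of the sketch is that the third record can omit `clicks` and the first can omit `country` without any load step failing; the query simply sees missing values. A production engine would do this across distributed files and many formats, which this toy example does not attempt.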
“We view it as a lot more ambitious than what anybody else has done with SQL on Hadoop,” MapR product management leader Tomer Shiran said.
Curiously, though, MapR isn’t pushing Drill as the only option available in its platform or even necessarily the best one (although Drill’s pre-release status might have something to do with that). MapR’s Hadoop distribution also includes Hive and Impala, and even supports the HP Vertica analytic database via a tight integration.
It’s part of a broader strategy from MapR to reshape its reputation as the proprietary Hadoop vendor by supporting a lot of technologies and contributing a lot of code. “In the past, that wasn’t always the case,” Shiran acknowledged.
Right now, Norris said, everything that touches the application layer in MapR’s distribution is either open source or uses standard APIs, and the company’s plan is to open source as much as it can going forward. On Tuesday, for example, the company also announced a handful of new resource-management capabilities for its Hadoop platform and is submitting its approach to disk IO allocation and node-specific job scheduling back to Apache. Drill includes contributions from more than 40 other companies and institutions, including Cisco, LinkedIn and the University of Wisconsin.
The open-source model has proven itself to be a great way to improve products through crowdsourcing and attract engineers who adhere to the open-source ethos. And in the hyper-competitive Hadoop space, a strong open-source culture is also both a sword and a shield, good for attacking other companies on their openness and defending against those same attacks when they are aimed at you.