Maybe this is just news to me, but IBM has a SQL-on-Hadoop product in the works called Big SQL. The company announced the technology preview version in March (well under my radar and, from what I’ve seen, nearly everyone else’s radar), and is offering up a cloud-based demo environment for a select group of early users.
As a refresher, the big difference between SQL-on-Hadoop products and the Hadoop connectors that were popular a couple of years ago is that SQL-on-Hadoop products query the data where it resides, in HDFS or HBase, rather than pulling it into a relational database environment for analysis. We have been talking for months about the emergence of a large SQL-on-Hadoop market, but IBM’s name was conspicuously absent from that discussion. The company has Hadoop software called BigInsights and lots of SQL expertise, so it only made sense that IBM would get into the game at some point.
Details on Big SQL are still pretty sparse save for a few high-level blog posts and an instructional video, but it appears to take the standard approach, as Cloudera does with Impala, of letting traditional tools connect through JDBC and ODBC drivers.
Ultimately, I think the advent of big data will enable some new querying techniques that are quite a bit different from the SQL queries we’ve come to know and love over the past couple of decades. But SQL is still the language du jour and might never go away, so there’s a lot of value to be had if people can put their SQL skills to work on data stored inside Hadoop or other environments, and if companies can work toward a nirvana where all the data is stored in a single place rather than spread across database environments.
That IBM got this message and got into the game isn’t surprising at all, but it is important. Lots of large companies buy IBM’s software. If IBM wants those customers to follow it into the world of big data and Hadoop, it has to give them the tools they need to work there.