Some financial analysts have questioned Teradata’s continued growth in an IT market where customers have Hadoop on the mind, but if the decades-old data warehouse vendor is scared, it isn’t letting on.
“It’s more about mitigating the hype than mitigating the reality,” Teradata Labs President Scott Gnau said when I asked him recently about the effects of open source Hadoop on Teradata’s historically proprietary (and pricey) software and appliances. Because of issues around governance, security and performance, he said it’s “kind of silly” to suggest Hadoop is a valid replacement for a Teradata system.
If anything, Gnau thinks Hadoop and Teradata technologies are complementary. “Our joint components are worth more than the individual parts alone,” he said.
At least for now, it seems like a fair point. On Monday, Teradata announced a handful of new products and features, including a capability called QueryGrid that lets users write a SQL query in Teradata and have it analyzed automatically in a Hadoop or a Teradata Aster system. It also announced support for the semi-structured JSON file format, meaning Teradata users will be able to extend their deployments across data from sensors, GPS trackers, web logs and other popular sources of machine-generated data.
This means that data types that historically might have been reserved for storage and analysis in Hadoop (and, for JSON especially, possibly a NoSQL database such as MongoDB) now might only need to be stored there. The way Teradata sees it, the more data that companies store — even in Hadoop — the higher the likelihood some of it will become important. The more important it becomes, the more they’ll want to analyze it using a proven, mission-critical system such as Teradata.
“I can put a screw in the wall by using a hammer,” Gnau said, referencing attempts to build SQL query engines for Hadoop, “but it’s not the most elegant solution.”
Actually, most Hadoop vendors might agree with that sentiment right now. Hadoop startup Hortonworks, which has a tight partnership with Teradata, makes no bones about its strategy of acting as a data platform in which users can store and process their big data, but one that is designed to work very well with more-sophisticated analytic software such as Teradata.
“We don’t force the customer to make a hard decision of ‘I’m going to unplug one and move to another and hope it all works,’” Hortonworks CEO Rob Bearden told me recently.
Another Hadoop vendor, MapR, is working on its own open source SQL-on-Hadoop tool, but has also baked support for the HP Vertica analytic database into its software. Even Cloudera, the Hadoop vendor most vocal about positioning itself as a disruptive force against enterprise data warehouse vendors such as Teradata, still relies heavily on software partners, including those selling data warehouses, and doesn’t discount their value.
Existence doesn’t mean growth
However, none of this is to say that Teradata or other companies of its ilk (actually, most of its previously independent competitors have been acquired by companies such as IBM, HP and EMC) will remain unscathed from a revenue perspective, even if Teradata’s largest customers are still scaling up like crazy. The advent of Hadoop does mean that customers can house more data in an inexpensive locale rather than paying high per-terabyte rates. The move toward SQL on Hadoop does mean that customers can move some analytic workloads onto less-expensive systems.
Intel expects Hadoop will eventually be the most popular workload running atop its server processors. If that turns out to be true, it will represent a level of investment in Hadoop across companies of all types that might lead CIOs to demand even more from it and the companies selling it. Already, EMC-VMware spinoff Pivotal, in particular, is making a big bet that customers will want to buy their Hadoop, their databases and their data warehouses from a single source, and it’s tinkering with its pricing to make its vision a reality.
The uptick in interest in Spark, a next-generation in-memory processing framework that can run atop the Hadoop file system, only makes Hadoop look better. Spark has a subproject called Shark that’s designed for interactive, in-memory SQL queries. Even business intelligence startups such as ClearStory Data and Platfora, two companies that would certainly like to take some market share from Teradata, are either built on Spark or working on supporting it.
Away from Hadoop, there are database startups such as MemSQL and Citus Data that are building fast, distributed and relatively inexpensive analytic databases that speak SQL but can analyze data in a variety of formats and across different data stores.
Teradata might be the world’s biggest and best pure-play analytics vendor, but nothing lasts forever. Just look at IBM: It still sells and services mainframes, but it’s not betting the future on them. It is even getting out of the commodity x86 server business. As the open source community (and the startups built on its software) continues to encroach on Teradata’s territory, the company probably has to do something really big to reestablish its dominance, or get used to a future that looks a lot like IBM’s mainframe business.
Yes, it will remain indispensable to some very large companies and very important workloads, but everything else might find a different home.