Cloud computing is going to absorb your big data workloads, too

Go ahead and deploy your Hadoop cluster in the cloud. Really. It can handle it.

Cloud computing providers and big data vendors have been working toward this moment for years, and it looks like the moment has finally come. Of all the news coming out of the Strata and Hadoop World shows taking place this week, the most compelling stuff all goes to prove this point. Here’s a quick recap of what was announced:

How a hybrid Hortonworks architecture might look. Source: Microsoft

That’s just product news, about just Hadoop, from just this week. The broader goings-on around the data community paint an even clearer picture of where we’re headed. All around, big, well-funded companies are putting some serious effort into blurring the lines between big data platforms and cloud computing platforms:

  • [company][/company] announced with much fanfare a new analytics service this week that is, of course, delivered as a cloud service. Many next-generation analytics products are also cloud-based or at least offer a cloud option, including Tableau and [company]ClearStory Data[/company].
  • Companies like Google, Microsoft and IBM continue to release new machine learning features on top of their cloud platforms and inside their cloud applications, making them look like more appealing places to store data, as well.
  • [company]Teradata[/company] is running a Hadoop cloud offering as well as a cloud-based offering of its flagship data warehouse system.
  • [company]Oracle[/company] — Oracle! — announced a platform-as-a-service offering. It even hired Peter Magnusson, one of the key engineers behind Google App Engine, to help lead its development. Laugh if you will, but Oracle seriously suggesting its customers run their databases in the cloud (including “extreme performance” versions) says a lot about how times have changed.

And then there are the numerous startups doing some flavor of Hadoop in the cloud — [company]Mortar[/company], [company]Altiscale[/company] and [company]Qubole[/company], to name a few — and seemingly dozens of analytics startups, some of which are running some very impressive infrastructure under a sleek UI.

AWS and Google regularly release new big data services for their cloud platforms, and we’ll likely see at least one more come out of Amazon’s re:Invent show next month. By the way, every cloud provider now offers solid-state drives, and they’re the default local and persistent storage options on AWS. That’s part of the reason Spark was able to run so fast in that Databricks benchmark test.

Don’t get me wrong, we are by most accounts very far from the point where most (broadly defined) big data workloads, or probably even a significant fraction of them, are running in a cloud service. For every Netflix running large Hadoop and Cassandra clusters in the cloud, there are probably two large banks that are still experimenting with three different Hadoop software vendors. Startups such as Interana and Cask (née Continuuity) that want to target enterprise customers still face the harsh reality that their initial cloud delivery models will have to wait.

But with a few exceptions for particularly large or particularly regulated datasets, the tide is turning. If users weren’t asking for it, it’s hard to imagine all these companies would be trying so hard to make it happen.