Raymie Stata knows a lot about Hadoop. It was Stata who helped bring Hadoop creator Doug Cutting to Yahoo in 2006, and during a seven-year stint as chief architect and then CTO at Yahoo, Stata was instrumental in positioning Hadoop as the technology famously “behind every click” at the web portal. Now, Stata is trying his hand at the Hadoop startup game with a new company called Altiscale, which recently closed a $12 million Series A round from Sequoia Capital and General Catalyst Partners, as well as Accel Partners, Jerry Yang’s AME Ventures and a few individual investors.
Altiscale is in some ways a manifestation of Stata’s seven years of experience helping turn Hadoop from a cute little project into a production system running across 42,000 nodes. It might not be pretty, but it gets the job done. And thanks to the handful of former senior Yahoo, Google and LinkedIn engineers who joined Stata (the company’s CEO) at Altiscale, the company knows Hadoop cold.
The deep knowledge of Hadoop shows itself in the product design and business model. The company is “all Hadoop, all the time,” he explained, and everything — including the hardware and the network — is optimized for particular aspects of Hadoop workloads and operations. Essentially, Stata told me, Altiscale wants to be companies’ Hadoop dial tone — when users need to run a job, the service should just be there ready to do it.
So, although Altiscale is a hosted service, it’s not exactly a cloud service as many people would define it. For example, rather than charging by the hour, Altiscale bills against a monthly baseline of usage, with some room built in for reasonable overages; Stata’s experience suggests that is the best way to charge for Hadoop services. Companies familiar with Hadoop usually understand their baseline requirements, give or take a handful of additional jobs, and would prefer to be able to budget for that amount each month.
He compares traditional hourly cloud billing to cell-phone billing in the 1990s: “At the end of the month,” he joked, “you were typically surprised on the wrong side.” Altiscale is more like a wireless plan with a maximum number of minutes per month and some rollover minutes included. In fact, Stata said, “We’re pretty forgiving in terms of the limits. … As long as you’re not abusive, you don’t get charged more for it.”
And unlike many other Hadoop services, Altiscale isn’t immediately going after developers who want to try their hand at big data or work with data through a whiz-bang interface. Rather, its initial audience is current Hadoop users (companies and data scientists) who know how the technology works but just want a better way to consume it. Right now, users access Altiscale by SSHing into a “desktop” environment (actually hosted on Amazon Web Services) that gives them access to their favorite Hadoop tools, such as MapReduce, Hive, Pig and Flume, as well as to data science tools such as R.
“We call that the scaling down problem,” Stata said.
What that means is that it takes a lot of effort to build a true self-service model that greenhorn Hadoop users can dive right into, and Altiscale would be irrelevant if it waited to launch until it had figured that out. Part of that is a design problem, and part of it is that Hadoop was designed to run better at scale. Plus, Stata added, the folks who got to first or second gear with Hadoop and then got stuck are way underserved right now.
However, although Altiscale might be about serving experienced Hadoop users with a more managed experience, it’s not about serving legacy workloads. A lot of companies are using Hadoop today to somehow perform traditional enterprise data warehouse tasks or tie tightly into existing IT environments, he explained, but “we go after what I call ‘new data problems.’” That means online advertising and any workloads (server log analysis, smart grid data, logistics, etc.) that rely heavily on lots of sensor- or machine-generated data that can stream right into Hadoop.
Stata acknowledges it won’t be easy to win customers away from established Hadoop vendors such as Cloudera, MapR and Hortonworks (which many of Stata’s former Yahoo comrades founded), but, as he told me a few months ago, he thinks it’s very doable. That’s because no matter how easy those vendors make it to manage Hadoop, there’s a class of customers that’s simply better served by a cloud service than by trying to scale their operations staff and energy bill along with their Hadoop cluster.
“Self-managed Hadoop, essentially, is [those vendors’] ultimate goal,” Stata said. “Our goal is to just take on the management responsibility, to take on all those management things the Yahoos and Googles do under the covers and just run Hadoop as a managed service. The winds of change are in our favor.”
If you want to hear more about where Hadoop is headed, stop by our Structure conference next week, where I’ll be discussing that topic with Google Fellow and MapReduce creator Jeff Dean. Other webscale speakers include Facebook VP of Engineering Jay Parikh, Box VP of Engineering Sam Schillace and Amazon CTO Werner Vogels.