Ex-Yahoo CTO on the future of Hadoop and the evolution of search

Raymie Stata has been a champion of Hadoop since its early days, helping to bring the technology into Yahoo and then drive its usage across the company. Now, Stata is co-founder and CEO of Altiscale, a startup offering Hadoop as a service to companies fed up with managing their own big data infrastructure. He came on the Structure Show podcast this week to discuss why Hadoop matters, where it’s headed, and whether it really matters where a company’s engineers cut their teeth.

Here are some highlights from the interview, but it’s worth listening to in its entirety for anyone interested in the details of how Altiscale approaches Hadoop and what it was like growing up as the son of semiconductor legend (and now venture capitalist) Ray Stata, founder of Analog Devices. Anyone interested in where any of these areas are headed should also attend our Structure conference next month in San Francisco, where we’ll have engineering gurus from Google, Facebook, Amazon, Twitter and more talking about the new computing architectures underpinning their most-pressing workloads.

Why Hadoop is best consumed as a service

“In our sales process, we have what we call the teaching moment,” Stata explained. “At some point, somebody will say, ‘Well, how many nodes do I get?’ And that’s when we say, ‘OK, great. Think about it. Why do you care?’ You shouldn’t be thinking about the number of nodes. That’s what it means to have Hadoop as a service.”

Instead, Altiscale bills customers monthly based on the computing time and storage they typically consume, much like a cellular plan with rollover minutes.

Hortonworks and Cloudera aren’t arguing over nothing

“When you think about a market [as big as Hadoop will be], first of all, there won’t be one vendor, there will be multiple ones,” Stata said. “And, second of all, they’re going to be fiercely competitive.”

Still, he noted, the back and forth between companies such as Cloudera, Hortonworks, MapR and Pivotal might actually be muted compared with the state of affairs during the advent of relational databases. But there’s one big difference: “I think ages ago, in the SQL wars, it was primarily kind of the field — sales and marketing — that would get in these kind of bloody battles,” Stata theorized. “With the open source element of Hadoop, I think that has brought the competition to the engineering level.”

And, he continued, real-world, at-scale engineering experience does matter:

“I think the trial by fire that you get in those environments where Hadoop is operating at scale is very valuable for people contributing to the Hadoop code base. … There’s a realization that theory and practice often diverge, and as systems scale up and become more complicated, that happens more and more. And so taking a bit more a data-driven approach to making improvements — versus just ‘Hey, I’ve got an idea’ and hacking away for days at a time and then contributing it and saying, ‘Hey, isn’t this better?’ — there’s just a certain respect, if you will, for the complexity of the system.”

Raymie Stata (second from right) relaxing at a Yahoo hackathon. Source: Yahoo / Yodel Anecdotal

Why everyone loves Spark

If adoption by more and more users and software vendors — including those selling Hadoop — is any indication, Spark might someday replace MapReduce as the de facto processing framework on Hadoop clusters. Stata explained why:

“From a performance perspective, because it’s an in-memory solution, obviously, it’s a lot faster than old-style MapReduce, where after every iteration of your algorithm you have to write out to HDFS. With Spark, you’re just kind of updating in place, in memory, so you can do many, many iterations of your iterative algorithms very, very quickly. Those kind of iterative algorithms are very common in machine learning.”
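The difference Stata describes can be sketched in a few lines of plain Python. This is only a toy illustration, not Spark or Hadoop code: a local JSON file stands in for HDFS, and the function names (`step`, `iterate_with_disk`, `iterate_in_memory`, `run_demo`) and the sample data are hypothetical, chosen for the example. Both loops run the same iterative algorithm (gradient descent fitting a single weight), but one writes its state to storage after every iteration while the other keeps it in memory.

```python
import json
import os
import tempfile

# Hypothetical sample data: points on the line y = 2x.
POINTS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def step(weight, points, lr=0.1):
    """One gradient-descent step fitting y = weight * x by least squares."""
    grad = sum(2 * x * (weight * x - y) for x, y in points) / len(points)
    return weight - lr * grad


def iterate_with_disk(points, iterations, workdir):
    """Old-style pattern: persist the state after every iteration and
    reload it before the next one (a local file stands in for HDFS)."""
    path = os.path.join(workdir, "state.json")
    with open(path, "w") as f:
        json.dump({"weight": 0.0}, f)
    writes = 1  # the initial state counts as one write
    for _ in range(iterations):
        with open(path) as f:
            weight = json.load(f)["weight"]
        weight = step(weight, points)
        with open(path, "w") as f:
            json.dump({"weight": weight}, f)
        writes += 1
    with open(path) as f:
        return json.load(f)["weight"], writes


def iterate_in_memory(points, iterations):
    """In-memory pattern: the state never leaves RAM between iterations."""
    weight = 0.0
    for _ in range(iterations):
        weight = step(weight, points)
    return weight


def run_demo(iterations=50):
    """Run both variants and return (disk result, memory result, writes)."""
    with tempfile.TemporaryDirectory() as workdir:
        disk_weight, writes = iterate_with_disk(points=POINTS,
                                                iterations=iterations,
                                                workdir=workdir)
    mem_weight = iterate_in_memory(POINTS, iterations)
    return disk_weight, mem_weight, writes
```

Both variants arrive at the same weight; the disk-backed loop simply pays for one storage round trip per iteration, which is the overhead that grows painful when an algorithm needs many passes over cluster-sized data.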

Applications are still the linchpin for big data adoption

Stata did acknowledge that there’s still a “solutions gap” when it comes to Hadoop, citing a dearth of industry- or task-specific products based on the platform:

“We distinguish what we call applications from tools. Tools are still horizontal. A Platfora, a Tableau — those are still horizontal tools. They certainly raise the level of abstraction versus Java, but they don’t have any what we call domain specificity. So an application to us is something that actually solves a domain-specific problem. … When you say ‘attribution analysis,’ that’s where that domain-specificity comes in that that’s to me what qualifies as a real solution.”

Stata (far left) at Structure Europe 2013.

Innovating search in the mold of Moore’s law

Stata, who once served as Yahoo’s chief architect for search and advertising, has been involved with the web search business for a long time. And, he said, it’s evolving at a pace that’s probably similar to the advances in microprocessors under Moore’s law:

“There’s continuous innovation and more-disruptive innovation, and I think that continuous innovation is real innovation and often is the most scientific. If you think about Moore’s law and what it took to kind of maintain Moore’s law, that’s continuous innovation, but it’s deep, deep work and has been enormously important. So, when I look at classic [algorithmic] search, I would put it in that category of continuous innovation. I think there’s probably some metric where they’re doubling it every 18 to 24 months — some metric of relevance — and maintaining that level of improvement is important because there’s more and more noise out there, so you have to crank up capabilities. But at the same time, it just kind of fades into the background because it just kind of changes.”

However, he went on to explain in some detail, “I do think that if you back up, if you look at the bigger picture of people wanting to inform themselves for various purposes, it does seem like there are more disruptive innovations waiting to happen.”

It’s hard to make computer science a family business

Stata does credit his father for inspiring an entrepreneurial spirit, but doesn’t necessarily think he was destined — or trained — to be a technologist himself. He did eventually work (well, intern) with his dad at Analog Devices, but that was after he was old enough to have caught the computer science bug on his own.

“It’s not like a restaurant, where from 4 years old on you’re inside doing that,” Stata said about technology. “The thing about high-tech is that you can’t really participate until you’re fairly old. And I’m seeing that with my kids, by the way. I’ve got kids, and at 4 years old they didn’t want to do software testing. Imagine that!”