Getting beyond the cult of big data

Session Name: Why You Should Never Ask, "How Is This Better Than Hadoop?"
Justin Sheehy
Hi. I work at Basho. We make Riak, a highly available, scalable database, and we make Riak CS, which is the cloud object-storage system that we open-sourced here yesterday. I get asked all the time, "How is Riak better than Hadoop?" I get asked by really great conferences like this one to give talks on "How is Riak better than Hadoop?" And it's not just me; we get asked this all the time. One of the other Basho folks was at another great conference, Strata, recently, and decided to count. Within the first hour of the cocktail reception, they were asked "How is Riak better than Hadoop?" at least 16 times, I think.
And this isn't about Riak. It doesn't matter what you make. If you make a highly available, scalable database, if you make a database with an entirely different set of priorities, if you make a coordination system that's nothing like a database, if you make a programming framework – it really doesn't matter. If you make software today that does anything with data, which is everything, a lot of people will start their conversation with you by saying: "How is this better than Hadoop?"
If you're the one asking that question, you might not realize that by starting the conversation with it, you told me a lot about yourself. You just gave me an anti-shibboleth. If you don't know what that is: a shibboleth is a word that, if you say it the right way, at the right time, in the right place, lets someone know that you're in their group. That you know things. That you're one of a certain set of people. Starting a conversation with a question like "How is that better than Hadoop?" is the opposite of a shibboleth. It tells me that you're definitely not in the crowd of people who have any idea what the answer would mean.
It tells me more than that. It tells me that you're probably looking to make your technology decisions in a very dangerous way: a cargo-culting way. When I say cargo-culting, I mean something very specific. In World War Two, American armed forces used South Pacific islands as way stations, and the people who lived there saw things they'd never seen before. For instance, they saw that when a man wearing a uniform walked into the middle of a field and waved his arms in the air with sticks in his hands, a plane would drop cargo: food and clothing and wonderful things. And to this day, even after "John from America" left, you can go to those islands and every day a man will walk out into the middle of a field and wave his arms in the air, praying for cargo. That approach of copying not the thinking and the strategy and the philosophy of success, but the trappings and the fashion and the details of success, is cargo-culting. And if you choose your technology that way, you're going backwards.
You might say, "We've already moved past this. We've realized it, and now we're talking about Big Data. We're not just cargo-culting this tool; we've moved on." I have two responses to that. The first is that it's not true: we haven't moved past this. To give just one of many pieces of evidence, later today at this conference there's a session called "Database Technologies That Are Not Hadoop." There's some good stuff in that session, but the fact that we had to name it that way tells us we haven't stopped moving in this particular cargo-cult.
But if we have moved on to Big Data, that's worse. Because at least Hadoop is something. It's a pretty good something, actually; it's a pretty fantastic set of tools. But I was reading an article recently, an interview with Jim Whitehurst, the CEO of Red Hat and a fantastically smart guy, and he said something that the reporter summarized as: "The real challenge now is to use Big Data for everything." That tells me that there's nothing there.
Vendors have started to pick up on this cargo-cult approach, and this is what I'm trying to help you with. Vendors can be very smart. If you're talking to one and you haven't got a Big Data strategy yet, they can make you feel guilty about it, as if you're not a member of the cargo-cult yet. And if you tell a vendor of almost anything that you're working on your Big Data strategy, you've made them very happy. That's because you've told them two things: that you have no idea what your own actual business requirements are, and that you're ready to spend money. That's fantastic if I have something to sell you.
We can back up and try to ask better questions. Harper Reed, who ran the technology team for President Obama's campaign, said something that goes in the right direction: "It's not about Big Data, it's about big answers." But you have to go farther back than that, because big answers don't exist in the abstract. Instead of thinking about Big Data, or even big answers, you have to start asking questions, and they have to be questions about yourself, not about Big Data or something equally amorphous. All the interesting questions you can get answers to are about you.
This is why you don't need a Big Data strategy; in fact, you can't have one. You can have a strategy about your company and your needs. Besides, what would it even mean to have a Big Data strategy? One of the better analysts out there, Edd Dumbill, who is at O'Reilly and is a conference chair, came up with a definition about a year ago, trying to be concise. He said Big Data is data that exceeds the processing capacity of conventional database systems. That says Big Data means you need a bigger database. That's not something you can have a strategy about at all.
About a year went by and his definition got better. He changed it and said, no, it's not about being that big; it's about smart use of data. It's about having data, which I think you all do, and about not being stupid. Still not a strategy. I'm not making fun of Edd here; I think highly of him. Any of the definitions out there could've worked. I could've talked about how many Vs are in your data, and it still wouldn't mean anything to have a strategy about that. Anything at all. You might think I'm just picking on the word. You could say, "Yeah, OK, maybe Big Data doesn't have a definition that's about a thing, but that's not the point, because you could say that about a lot of words that have good things happening in them." "Cloud" is kind of silly, but lots of great things are getting built. "NoSQL" is one of the silliest terms. But if you say you're working on your NoSQL strategy, I do have a database to sell you. You could say that about all of these words, but the problem isn't that the words are a little amorphous.
The problem – well, I have to admit I lied to you a second ago. Edd's definition isn't the best one; Edd's is the best one I've seen from an analyst. The best definition of Big Data I've seen yet is from Danah Boyd at Microsoft Research. It's a better definition because it points out that this is a cultural phenomenon as much as it is anything else. There's absolutely a lot of great technology being built, and there's a lot of important analysis being done. But even more than that, there's a heck of a lot of mythology, and a lot of people going out and looking at the tools someone else chose and waving sticks in the air instead of having a strategy. That mythology is how you get these cargo-cults.
Speaking of that cultish behavior, there's one thing that's even worse to hear than "How is this better than Hadoop?" or "I'm working on my Big Data strategy." I do hear it from time to time, and it amazes me that I hear it earnestly and not ironically: someone will talk about any of those words I flashed by, whether it's cloud, or NoSQL, or Big Data, and say, "Oh yeah, on that topic we really drank the Kool-Aid." If you do that, it's even worse than "How is this better than Hadoop?" You've explicitly self-identified with the most canonical example of fatal gullibility, in case you don't know what it means to have drunk the Kool-Aid. So don't worry about your Big Data strategy. It's not a thing. Having a strategy about those buzzwords won't get you anywhere. Your tools, things like Hadoop, Riak, anything else, can be fantastic tools and good choices. But those are how you do things. Strategy is not about how you do things. Strategy is about why you do things, and why you do things is about you. So if you stop asking "How is this better than Hadoop?", if you stop working on a strategy about something that doesn't exist, and instead you start asking questions about why your business should or shouldn't take a given course of action, then you have a chance to stop being a cargo-cultist and start being a strategist. Thank you.
Announcer: Alright, we're going to keep rolling with Hadoop – because we love Hadoop, especially