Voices in AI – Episode 42: A Conversation with Jem Davies

In this episode, Byron and Jem discuss machine learning, privacy, ethics, and Moore’s law.
[podcast_player name="Episode 42: A Conversation with Jem Davies" artist="Byron Reese" album="Voices in AI" url="https://voicesinai.s3.amazonaws.com/2018-04-12-(00-50-45)-jem-davies.mp3" cover_art_url="https://voicesinai.com/wp-content/uploads/2018/04/voices-headshot-card-3.jpg"]
Byron Reese: Hello, this is “Voices in AI,” brought to you by GigaOm. I am Byron Reese. Today my guest is Jem Davies. He is a VP, a Fellow, and the GM of the Machine Learning Group at ARM. ARM, as you know, makes processors; they have, in fact, 90–95% of the share in mobile devices. I think they’ve shipped something like 125 billion processors. They’re shipping 20 billion a year, which means you, listener, probably bought three or four or five of them this year alone. With that in mind, we’re very proud to have Jem here. Welcome to the show, Jem.
Jem Davies: Thank you very much indeed. Thanks for asking me on.
Tell me, if I did buy four or five of your processors, where are they all? Mobile devices I mentioned. Are they in my cell phone, my clock radio? Are they in my smart light bulb? Where in the world have you secreted them?
It’s simplest, honestly, to answer that question with where they are not. Because of our position in the business, we sell the design of our processor to a chip manufacturer who makes the silicon chips who then sell those on to a device manufacturer who makes the device. We are a long way away from the public. We do absolutely have a brand, but it’s not a customer brand that people are aware of. We’re a business-to-business style of business, so we’re in all sorts of things that people have no idea about, and that’s kind of okay by us. We don’t try and get too famous or too above ourselves. We like the other people taking quite a lot of the limelight. So, yeah, all of the devices you mentioned. We’ll actually probably even be inside your laptop, just not the big processor that you know and love. We might be in one of the little processors perhaps controlling, oh, I don’t know, the flash memory or the Bluetooth or the modem if it’s an LTE-connected device. But, yes, your smartwatch, your car, your disc drives, your home wireless router, I could go on until you got seriously bored.
Tell me this. I understand that some of the advances we’ve made in artificial intelligence recently are because we’ve gotten better at chip design; we do parallelism better—that’s why GPUs do so well: they can do parallel processing and so forth. But most people, when they think of machine learning, are thinking about software that does all these things. They think about neural nets and backpropagation and clustering and classification problems and regression and all of that. Tell me why ARM has a Machine Learning Group. Or is it wrong to say that machine learning is primarily a software thing once you have the basic hardware in place?
Oh, there are about three questions there. Let’s see if I can count to three. The first is that the ways in which you can do machine learning are many and varied. Even the ways these things are implemented are quite disparate. Some people, for example, believe in neuromorphic hardware designs, spiking networks, that sort of thing. The predominant use of neural nets is as software, as you say: software emulations of a neural network which then run on some sort of compute device.
I’m going to take issue with the premise of your first question, that it’s all about Moore’s Law. Actually, two things have happened recently which have changed the uptake. The first is that, yeah, there is lots and lots of compute power about, particularly in devices; the second is ready access to the vast quantities of data contained in the environments in which people do the training. And perhaps here I should say that we view training and inference as computationally completely separate problems. So, what we do at ARM is computing. What does computing get done on? It gets done on processors, so we design processors, and we try to understand and analyze—performance-analyze, measure bottlenecks, etc.—the way in which a particular compute workload runs on a processor.
For example, originally we didn’t make GPUs—graphics processors—but along comes a time in which everybody needs a certain amount of graphics performance. And whilst it is a digital world, it is all just ones and zeroes, you would never do graphics on a CPU. It doesn’t make sense because of the performance and the efficiency requirements. So we are all the time analyzing these workloads and saying, “Well, what can we do to make our general-purpose CPUs better at executing these workloads, or what is the point at which we feel that the benefits of producing a domain-specific processor outweigh the disadvantages?”
So with graphics it’s obvious. Along comes programmable graphics, and, so, right, you absolutely need a special-purpose processor to do this. Video was an interesting case in point, digital video. MPEG-2 with VGA resolution, not very high frame rate, actually you can do that on a CPU, particularly decode. Along comes the newer standards, much higher resolution, much higher frame rate, and suddenly you go, oh, there is no way we can do this on a CPU. It’s just too hard, it takes too much power, produces too much heat. So we produced a special-purpose video processor which does encode and decode the modern standards.
So, for us, in that regard, machine learning neural network processors are in a sense just the latest workload. Now, when I say “just” you could hear me wave my hands around and put inverted commas around it, because we believe that it is a genuinely once-in-a-generation inflection in computing. The reason for that is practically every time somebody takes a classical method and says, “Oh, I wonder what happens if I try doing this using some sort of machine learning algorithm instead,” they get better results. And, so, if you think of a sort of pie chart and say, well, the total number of compute cycles spent is 100%, what slice of that pie is spent executing machine learning, then we see the slice of the pie that gets spent executing machine learning workload, particularly inference, to be growing and growing and growing, and we think it will be a very significant fraction in a few years’ time.
And one of the things about those 125 billion chips, as I said, is that they are all these devices at the edge. Yes, there are people doing machine learning today in data centers, and typically training is done next to those vast quantities of training data, which tend to exist in hyper-scale data centers, but the inference side of machine learning is most useful when done right next to the test data. And if, for example, you’re trying to recognize things in video—computer vision, something like that—the chances are that camera is out there in the wild. It’s not actually directly connected to your hyper-scale data center.
And so we see an absolute explosion of machine learning inference moving to the edge, and there are very sound reasons for that. Yes, it’s next to the data that you’re trying to test, but it’s the laws of economics, it’s the laws of physics and the laws of the land. Physics says there isn’t enough bandwidth in the world to transmit your video image up to Seattle and have it interpreted and then send the results back. You would physically break the internet. There just isn’t enough bandwidth. And there are cost implications with that as well, as well as the power costs. The cost implications are huge. Google themselves said if everybody used their Android Voice Assistant for three minutes per day then they would have to double the number of data centers they had. That’s huge. That is a lot of money. And we’re used to user experience latency issues, which obviously would come into play, but at the point at which you’re saying, well, actually, rather than identifying the picture of the great spotted woodpecker on my cell phone, I’m actually trying to identify a pedestrian in front of a fast-moving car, that latency issue suddenly becomes a critical reliability issue, and you really don’t want to be sending it remotely.
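Jem’s bandwidth point survives even crude arithmetic. The figures below are my own illustrative assumptions, not numbers from the conversation:

```python
# Back-of-envelope estimate of the upstream traffic if every edge camera
# streamed raw video to the cloud for inference. Both figures below are
# illustrative assumptions, not numbers from the conversation.
cameras = 1_000_000_000        # suppose a billion connected cameras worldwide
stream_mbps = 5                # a typical 1080p H.264 stream

total_tbps = cameras * stream_mbps / 1_000_000  # Mbps -> Tbps
print(f"{total_tbps:,.0f} Tbps of continuous upstream video")
```

Five thousand terabits per second of sustained upstream video is far beyond the internet’s long-haul capacity, which is the sense in which shipping it all “up to Seattle” would “physically break the internet.”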
And then, finally, privacy and security, the laws of the land—people are becoming increasingly reluctant to have their personal data spread all over the internet and rightfully so. So if I can have my personal data interpreted on my device, and if I really care I just have to smash my device to smithereens with a hammer, and I know full well that that data is then safe, then I feel much more comfortable, I feel much more confident about committing my data to that service and getting the benefit of it, whatever that service is. I can’t now remember what your three questions were, but I think I’ve addressed them.
Absolutely. So machine learning, I guess, at its core is: let’s take a bunch of this data—which, as you said, our ability to collect has gone up faster, arguably, than Moore’s Law—let’s take a bunch of data about the past, let’s study it, and let’s project that into the future. What do you think, practically speaking, are the limits of that? At the far edge, eventually, in theory, could you point a generalized learner at the internet and then it could write Harry Potter? Where does it break down? We all know the use cases where it excels, but where do you think it’s unclear how you would apply that methodology to a problem set?
Whilst I said that almost every time anybody applies a machine learning algorithm to something they get better results, I think—I’ll use “creative” for want of a better phrase—that where the creative arts are concerned, the fit is hardest. Personally, I have great doubts about whether we have indeed created something intelligent or whether we are, in fact, creating very useful automatons. There have been occasions where they have created music and they have created books, but these tend to be rather pastiche creations, or very much along a genre. Personally, I have not yet seen any evidence to suggest that we are in danger of a truly sentient, intelligent creation producing something new.
It’s interesting that you would say we are in danger of that, not we are excited about that.
Oh, sorry. No, that is just my vocabulary.
Fair enough.
I’m not in general very afraid of these things.
Fair enough. So I would tend to agree with you about creativity. And I agree: you can study Bach and make something that sounds passably like it, you can auto-generate sports stories and all of that, and I don’t think any of it makes the grade as being “creative.” And that’s, of course, a challenge, because not only does intelligence not have a consensus definition, but creativity has one even less so.
If people had to hold out one example of a machine being creative right now, given today, 2018, they might say Game 2 of the Go match between AlphaGo and Lee Sedol, Move 37. He’s in the middle of this game, the computer makes Move 37, and all the live commentators are like, “What?” And the DeepMind team is scrambling to figure out, like, what was this move? And they look, and AlphaGo said the chances a human player would make that move were about 1 in 10,000. So it was clearly not a move that a human would have made. And then, as they’ve taken that system and trained it to play itself in games over and over, and it plays things like chess, its moves are described as alien chess, because they’re not trained on human moves. Without necessarily knowing a lot of the particulars, would you say that is nascent creativity? Or is it something that simply looks like creativity—it’s emulating creativity but isn’t really creativity? Or is there a difference between those two ideas?
Very personally, I don’t call that creativity. I just call that exploring a wider search space. We are creatures very much of habit, of cultural norms. There are just things we don’t do and don’t think about doing, and once you produce a machine to do something, it’s not bound by any of those. It will certainly learn from your training data, and it will say, “Okay, these are things that I know to work,” but it also has that big search space to execute in, to try out. Effectively, most machine learning programs, when used in the wild for real like that, are the result of lots and lots and lots of simulation and experimentation having gone on before, and it will have observed, for example, that playing what we would call “alien” moves is actually a very good strategy when playing against humans.
Fair enough.
And they tend to lose.
Right. So, am I hearing you correctly that you are saying the narrow AI we have now, which still has a long way to run and can do all kinds of amazing things, may be something fundamentally different from general intelligence? That it isn’t on an evolutionary path to a general intelligence, but that general intelligence only shares the one word and is a completely different technology? Am I hearing that correctly or not?
Yes, I think you’re largely hearing it correctly. For someone who makes a living out of predicting technological strategy, I’m actually rather conservative about how far out I make predictions, and people who talk knowledgeably about what will happen in 10–20 years’ time are, I think, on the whole either braver or cleverer at making it up than I am, because I think we can see a path from where we are today to really quite amazing things, but I wouldn’t classify them as truly intelligent or truly creative.
So, one concern—as you’re building all these chips and they’re going in all these devices—we’ve had this kind of duel between the black hats and white hats in the computer world making viruses and attacking things, and then they find a vulnerability, and then it’s patched, and then they find another one, and then that’s countered, and so forth. There’s a broad concern that the kind of IoT devices that we’re embedding, for instance, your chips in, aren’t upgradeable, and they’re manufactured in great numbers, and so when a vulnerability is found there is no counter to it. On your worry-o-meter, how high does that rate? Is that an intractable problem, and how might it be solved in the future?
Security in end devices is something that ARM has taken very seriously, and we published a security manifesto last year of which being able to upgrade things and download the latest security fixes and so on was a part. So we do care about this. It’s a problem that exists whether or not we put machine learning capabilities into those end devices. The biggest problem for most people’s homes at the moment is probably their broadband router, and that’s got no ML capability in it. It’s just routing packets. So it’s a problem we need to address, come what may.
The addition of machine learning capabilities in these and other devices actually, I think, gives us the possibility for considerably more safety and security, because a machine learning program can be trained to spot anomalous activity. So just as if I write a check for £50,000 my bank is very, very likely to ring me up—sorry, for the younger audiences who don’t know what a check is, we’ll explain that later—but it would be anomalous, and they would say, “Okay, that’s not on, that’s unusual.” Similarly, we can do that in real time using machine learning monitoring systems to analyze network data and say, “Well, actually, that looks wrong. I don’t believe he meant to do that.” So, in general, I’m an optimist that the machine learning revolution will help us more than hinder us here.
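The cheque example maps directly onto anomaly detection as it is commonly practiced. Here is a minimal sketch using scikit-learn’s `IsolationForest`, with made-up transaction amounts; nothing in it is ARM’s or any bank’s actual system:

```python
# Toy anomaly detector in the spirit of Jem's cheque example: train on a
# history of "normal" transaction amounts, then flag outliers. The data
# and the choice of model are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical history: 500 cheques averaging about £200.
history = rng.normal(loc=200, scale=50, size=(500, 1))

detector = IsolationForest(random_state=0).fit(history)

# A £50,000 cheque versus a routine £180 one.
flags = detector.predict([[50_000], [180]])
print(flags)  # IsolationForest labels anomalies -1 and inliers 1
```

A production system would, as Jem notes, look at many features at once (payee, location, time of day) and run in real time over network or transaction data, but the shape of the problem is the same.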
That raises another point. That same system that said that check was not good is probably looking at a bunch of variables: your history of all of the checks you’ve written in the past, who it was made payable to, where it was, what time of day. There are all these different data inputs, and it makes some conclusion that yea or nay, flag this, don’t flag it. When that same methodology is applied to an auto loan or a home loan or so forth and it says, “Give them the loan, don’t give them the loan,” European law says that the person is entitled to an explanation why it said that. Is that fair, and is that a hindrance to systems where you might look at it and say, well, we don’t know; it flagged it because it looks like other ones that were fraudulent, and beyond that we can’t offer a lot of insight? What are your thoughts on that?
I think this is an absolute minefield, and I’m not going to give you a very sensible answer on this. It is clear that a number of people implementing such systems will want to keep the decision-making process a secret, because that is actually their trade secret. That is their commercial secret sauce. And so actually opening these boxes up and saying, well, it decided to do this because of X, Y and Z, is something that they are not going to want to do.
Equally, with some machine learning systems that are based on learning rather than on if-then-else rules, it’s going to be genuinely hard to answer that question. If somebody rings up and says, “Why did you do that?” it is going to be genuinely hard for that service provider, even if they wanted to, to answer.
Now, that’s me as a technologist, just answering what is and is not physically possible or hard. Me as a consumer: yes, I want to know. If somebody says, “Well, I think you’re a bad risk,” or “Actually, in life insurance terms, I think you’re going to die tomorrow,” I really want to know the answers to those questions, and I think I’ve got a right to be informed about that sort of thing. So, I’m sorry, I’m deeply conflicted on that one.
As I think everyone is. That’s kind of the challenge. It’s interesting to see how it’s going to play out.
On a different note entirely, a lot of the debate around AI and machine learning is about automation and its effect on employment, and, roughly speaking, there are three positions. There is the idea that it’s going to eliminate a bunch of “low-skilled jobs,” and you’re going to have some level of unemployment that persists long-term, because there just are more people than there are low-skilled jobs. Then there is another camp which says no, no, no, they’re going to be able to do everything, they’ll write better poetry, and they’ll paint better paintings, which it sounds like you’re not part of. And then there is a third camp that says no, no, no, like any technology it fundamentally increases productivity, it empowers people, and people use it to drive higher wages, and it creates more jobs in the future. We saw it with steam and then the assembly line and even with the internet just 25 years ago. What is your thought? How do you think artificial intelligence and machine learning and automation are going to impact employment?
On a global scale, I tend towards your latter view, which is that actually it tends to be productive rather than restrictive. I think that on a local scale, however, the effects can be severe, and I’m of the view that the people it’s likely to affect are not necessarily the ones that people expect. For example, I think that we are going to have to come to terms with understanding, in more detail, the difference between a highly-skilled occupation and a highly-knowledged occupation. So, if we look at what machine learning can do with a smartphone and a camera and an internet connection in terms of skin cancer diagnosis, it arguably puts skin cancer diagnosticians out of a job, which is a bit surprising to most people, because they would regard them as very highly skilled, very highly educated. Typically, somebody in that situation would probably have ten years of postgraduate experience let alone all their education that got them to that point. We see cab drivers and truck drivers being at risk. And yet actually the man who digs a hole in the road and fixes a broken sewer pipe might well have a job, because actually that’s extremely hard to automate.
So I think people’s expectations of who wins and who loses in this process are going to be somewhat misguided, but, yeah, some jobs are clearly at great risk, and the macro-economy might well benefit overall. But, as one of your presidents said, the unemployment rate is either 0 percent or 100 percent, depending on your point of view: you’ve either got a job or you haven’t. And so I do think this does bring considerable risks of societal change. But then society has always changed, and we’ve gone through many a change that has had such effects. On the whole, I’m an optimist.
So in the U.S. at least, our unemployment rate has stayed between 5% and 10% for 250 years, with the exception of the Depression. Britain isn’t the exact same range, obviously, but it’s a similar, relatively tight band, in spite of enormous technologies that have come along, like steam power, electricity, even the internet and so forth.
I think both of us have probably exploited such big changes as they’ve been coming along.
Right. And real wages have clearly risen over that 250-year period as well, and we’ve seen, like you just said, jobs eliminated. I think the half-life of the group of jobs that everybody collectively has right now is probably 50 years. I think in any 50-year period about half of them are lost. It was farming jobs at one point, manufacturing jobs at one point and so forth. Do you have a sense that machine learning is more of the same or is something profoundly different?
I’m reluctant to say it’s something different. I think it’s one of the bigger ones, definitely, but actually steam engines were pretty big, coal was pretty big, the invention of the steam train. These were all pretty significant events, and so I’m reluctant to say that it’s necessarily bigger than those. I think it is at least a once-in-a-generation inflection. It’s at least that big.
Let’s talk a little bit about human ability versus machines. So let me set you up with a problem, which is if you take a million photos of a cat and a million photos of a dog and you train the machine learning thing, it gets reliable at telling the difference between the two. And then the narrative goes: and yet, interestingly, a person can be trained on a sample size of one thing. You make some whimsical stuffed animal of some creature that doesn’t exist, you show it to a person and say, “Find it in all these photos,” and they can find it if it’s frozen in a block of ice or covered in chocolate syrup or half-torn or what have you. And the normal explanation for that is, well, that’s transfer learning, and humans have a lifetime of experience with other things that are torn or covered in substances and so forth, and they are able to, therefore, transfer that learning and so forth.
I used to be fine with that, but recently I got to thinking about children. You could show a child not a million cats but a dozen cats, or however many they’re likely to encounter in their life up until age five, and then you can be out for a walk with them, and you see one of those Manx cats, and they say, “Look, a cat with no tail,” even though, in this class of things, cats, the ones they’ve seen all have tails, and yet that’s a cat with no tail. How do you think humans are doing that? Is it innate or instinctual, or what? That should be a level we can get machines to, on your view, shouldn’t it?
On the one hand, I’ll say that a profound area of research, which is producing huge results, is the way in which we can now train neural networks using much smaller sets of data. There is a whole field of research going on there which is proving to be very productive. Against that, I’ll advance the view that we have no idea how that child learns, and so I refuse to speculate about the difference between A and B when I have actually no understanding of A.
And I don’t wish to be difficult about this, but among neuroscientists and applied psychologists combined, there is some deep understanding of biochemistry at the synapse level, and we can extrapolate some broad observed behaviors which make it appear as though we know how people learn, but there are enough counter-examples to show that we simply don’t understand this properly. Neuroscience is being researched and developed just as quickly as machine learning, and it needs to make a lot of progress in understanding how the brain works in reality. Up until that point, I must admit, when my colleagues, particularly those in the marketing department, start talking about machine learning reflecting how the brain works, I get itchy and scratchy, and I try to stop them.
I would agree. Don’t you even think that neural nets, even the appeal to that metaphor is forced?
Yes, I dislike it. If I had my way I would refer to neural networks as something else, but it’s pointless, because everybody would be saying, “What? Oh, you mean a neural network.” That ship has sailed. I’m not picking that fight. I do try and keep us on the subject of machine learning when we speak publicly as opposed to artificial intelligence. I think I might be able to win that one.
That’s interesting. So is your problem with the word ‘artificial’, the word ‘intelligence’ or both?
My problem is the word ‘intelligence’ when combined with ‘artificial’ which implies I have artificially created something that is intelligent, and I know what intelligence is, and I’ve created this artificial thing which is intelligent. And I’m going, well, you kind of don’t know what intelligence is, you kind of don’t know what learning really is, and so making a claim that you’ve been able to duplicate this, physically create it in some manmade system, it’s a bit wide of the mark.
I would tend to agree, but there interestingly isn’t a consensus on that interpretation of what artificial means. There are plenty of people who believe that artificial turf is just something that looks like turf but it isn’t, artificial fruit made of wax is just something that looks like fruit but it really isn’t, and therefore artificial intelligence is something that isn’t really intelligent.
Okay. If I heard anyone advance that viewpoint I would be a lot happier with the words “artificial intelligence.”
Fair enough. So would you go so far as to say that people who look at how humans learn and try to figure out, well, how do we apply that in computers, may be similarly misguided? The oft-repeated analogy is we learned to fly not by emulating birds but by making the airfoil. Is that your view, that trying to map these things to the human brain may be more of a distraction than useful?
On the whole, yes, though I think it is a worthwhile pursuit for some section of the scientific community to see if there are genuine parallels and what we can learn from them. But, in general, I am a pragmatist. I observe that neural network algorithms, and particularly the newer kinds of networks, are just a generally useful tool, and we can create systems that perform better than classical if-then-else rules-based systems. We can get better results at object recognition, for example, with fewer false positives. They are just generally better, and so I think that’s a worthwhile pursuit, and we can apply that to devices that we use every day to give us a better quality of life. Who hasn’t struggled with the user interface on some wretched so-called smart device and uttered the infamous phrase, “What’s it doing now?” because we are completely bewildered by it? We’ve not understood it. It hasn’t understood us. We can transform that, I would argue, by adding more human-like interaction between the real world and the digital world.
So humans have this intelligence, and we have these brains, which you point out we don’t really understand. And then we have something, a mind, which, however you want to think about it, is a set of abilities that don’t seem to be derivable from what we know about the brain, like creativity and so forth. And then we have this other feature which is consciousness, where we actually experience the world instead of simply measuring it. Is it possible that we therefore have capabilities that cannot be duplicated in a computer?
I think so, yes. Until somebody shows me some evidence to the contrary, that’s probably going to be my position. We are capable of holding ethical, moral beliefs that are often at variance with our learning of the way things work in the world. We might think it is simply wrong to do something, and we might behave that way even having seen evidence that people who do that wrong thing gain advantage in this world. I think we’re more than just the sum of our learning experiences, though what more we are, I can’t explain, sorry.
No, well, you and Plato.
In the same camp there. That’s really interesting, and I, of course, don’t mean it to diminish anything that we are going to be able to do with these technologies.
No, I genuinely think we can do amazing things with these technologies, even if it can’t write Shakespeare.
When the debate comes up about the application of this technology, let’s say it’s used in weapon systems to make automated kill decisions, which some people will do, no matter what—I guess a landmine is an artificial intelligence that makes a kill decision based on the weight of an object, so in a sense it’s not new—but do you worry, and you don’t even have to go that extreme, that somehow the ethical implications of the action can attempt to be transferred to the machine, and you say, well, the machine made that call, not a person? In reality, of course, a person coded it, but is it a way for humans to shirk moral responsibility for what they build the machines to do?
All of the above. So it can be a way for people to shirk responsibility for what they do, but, equally, we have the capability to create technologies, tools, devices that have bad consequences, and we always have done. Since the Bronze Age—arguably since the Stone Age—we’ve been able to create axes which were really good at bringing down saber-toothed tigers to eat, but they were also quite useful at breaking human skulls open. So we’ve had this all along, you know, the invention of gunpowder, the discovery of atomic energy, leading to both good and bad.
My personal view is that technology and science will always create things that are morally neutral; it is people who will use them in ways that may be good or bad. But, yes, I think it does introduce the possibility for less well-controlled things. And it can be much less scary than that. It may not be automated killing by drone. It may be car ADAS systems, the traditional, sort of, I’ve got to swerve one way or the other, I am unable to stop, and if I swerve that way I kill a pensioner, if I go this way I kill a mother and baby.
Right, the trolley problem.
Yeah, it is the trolley problem. Exactly, it is the trolley problem.
The trolley problem, if you push it to the logical extreme of things that might actually happen: should the AI prevent you from having a second helping of dessert, because that statistically increases your risk, you know? Should it prohibit you from having the celebratory cigar after something?
Let’s talk about hardware for a moment. Every year or so, I see a headline that says, “Is it the end of Moore’s Law?” And I have noticed in my life that for any headline phrased as a question, the answer is always no. Otherwise that would be the headline: “Moore’s Law Is Over.”
“Moore is dead.”
Exactly, so it’s always got to be no. So my question to you is: are we nearing the end of Moore’s Law? And Part B of the same question is: what are the physical constraints—I’ve heard you talk about how you start with the amount of heat a device can dissipate, then you work backward to wattage and then all of that—what are the fundamental physical laws that you are running up against as we make better, smaller, faster, lower-power chips?
Moore’s Law is, of course, not what most people think it was. He didn’t actually say most of the things that most people have attributed to him. And in some sense it is dead already, but in a wider applicability sense, if you sort of defocus the question and step out to a further altitude, we are finding ways to get more and more capabilities out of the same area of silicon year on year, and the introduction of domain-specific processors, like machine learning processors, is very much a feature of that. So I can get done in my machine learning processor at 2 mm² what it might take 40 mm² of some other type of processor.
All of technology development has always been along those lines. Where we can find a more efficient way to do something, we generally do, and there are generally useful benefits either in terms of use cases that people want to pay for or in terms of economies where it’s actually a cheaper way of providing a particular piece of functionality. So in that regard I am optimistic. If you were talking to one of my colleagues who works very much on the future of silicon processors, he’d probably be much more bleak about it, saying, “Oh, this is getting really, really hard, and it’s indistinguishable from science fiction, and I can count the number of atoms on a transistor now, and that’s all going to end in tears.” And then you say, well, okay, maybe silicon gets replaced by something else, maybe it’s quantum computing, maybe it’s photonics. There are often technologies in the wings waiting to supplant a technology that’s run out of steam.
So, your point taken about the misunderstanding of Moore’s Law, but Kurzweil’s broader observation that there’s a power curve, an exponential curve, about the cost to do some number of calculations that he believes has been going on for 130 years across five technologies—it started with mechanical computers, then to relays, then to tubes, then to transistors, and then to the processors we have today—do you accept some variant of that? That somehow on a predictable basis the power of computers as an abstraction is doubling?
Maybe not doubling every whatever it used to be, 18 months or something like that, but through the use of things like special-purpose processors like ARM is producing to run machine learning, then, yeah, actually, we kind of do. Because when you move to something like a special-purpose processor that is, oh, I don’t know, 10X, 20X, 50X more efficient than the previous way of doing something, then you get back some more gradient in the curve. The curve might have been flattening off, and then suddenly you get a steepness increase in the curve.
And then you mentioned quantum computing. Is that something that ARM is thinking about and looking at, or is it so far away from the application to my smart hammer that it’s—?
Yeah, it’s something we look at, but, to be honest, we don’t look at it very hard, because it is still such a long way off. It’s probably not going to bother me much, but there are enough smart people throwing enough money at the problem that if it is fixable, somebody will fix it, particularly with governments and cryptography behind it. There are such national security gains to be made from solving this problem that the money supply is effectively infinite. Quantum computing is not being held back by lack of investment, trust me.
So, final question, I’m curious where you come down on the net of everything. On the one hand you have this technology and all of its potential impact, all of its areas of abuse and privacy and security and war and automation, well, that’s not abuse, but you have all of these kind of concerns, and then you have all of these hopes—it increases productivity, and helps us solve all these intractable problems of humanity and so forth. Where are you net on everything? And I know you don’t predict 20 years out, but do you predict directionally, like I think it’s going to net out on the plus side or the minus side?
I think it nets out on the plus side but only once people start taking security and privacy issues seriously. At the moment it’s seen as something of an optional extra, and people producing really quite dumb devices at the moment like, oh, I don’t know, radiator valves, say, “Oh, it’s nothing to do with me. Who cares? I’m just a radiator valve manufacturer.” And you say, well, yeah, actually, but if I can determine from Vladivostok that your radiators are all programmed to come on at this time of day, and you switch the lights on, and you switch the lights off at this time of day, I’ve just inferred something really quite important about your lifestyle.
And so I think that getting security and privacy to be taken seriously by everybody who produces smart devices, particularly where those devices start to become connected and forming sort of islands of privacy and security, such that you go, “Okay, well, I’m prepared to have this information shared amongst the radiator valves in my house, I’m prepared to share it with my central heating system, I’m not prepared to send it to my electricity company,” or something like that, intersecting rings of security, and people only have the right to see the information they need to see, and people will care about this stuff and control it sensibly.
And you might have to delegate that trust. You might have to delegate it to your manufacturer of home electronics. You can say, okay, well, they’re a reputable name, I trust them, I’ll buy them, because clearly most people can’t be experts in this area, but, as I say, I think people have to care first, at which point they’ll pay for it, at which point the manufacturers will supply it and compete with each other to do it well.
All right. I want to thank you so much for a wide-ranging hour-long discussion about all of these topics, and thank you for your time.
Thank you very much. It was fun.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.

Voices in AI – Episode 30: A Conversation with Robert Mittendorff and Mudit Garg

In this episode, Byron, Robert and Mudit talk about Qventus, healthcare, machine learning, AGI, consciousness, and medical AI.
[podcast_player name=”Episode 30 – A Conversation with Robert Mittendorff and Mudit Garg” artist=”Byron Reese” album=”Voices in AI” url=”https://voicesinai.s3.amazonaws.com/2018-01-22-(00-58-58)-garg-and-mittendorf.mp3″ cover_art_url=”https://voicesinai.com/wp-content/uploads/2018/01/voices-headshot-card-3.jpg”]
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today is a first for Voices in AI, we have two guests. The first one is from Qventus; his name is Mudit Garg. He’s here with Robert Mittendorff, who’s with Norwest Venture Partners, who also serves on Qventus’ board. Mudit Garg is the co-founder and CEO of Qventus, and they are a company that offers artificial-intelligence-based software designed to simplify hospital operations. He’s founded multiple technology companies before Qventus, including Hive, a group messaging platform. He spent two years as a consultant with Seattle-based McKinsey & Company, focusing, I think, on hospital operations.
Robert, from Norwest Ventures, before he was VP of Marketing and Business Development at Hansen Medical, a publicly traded NASDAQ company. He’s also a board-certified emergency physician who completed his residency training at Stanford. He received his MD from Harvard Medical School, his MBA from Harvard Business School, and he has a BS in Biomedical Engineering from Johns Hopkins University. Welcome to the show, gentlemen.
Mudit Garg: Thank you. Good morning. Thank you for having us.
Robert Mittendorff: Thank you, Byron.
Mudit, I’ll start with you. Tell us about Qventus and its mission. Get us all oriented with why we’re here today.
Mudit: Absolutely. The best way to think of Qventus is how our customers often describe us: like air traffic control. Much like what air traffic control does for airports, where it allows many more flights to land, and much more safely, than if they were uncoordinated, we do the same for healthcare and hospitals.
For me—as, kind of, boring and uncool as a world of operations and processes might be—I had a chance to see that firsthand working in hospitals when I was at McKinsey & Company, and really just felt that we were letting all of our clinicians down. If you think about the US healthcare system, we have the best clinicians in the world, we have great therapies, great equipment, but we fail at providing great medicine. Much of that was being held back by the complex operations that surround the delivery of care.
I got really excited about using data and using AI to help support these frontline clinicians in improving the core delivery of care in the operation. Things like, as a patient sitting in an emergency department, you might wonder what’s going on and why you aren’t being taken care of faster. On the flip side, there’s a set of clinicians who are putting in heroic efforts trying to do that, but they are managing so many different variables and processes simultaneously that it’s almost humanly impossible to do that.
So, our system observes and anticipates problems like, it’s the Monday after Thanksgiving, it’s really cold outside, Dr. Smith is working, he tends to order more labs, our labs are slow—all these factors that would be hard for someone to keep in front of them all the time. When it realizes we might run out of capacity, three or four hours in advance, it will look and find the bottleneck, and create a discussion on how to fix that. We do things like that at about forty to fifty hospitals across the country, and have seen good outcomes through that. That’s what we do, and that’s been my focus in the application of AI.
And Robert how did you get involved with Qventus?
Robert: Well, so Qventus was a company that fit within a theme that we had been looking at for quite some time in artificial intelligence and machine learning, as it applies to healthcare. And within that search we found this amazing company that was founded by a brilliant team of engineers/business leaders who had a particular set of insights from their work with hospitals, at McKinsey, and it identified a problem set that was very tractable for machine learning and narrow AI which we’ll get into. So, within that context in the Bay Area, we found Qventus and we’re just delighted to meet the team and their customers, and really find a way to make a bet in this space.
We’re always interested in case studies. We’re really interested in how people are applying artificial intelligence. Today, in the here and now, put a little flesh on the bones of what are you doing, what’s real and here, how did you build it, what technology you are using, what did you learn? Just give us a little bit of that kind of perspective.
Mudit: Absolutely. I’ll first start with the kinds of things that we are doing, and then we’ll go into how we built it, and some of the lessons along the way as well. I just gave you one example of running an emergency department. In today’s world, there is a charge nurse who is responsible for managing the flow of patients through that emergency department, constantly trying to stay ahead of it. The example I gave was where, instead, the system is observing it, realizing, learning from it, and then creating a discussion among folks about how to change it.
We have many different things—we call them recipes internally—many different recipes that the system keeps looking for. It looks for, “Hey, here’s a female who is younger, who is waiting, there are four other people waiting around her, and she is in acute pain.” She is much more likely than other folks to get up and leave without being seen by a doctor, and you might nudge a greeter to go up and talk to her. We have many recipes and examples like these, I won’t go into specific examples in each of those, but we do that in different areas of delivery of healthcare.
So, patient flow, just having patients go through the health systems in ways that don’t require them to add resources, but allow them to provide the same care, is one big category. You do that in the emergency department, in inpatient units of the hospital, and in the operating room. More recently, we are starting to do that in pharmacy operations, as pharmacy costs have started rising. What are the things that today require a human to manually realize, follow up on, escalate, and manage, and how can AI help with that process? We’ve seen really good results with that.
I think you’re asking about case studies, in the emergency department side alone, one of our customers treated three thousand more patients in that ED this year than last, without adding resources. They saved almost a million minutes of patient wait time in that single ED alone and that’s been fascinating. What’s been even more amazing is hearing from the nurse manager there how the staff feel like they have the ability to shape the events versus always being behind, and always feeling like they are trying to solve the problem after the fact. They’ve seen some reductions in turnover and that ability of using AI to, in some ways, making health care more human for the people who help us, the caregivers, is what’s extremely exciting in this work for me.
Just to visualize that for a moment, if I looked at it from thirty thousand feet—people come into a hospital, all different ways, and they have all different characteristics of all the things you would normally think, and then there’s a number of routings through the hospital experience, right? Rush them straight into here, or there, or this, so it’s kind of a routing problem. It’s a resource allocation problem, right? What does all of that look like? This is not a rhetorical question, what is all that similar to outside of the hospital? Where is that approach broadly and generally applicable to? It’s not a traffic routing problem, it’s not an inventory management problem, are there any corollaries you can think of?
Mudit: Yeah. In many ways there are similarities to anywhere where there are high fixed asset businesses and there’s a distributed workforce, there’s lots of similarities. I mean, logistics is a good example of it. Thinking about how different deliveries are routed and how they are organized in a way that you meet the SLAs for different folks, but your cost of delivery is not too high. It has similarities to it.
I think hospitals are, in many ways, one of the most complex businesses, and given the variability is much, much higher, traditional methods have failed. In many of the other such logistical and management problems you could use your optimization techniques, and you could do fairly well with them. But given the level of variability is much, much higher in healthcare—because the patients that walk in are different, you might have a ton walk in one day and very few walk in the next, the types of resources they need can vary quite a bit—that makes the traditional methods alone much, much harder to apply. In many ways, the problems are similar, right? How do you place the most product in a warehouse to make sure that deliveries are happening as fast as possible? How do you make sure you route flights and cancel flights in a way that causes minimum disruption but still maximize the benefit of the entirety of the system? How do you manage the delivery of packages across a busy holiday season? Those problems have very similar elements to them and the importance of doing those well is probably similar in some ways, but the techniques needed are different.
Robert, I want to get to you in just a minute, and talk about how you as a physician see this, but I have a couple more technical questions. There’s an emergency room near my house that has a big billboard and it has on there the number of minutes of wait time to get into the ER. And I don’t know, I’ve always wondered is the idea that people drive by and think, “Oh, only a four-minute wait, I’ll go to the ER.” But, in any case, two questions, one, you said that there’s somebody who’s in acute pain and they’ve got four people, and they might get up and leave, and we should send a greeter over… In that example, how is that data acquired about that person? Is that done with cameras, or is that a human entering the information—how is data acquisition happening? And then, second, what was your training set to use AI on this process, how did you get an initial training set?
Mudit: Both great questions. Much of this is part of the first-mile problem for AI in healthcare, that much of that data is actually already generated. About six or seven years ago a mass wave of digitization started in healthcare, and most of the digitization was taking existing paper-based processes and having them run through electronic medical record systems.
So, what happens is when you walk into the emergency department, let’s say, Byron, you walk in, someone would say, “Okay, what’s your name? What are you here for?” They type your name in, and a timestamp is stored alongside that, and we can use that timestamp to realize a person’s walked in. We know that they walked in for this reason. When you got assigned a room or assigned a doctor then I can, again, get a sense of, okay, at this time they got assigned a room, at this time they got assigned a doctor, at this time their blood was drawn. All of that is getting stored in existing systems of record already, and we take the data from the systems of record, learn historically—so before we start we are able to learn historically—and then in the moment, we’re able to intervene when a change needs to take place.
And then the data acquisition part of the acute patient’s pain?
Mudit: The pain in that example is actually coming from what they have complained about.
I see, perfect.
Mudit: So, we’re looking at the types of patients who complain about similar things, and what their likelihood of leaving is versus other patients’; that’s what we’ll be learning on.
Robert, I have to ask you before we dive into this, I’m just really intensely curious about your personal journey, because I’m guessing you began planning to be a medical practitioner, and then somewhere along the way you decided to get an MBA, and then somewhere along the way you decided to invest in technology companies and be on their boards. How did all of that happen? What was your progressive realization that took you from place to place to place?
Robert: I’ll spend just a couple of minutes on it, but not exactly. I would say in my heart I am an engineer. I started out as an engineer. I did biomedical electrical engineering and then I spent time at MIT when I was a medical student. I was in a very technical program between Harvard and MIT as a medical student. In my heart, I’m an engineer which means I try to reduce reality to systems of practice and methods. And coupled with that is my interest in mission-driven organizations that also make money, so that’s where healthcare and engineering intersect.
Not to go into too much detail on a podcast about myself, I think the next step in my career was to try to figure out how I could deeply understand the needs of healthcare, so that I could help others and myself bring to bear technology to solve and address those needs. The choice to become a practitioner was partially because I do enjoy solving problems in the emergency department, but also because it gave me a broad understanding of opportunities in healthcare at the ground level and above in this way.
I’ll just give you an example. When I first saw what Mudit and his team had done in the most amazing way at Qventus, I really understood the hospital as an airport with fifty percent of the planes landing on schedule. So, to go back to your emergency department example, imagine if you were responsible for safety and efficiency at SFO, San Francisco airport, without a tower and knowing only the scheduled landing times for half of the jets, where each jet is a patient. Of the volume of patients that spend their night in the hospital, about half come to the ED, and when I show up for a shift that first, second, and third patient can be stroke, heart attack, broken leg, can be shortness of breath, skin rash, etcetera. The level of complexity in health care to operationalize improvements in the way that Mudit has is incredibly high. We’re just at the beginning, they are clearly the leader here, but what I saw in my personal journey in this company is the usage of significant technology to address key throughput needs in healthcare.
When one stack-ranks what we hope artificial intelligence does for the world, on most people’s list, right up there at the very top is impacting health. Do you think that’s overly hyped because, you know, we have an unending series of wishes that we hope artificial intelligence can fulfill? Or do you think it’s possible that it delivers eventually on all of that, that it really is a transformative technology that materially alters human health at a global level?
Robert: Absolutely and wholeheartedly. My background as a researcher in neuroscience was using neural networks to model brain function in various animal models, and I would tell you that the variety of ways that machine learning and AI, which are the terms we use now for these technologies, will affect human health is massive. I would say within the Gartner hype cycle we are early; we are overhyping in the short term the value of this technology. We are not overhyping the value of this technology in the next ten, twenty, or thirty years. I believe that AI is the driver of our Industrial Revolution. This will be looked back at as an industrial revolution of sorts. I think there are huge benefits that are going to accrue to healthcare providers and patients through the usage of these technologies.
Talk about that a little more, paint a picture of the world in thirty years, assuming all goes well. Assuming all goes well, what would our health experience look like in that world?
Robert: Yeah, well, hopefully your health experience, and I think Mudit’s done a great job describing this, will return to a human experience between a patient and a physician, or provider. I think in the backroom, or when you’re at home interacting with that practice, I think you’re going to see a lot more AI.
Let me give you one example. We have a company that went public, a digital health company, that uses machine learning to read EKG data, so cardiac electrical activity data. A typical human would take eight hours to read a single study on a patient, but by using machine learning they get down to five to ten minutes. The human is still there, over-reading what the machine-learning software is producing—this company is called iRhythm—and what that allows us to do is reach a lot more patients at a lower cost than you could achieve with human labor. You’ll see this in radiology. You’ll see this in coaching patients. You’ll see this where I think Mudit has really innovated, which is that he has created a platform that is enabling.
In the case that I gave you, with humans being augmented by, what I call, the automation or semi-automation of a human task, that’s one thing, but what Mudit is doing is truly enabling AI. Humans cannot do what he does in the time and scale that he does it. That is what’s really exciting—machines that can do things that humans cannot do. Just to visualize that system, there are some things that are not easily understood today, but I think you will see radiology improve with semi-automation. I think patients will be coached with smart AI to improve their well-being, and that’s already being seen today. Human providers will have leverage because the computer, the machine, will help prioritize their day—which patient to talk to, about what, when, how, why. So, I think you’ll see a more human experience.
The concern is that we will see a more manufactured experience. I don’t think that’s the case at all. The design that we’ll probably see succeed is one where the human will become front and center again, where physicians will no longer be looking at screens typing in data. They’ll be communicating face to face with a human, with an AI helping out, advising, and handling those tedious tasks that the human shouldn’t be burdened with, to allow the relationship between the patient and physician to return.
So, Mudit, when you think of artificial intelligence and applying artificial intelligence to this particular problem, where do you go from that? Is the plan to take that learning—and, obviously, scale it out to more hospitals—but what is the next level to add depth to it to be able to say, “Okay, we can land all the planes now safely, now we want to refuel them faster, or…”? I don’t know, the analogy breaks down at some point. Where would you go from here?
Mudit: Our customers are already starting to see results of this approach in one area. We’ve started expanding already and have a lot more expansion coming down the line as well. If you think of it, at the end of the day, so much of healthcare delivery is heavily process driven, right? Anywhere from how your bills get generated to when you get calls. I’ve had times when I might get a call from a health system saying I have a ten-dollar bill that they are about to send to collections, but I paid all the bills that day. There are things like that constantly happening that are breakdowns in processes, across delivery, across the board.
We started, as I said, four or five years ago very specifically focused on the emergency department. From there we went into the surgery area, where operating rooms can cost upwards of hundreds of dollars a minute—so how do you manage that complex an operation, and the logistics behind it, to deliver the best value?—and we’ve seen really good results there, managing the entirety of all the units in the hospital. More recently, as I was saying, we are now starting to work with Sutter Health across twenty-six of their hospital pharmacies, looking at the key pieces of pharmacy operations which are, again, manually holding people back from delivering the best care. These are the different pieces across the board where we are already starting to see results.
The common thread I find across all of these is that we have amazing, incredible clinicians today who, if they had all the time and energy in the world to focus on anticipating these problems and delivering the best care, would do a great job, but we cannot afford to keep having more people solve these problems. There are significant margin pressures across healthcare. The same people who were able to do these things before have to-do lists that are growing faster than they can ever comprehend. The job of AI really is to act as, kind of, their assistant and watch those decisions on their behalf, and make those really, really easy—to take all of the boring, mundane logistics out of their hands, so they can focus on what they do best, which is deliver care to their patients. So, right now, as I said, we started on the flow side; pharmacies are a new area, and outpatient clinics and imaging centers are other areas that we are working on with a few select customers. There’s some really, really exciting stuff there in increasing the access to care—when you might call a physician to get access—while reducing the burden on that physician.
Another really exciting piece for me is that, in many ways, the US healthcare system is unique, but in this complexity of logistics and operations it is not. So, we have already signed up to work with hospitals globally, having just started with our first international customer recently, and the same problems exist everywhere. There was an article on the BBC, I think a week or two ago, about long surgery waiting lists in the UK, where they are struggling to get those patients seen in that system due to lack of efficiency in these logistics. So, that’s the other piece that I’m really excited about: it’s not only the breadth of these problems where there’s complexity of processes, but also the global applicability of it.
The exciting thing to me about this episode of Voices is that I have two people who are engineers, who understand AI, and who have a deep knowledge of health. I just have several questions that kind of sit at the intersection of all of that I would love to throw at you.
My first one is this: the human genome is, however many billions of base pairs, something that works out to roughly 762MB of data, but if you look at what makes us different than, say, chimps, it may be one percent of that. So, something like 7MB or 8MB of data is the code you need to build an intelligent brain, a person. Does that imply to you that artificial intelligence might have a breakthrough—there might be a relatively straightforward and simple thing about intelligence that we’re going to learn that will supercharge it? Or, is your view that, no, unfortunately, something like a general intelligence is going to be, you know, hunks of spaghetti code that kind of work together and pull off this AGI thing? Mudit, I’ll ask you first.
Mudit: Yeah, and boy, that’s a tough question. I will do my best in answering that one. Do I believe that we’ll be able to get a general-purpose AI with, like, 7MB or 8MB of code? There’s a part of me that does believe in that simplicity, and does want to believe that that’s the answer. If you look at a lot of machine learning code, it’s not the learning code itself that’s actually that complex; it’s the first mile and the last mile that end up taking the vast majority of the code. How do you get the training sets in, and how do you get the output out—that is what takes the majority of the AI code today.
The fundamental learning code isn’t that big today. I don’t know if we’ll solve general-purpose AI anytime soon—I’m certainly not holding my breath for that—but there’s a part of me that feels and hopes that the fundamental concepts of the learning and the intelligence will not be that complicated at an individual micro scale. Much like ourselves, we’ll be able to understand them, and there will be some beauty and harmony and symphony in how they all come together. And that actually won’t be complex in hindsight, but it will be extremely complex to figure out the first time around. That’s purely speculative, but that would be my belief and my hunch right now.
Robert, do you want to add anything to that, or let that answer stand?
Robert: I’d be happy to. I think it’s an interesting analogy to make. There are some parts of it that will break down and parts that will parallel between the human genome’s complexity and utility, and the human brain. You know, I think when we think about the genome you’re right, it’s several billion base pairs, but we only have twenty thousand genes, and a small minority percentage of those base pairs actually code for protein, and a minority of those genes, like a thousand to two thousand, are ones we understand to affect the human in a diseased way. There are a lot of base pairs that we don’t understand, which could be related to the structure of the genome as it needs to do what it does in the human body, in the cell.
On the brain side, though, I think I would go with your latter response, which is, if you look at the human brain—and I’ve had the privilege of working with animal models and looking at human data—the brain is segmented into various functional units. For example, the auditory cortex is responsible for taking information from the ear and converting it to signals that are then pattern-recognized into, say, language, where those symbols of what words we’re speaking are then processed by other parts of the cortex. Similarly, the hippocampus, which sits in, kind of, the oldest part of the brain, is responsible for learning. It is able to look at various inputs from the visual and auditory and other cortices, and then upload them to long-term memory from short-term memory. So the brain is functionally segmented and physically segmented.
I believe that a general-purpose AI will have the same kind of structure. It's funny, we have this thing called the AI effect, where once we solve a problem with code or with machinery, it's no longer AI. So, for example, some would now consider natural language processing not part of AI because we've somewhat solved it; or speech recognition used to be AI, but now it's an input to the AI, because the AI is concerned more with understanding than with interpreting audio signals and converting them into words. I would say what we're going to see, similar to the way the human body is encoded by these twenty thousand genes, is that you will have functional expertise with, presumably, code that is used for segmenting the problem of creating a general AI.
A second question then. You, Robert, waxed earlier about how big the possibilities are for using artificial intelligence in health. Of course, we know that the number of people living to one hundred keeps going up and up. The number of supercentenarians, people who have reached one hundred and ten, is in the dozens. The number of people who have lived to one hundred and twenty-five is stubbornly fixed at zero. Do you believe—and not even getting aspirational about "curing death"—that what's most likely to happen is that more of us are going to make it to one hundred healthily, or do you think one hundred and twenty-five is a barrier we'll break, and maybe somebody will live to one hundred and fifty? What do you think about that?
Robert: That's a really hard question. I would say that if I look at the trajectory of gains, from public health, primarily, with things like treated water, through to medicine, we've seen a dramatic increase in human longevity in the developed world. From reducing the number of children dying during childbirth, deaths which obviously lower the average, to extending life in the later years. And those changes have enormous effects on society. For example, when Social Security was invented, a minority of individuals would live to the age at which they would start accruing significant benefits; obviously that's no longer the case.
So, to answer your question, there is no theoretical reason I can come up with why someone couldn't make it to one hundred and twenty-five. One hundred and fifty is obviously harder to imagine. But we understand the human cell at a certain level, and the genome, and the machinery of the human body, and we've been able to thwart the body's efforts to fatigue and expire a number of times now, whether it's cardiovascular disease or cancer. And we've studied longevity—"we" meaning the field, not myself—so I don't see any reason why we would say we will not have individuals reach one hundred and twenty-five, or even one hundred and fifty.
Now, what is the time course of that? Do we want that to happen and what are the implications for society? Those are big questions to answer. But science will continue to push the limits of understanding human function at the cellular and the physiologic level to extend the human life. And I don’t see a limit to that currently.
So, there is this worm, called the nematode worm, a little bitty fella, as long as a hair is wide, the most successful animal on the planet. Something like seventy percent of all animals are nematode worms. The brain of the nematode worm has 302 neurons, and for twenty years or so, people have been trying to model those 302 neurons in a computer, the OpenWorm project. And even today they don't know if they can do it. That's how little we understand. It's not that we fail to understand the human brain because it's so complex; we don't understand anything—or I don't want to say "anything"—we don't even understand how neurons themselves work.
Do you think that, one, we need to understand how our brains work—or how the nematode brain works, for that matter—to make strides towards an AGI? And, two, is it possible that a neuron has stuff going on down at the Planck level, that it's as complicated as a supercomputer, which would make replicating intelligence that way incredibly difficult? Do either of you want to comment on that?
Mudit: It's funny that you mention that. When I was at Stanford doing some work in engineering, one of the professors used to say that our study of the human brain is like someone who has a supercomputer and two electrodes, poking the electrodes in different places and trying to figure out how it works. And I can't imagine ever figuring out how a computer works outside-in by just having two electrodes and seeing the different voltages coming out of it. So, I do see the complexity of it.
Is it necessary for us to understand how the neuron works? I'm not sure it is. But if we were to come up with a way to build a system that's resilient, redundant, and simple, and that can achieve that level of intelligence, well, that's hundreds of thousands of years of evolution that have helped us get to that solution, so it would, I think, be a critical input.
Without that, I see a different approach, which is what we are taking today, which is loosely inspired by the brain, but is not the same. In our brain, when neurons fire, yes, we now have a similar transfer function for many of our neural networks of how the neuron fires, but for any kind of meaningful signal to come out we have a population of neurons firing, which makes the impulsing more continuous and very redundant and very resilient. It wouldn't fail even if some portion of those neurons stopped working. But that's not how our models work; that's not how our math works today. In finding the most optimized, elegant, and resilient way of doing it, I think it would be remiss not to take inspiration from what has evolved over a long, long period of time into, perhaps, one of the most efficient ways of having general-purpose AI. So, my belief would be that we will have to learn from it. Our understanding is still largely simplistic, and I would hope and believe that we'll learn a lot more and find out that each of those neurons perhaps either communicates more, or does it in a way that brings the system to the optimal solution a lot faster than we would imagine.
Robert: Just to add to that, I agree with everything Mudit said. Do we need to study the neuron and neural networks in vivo, in animals? The answer to that is, as humans, we do. I believe that we have an innate curiosity to understand ourselves, and that we need to pursue it. Whether it's funded or not, the curiosity to understand who we are, where we came from, and how we work will drive that, just as it's driven fields as diverse as astronomy and aviation.
But do we need to understand it at the level of detail you're describing? For example, what exactly happens at the synapse stochastically, where neurotransmitters find the receptors that open ion channels that change the resting potential of a neuron, such that additional axonal effects occur and, at the end of that neuron, another neurotransmitter is released. I don't think so. We learn a lot, as Mudit said, from understanding how these highly developed and trained systems we call animals and humans work, but they were molded over large periods of time for specific survival tasks, to live in the environments they live in.
The systems we're building, or Mudit's building, and others, are designed for other uses, and so we can take, as he said, inspiration from them, but we don't need to model how a nematode thinks to help a hospital work more effectively. In the same way, there are two ways, for example, someone could fly from here in San Francisco, where I'm sitting, to, let's say, Los Angeles. You could be a bird, which is a highly evolved flying creature with sensors and, clearly, neural networks that are able to control wing movement, and effectively the wing surface area, to create lift, etcetera. Or, you could build a metal tube with jets on it that gets you there as well. They have different use cases and different criteria.
The airplane is inspired by birds. The cross-section of an airplane's wing is designed like a bird's wing, in that one pathway is longer than the other, which changes the pressure above and below the wing and allows flight to occur. But clearly, the rest of it is very different. And so, I think the inspiration drove aviation to a solution that shares many parts with what birds have, but it's incredibly different, because the solution was to the problem of transporting humans.
Mudit, earlier you said we're not going to have an AGI anytime soon. I have two questions to follow up on that thought. Among people in the tech space, there's a range of something like five to five hundred years as to when we might get a general intelligence. I'm curious, one, why do you think there's such a range? And, two, for both of you, if you were going to throw a dart at that dartboard, where would you place your bet, to mix a metaphor?
Mudit: I think in the dart metaphor, the chances of being right are pretty low, but we'll give it a shot. Part of it, at least, is a question I ask myself: is the bar we hold for AGI too high? At what point does a collection of special-purpose AIs welded together start to feel like an AGI, and is that good enough? I don't know the answer to that question, and I think that's part of what makes the answer harder. It's similar to what Robert was saying: the more problems we solve, the more we see them as algorithmic and less as AI.
But I do think, at least in my mind, that if I can see an AI starting to question the constraints of the problem and the goal it's trying to maximize, that's where true creativity for humans comes from: when we break rules and when we don't follow the rules we were given. And that's also where the scary part of AI comes from, because it can then do that at scale. I don't see us close to that today. If I had to guess, on this exponential curve, I'm probably not going to pick out the right point, but four to five decades is when we start seeing enough of the framework, and maybe some tangible general-purpose AI come to form.
Robert, do you want to weigh in, or will you take a pass on that one?
Robert: I'll weigh in quickly. We see this in all of investing, actually—whether it's augmented reality, virtual reality, or stenting or robotics in medicine—we as investors have to work hard not to overestimate the effect of a technology now, and not to underestimate its effect in the long run. That idea came from, I believe, the Stanford professor Roy Amara, who unfortunately passed away a while ago. But that idea of saying, "Let's not overhype it, but it's going to be much more profound than we can even imagine today," puts my estimate, probably—and it depends how you define general AI, which is probably not worth doing—at within fifteen to twenty years.
We have this brain, the only general intelligence we know of. Then we have the mind, and a definition of it I think everybody can agree to: the mind is a set of abilities that don't seem, at first glance, to be something an organ could do, like creativity or a sense of humor. And then we have consciousness: we actually experience the world. A computer can measure temperature, but we can burn a finger and feel it. My questions are these. We would expect an AGI to have a "mind," we would expect it to be creative, but do you think, one, that consciousness is required for general intelligence? And, to follow up on that, do you believe computers can become conscious? That they can experience the world as opposed to just measuring it?
Mudit: That's a really hard one too. In my mind, what's most important, and there's kind of a grey line between the two, is creativity; the element of surprise. The more an AI can surprise you, the more you feel it is truly intelligent. So, creativity is extremely important. But the reason I said there's kind of a path from one to the other is, and this is very philosophical in terms of how to define consciousness, that in many ways it's when we take a specific task that is given to us but really start asking about the larger objective, the larger purpose, that's what truly distinguishes a being or a person as conscious.
Until AIs are able to be creative and break the bounds of the specific rules, or the specific expected behavior they're programmed to follow, the path to consciousness is certainly very, very hard. So, I feel creativity and surprising us is probably the first piece, which is also the one that honestly scares us as humans the most, because that's when we feel a sense of losing control over the AI. I don't think true consciousness is necessary, but the two might evolve simultaneously, and they might go hand in hand.
Robert: I would just add one other thought there. I spent many hours in college having this debate about what consciousness is, you know, where is the seat of consciousness? Anatomists for centuries have dissected and dissected: is it this gland, or is it that place, or is it an organized effect of the structure and function of all of these parts? I think that's why we need to study the brain, to be fair.
One of the underlying efforts there is to understand consciousness. What is it that makes a physical entity able to do what you said, to experience what you said? More than just experiencing a location, experiencing things like love. How could a human do that if they were a machine? Is a machine capable of empathy?
But beyond that, thinking practically as an investor and as a physician, I frankly don't know if I care whether the machine is conscious or not. I care more about who I assign responsibility to for the actions and thoughts of that entity. So, for example, if it makes a decision that harms someone, or makes the wrong diagnosis, what recourse do I have? With consciousness in human beings, well, we believe in free will, and that's where all of our institutions around human justice come from. But if the machine is deterministic, then a higher power, maybe the human that designed it, is ultimately responsible. For me, it's a big question about responsibility with respect to these AIs, and less about whether they're conscious or not. If they're conscious, then we might be able to assign responsibility to the machine, but then how do we penalize it—financially, or otherwise? If they're not conscious, then we probably need to assign responsibility to the owner, or the person that configured the machine.
I asked the question earlier about why there's such a range of beliefs about when we might get a general intelligence, but the other interesting thing, which you're kind of touching on, is that there's a wide range of belief about whether we would want one. You've got the Elon Musk camp of summoning the demon, Professor Hawking saying it's an existential threat, and Bill Gates saying, "I don't understand why more people aren't worried about it," and so forth. And on the other end, you have people like Andrew Ng, who said, "That's like worrying about overpopulation on Mars," and Rodney Brooks the roboticist, and so forth, who dismiss those concerns, almost with a roll of the eyes. What are the core assumptions of those two groups, and why do they differ so much in their regard for this technology?
Mudit: To me, it boils down to this: the same things that make me excited about the large-scale potential of general-purpose AI are the things that make me scared. Going back to creativity for a second: creativity will come when an AI that is told to maximize an objective function under constraints is allowed to question the constraints and the problem itself. That's where true creativity would come from, right? That's what a human would do. I might give someone a task or a problem, but they might come back and question it, and that's where true creativity comes from. But the minute we allow an AI to do that is also the minute we lose that sense of control. We don't have that sense of control over humans today either, but what freaks us out about AI is that it can do that at a very, very rapid scale, at a pace which we may not, as a society, be able to catch up to, realize, and control or regulate, the way we can in the case of humans. I think that's both the exciting part and the fear; they really go hand in hand.
The pace at which AI can bring about change once those constraints are loosened is something we haven't seen before. And we already see, in today's environment, our inability as a society to keep pace with how fast technology is changing, from a regulation and framework standpoint. Once that happens, it will be called into question even more. That's probably why, for many in the camp of Elon Musk, Sam Altman, and others, the part of their ask that resonates with me is that we probably should start thinking about how we will tackle the problem, and what frameworks we should have in place, earlier, so we have time as a society to wrestle with it before it's right in our face.
Robert: I would add to that with four things. The four areas that I think kind of define the concern—and a couple of them were mentioned by Mudit—are speed, the speed of computation affecting the world the machine is in; scalability; the fact that it can affect the physical environment; and the fact that machines, as we currently build them, do not have morals or ethics, however you define those. So, there are four things. Something that's super fast, that's highly scaled, that can affect the physical world, with no ethics or morality, that is a scary thing, right? That is a truck on 101 with a robotic driver that is going to go 100 MPH and doesn't care what it hits. That's the scary part of it. But there's a lot of technology that looks like that. If you are able to design it properly and constrain it, it can be incredibly powerful. It's just that the confluence of those four areas could be very detrimental to us.
So, to pull the conversation back to the here and now, I want to ask each of you: what's a breakthrough in artificial intelligence in the medical profession that we may not have heard about, because there are so many of them? And then tell me something—I'll put both of you on the spot on this—you think we're going to see in, like, two or three years; something on a time horizon where we can be very confident we're going to see it. Mudit, why don't you start: what is something we may not know about, and what is something that will happen pretty soon, do you think, in AI and medicine?
Mudit: I think—and this might go back to what I was saying—the breakthrough is less in the machine learning itself than in the operationalization of it. The ability to learn exists, if we have the first mile and the last mile solved; but in the real, complex world of high emotions and messy human-generated data, the ability to not only predict but, in the moment, prescribe and persuade people to take action is what I'm most excited about, and I'm starting to see it happen today. I think it is going to be transformative in the ability of existing machine learning prowess to actually impact our health and our healthcare system. It may not be, Byron, exactly what you're looking for in terms of a breakthrough, but I think it's a breakthrough of a different type. It's not an algorithmic breakthrough; it's an operationalization breakthrough, which I'm super excited about.
The part you asked about, what I think we could start doing in two to three years that we perhaps don't do as well now… One that is very clear is places where there are high degrees of structured data that we require humans to pore through—and I know Robert has spent a lot of time on this, so I'll leave it mostly to him—around radiology, around EKG data, around these huge quantities of structured data that are just impossible to monitor. Think of the poor-quality outcomes, the mortality, and the bad events that happen which could be caught if it were humanly feasible to monitor all of that. I believe we are two to three years away from starting to meaningfully bend that, both process-wise, logistically, and from a diagnosis standpoint. And it will be basic stuff, stuff we have known for a long time that we should do. But, you know, as the classic saying goes, it takes seventeen years from knowing something should be done to doing it at scale in healthcare; I think it will be that kind of stuff, where we start rapidly shortening that cycle time and seeing vast effects across the healthcare system.
Robert: I'll give you my two, briefly. It's hard to come up with something that you may not have heard about, Byron, with your background, so I'll think more about the general audience. First of all, I agree with Mudit: in the two to three year time frame, what's obvious is that any signal processing in healthcare that is being done by humans is going to be rapidly moved to a computer. iRhythm, as an example, is a company trading at over a billion dollars a little over a year after its IPO, and it does that for cardiology data, EKG data acquired through a patch. There are over forty companies we have tracked in the radiology space that are prereading, or in some sense providing a pre-diagnostic read of, CTs, MRIs, and x-rays for human radiology overreads for diagnosis. That is absolutely going to happen in the next two to five years. Companies like GE and Philips are leading it, and there are lots of startups doing work there.
I think the area that might not be so visible to the general public is the use of machine learning on human conversation. Imagine therapy, for example: therapy is moving to teletherapy, telemedicine; those are digitized conversations, which can be recorded and translated into language symbols, which can then be evaluated. Computational technology is being developed, and is available today, that can look at those conversations to decipher whether, for example, someone is anxious today, or depressed, needs more attention, or may need a cognitive behavioral therapy intervention that is compatible with their state. And that allows not only the scaling of signal processing, but the scaling of the human labor that is providing psychological therapy to these patients. This is already being done in the management of sales forces, where companies use AI to monitor sales calls and coach sales reps on how to position things to more effectively increase the conversion of a sale; we're seeing that in healthcare as well.
All right, well, that is all very promising; it kind of lifts up our day to know that there's stuff coming and it's going to be here relatively soon. I think that's probably a good place to leave it. As I look at our timer, we are out of time, but I want to thank both of you for taking the time out of, I'm sure, your very busy days, to have this conversation with us and let us in on a little bit of what you're thinking and what you're working on. Thank you.
Mudit: Thank you very much, thanks, Byron.
Robert: You’re welcome.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.

Voices in AI – Episode 29: A Conversation with Hugo Larochelle

In this episode, Byron and Hugo discuss consciousness, machine learning and more.
[podcast_player name=”Episode 29 – A Conversation with Hugo Larochelle” artist=”Byron Reese” album=”Voices in AI” url=”https://voicesinai.s3.amazonaws.com/2018-01-15-(00-49-50)-hugo-larochelle.mp3″ cover_art_url=”https://voicesinai.com/wp-content/uploads/2018/01/voices-headshot-card-2.jpg”]
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today I’m excited; our guest is Hugo Larochelle. He is a research scientist over at Google Brain. That would be enough to say about him to start with, but there’s a whole lot more we can go into. He’s an Associate Professor, on leave presently. He’s an expert on machine learning, and he specializes in deep neural networks in the areas of computer vision and natural language processing. Welcome to the show, Hugo.
Hugo Larochelle: Hi. Thanks for having me.
I’m going to ask you only one, kind of, lead-in question, and then let’s dive in. Would you give people a quick overview, a hierarchical explanation of the various terms that I just used in there? In terms of, what is “machine learning,” and then what are “neural nets” specifically as a subset of that? And what is “deep learning” in relation to that? Can you put all of that into perspective for the listener?
Sure, let me try that. Machine learning is the field in computer science, and in AI, where we are interested in designing algorithms or procedures that allow machines to learn. And this is motivated by the fact that we would like machines to be able to accumulate knowledge in an automatic way, as opposed to another approach which is to just hand-code knowledge into a machine. That’s machine learning, and there are a variety of different approaches for allowing for a machine to learn about the world, to learn about achieving certain tasks.
Within machine learning, there is one approach that is based on artificial neural networks. That approach is more closely inspired by our brains, by real neural networks and real neurons. It is still only vaguely inspired by them, in the sense that many of these algorithms probably aren't close to what real biological neurons are doing. But part of the inspiration is that a lot of people in machine learning, and specifically in deep learning, have this perspective that the brain is really a biological machine, that it is executing some algorithm, and we would like to discover what this algorithm is. And so, we try to take inspiration from the way the brain functions in designing our own artificial neural networks, but we also take into account how machines work and how they're different from biological neurons.
The fundamental unit of computation in artificial neural networks is the artificial neuron. You can think of it this way: we have neurons that are connected to our retina, so, on a machine, we'd have a neuron that is connected to, and takes as input, the pixel values of some image on a computer. For the longest time, we would have neural networks with mostly a single layer of these neurons—multiple neurons trying to detect different patterns in, say, images—and that was the most sophisticated type of artificial neural network we could really train with success, say ten years ago or more, with some exceptions. But in the past ten years or so, there's been progress in designing learning algorithms that leverage so-called deep neural networks, which have many more of these layers of neurons. Much like in our brain, where we have a variety of brain regions connected with one another: information from light flows from the retina through various regions of the visual cortex. In the past ten years there's been a lot of success in designing more and more effective learning algorithms based on these artificial neural networks with many layers of artificial neurons, and that's been something I've been doing research on for the past ten years now.
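As a rough illustration of what Hugo describes, each artificial neuron computes a weighted sum of its inputs and passes the result through a nonlinearity, and a "deep" network simply stacks several layers of such neurons. The following is a minimal sketch in Python with NumPy, not anything from Hugo's own work; the layer sizes, random weights, and the choice of a ReLU nonlinearity are arbitrary assumptions made purely for illustration:

```python
import numpy as np

def relu(x):
    # A common nonlinearity: pass positive values through, zero out the rest.
    return np.maximum(0.0, x)

def layer(x, W, b):
    # Each row of W holds one neuron's weights; every neuron sees the same
    # input vector and responds to a different pattern in it.
    return relu(W @ x + b)

rng = np.random.default_rng(0)
x = rng.random(4)                       # stand-in for pixel values of a tiny image
W1, b1 = rng.standard_normal((3, 4)), np.zeros(3)
W2, b2 = rng.standard_normal((2, 3)), np.zeros(2)

h = layer(x, W1, b1)                    # one layer of neurons
y = layer(h, W2, b2)                    # stacking a second layer makes it "deeper"
```

A learning algorithm would then adjust the weight matrices `W1` and `W2` from data rather than leaving them random, which is the part machine learning automates.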
You just touched on something interesting, which is this parallel between biology and human intelligence. The human genome is like 725MB, but so much of it we share with plants and other life on this planet. If you look at the part that’s uniquely human, it’s probably 10MB or something. Does that imply to you that you can actually create an AGI, an artificial general intelligence, with as little as 10MB of code if we just knew what that 10MB would look like? Or more precisely, with 10MB of code could you create something that could in turn learn to become an AGI?
Perhaps we can make that parallel. I’m not so much an expert on biology to be able to make a specific statement like that. But I guess in the way I approach research—beyond just looking at the fact that we are intelligent beings and our intelligence is essentially from our brain, and beyond just taking some inspiration from the brain—I mostly drive my research on designing learning algorithms more from math or statistics. Trying to think about what might be a reasonable approach for this or that problem, and how could I potentially implement it with something that looks like an artificial neural network. I’m sure some people have a better-informed opinion as to what extent we can draw a direct inspiration from biology, but beyond just the very high-level inspiration that I just described, what motivates my work and my approach to research is a bit more taking inspiration from math and statistics.
Do you begin with a definition of what you think intelligence is? And if so, how do you define intelligence?
That’s a very good question. There are two schools of thought, at least in terms of thinking of what we want to achieve. There’s one which is we want to somehow reach the closest thing to perfect rationality. And there’s another one which is to just achieve an intelligence that’s comparable to that of human beings, in the sense that, as humans perhaps we wouldn’t really draw a difference between a computer or another person, say, in talking with that machine or in looking at its ability to achieve a specific task.
A lot of machine learning really is based on imitating humans. In the sense that, we collect data, and this data, if it’s labeled, it’s usually produced by another person or committee of persons, like crowd workers. I think those two definitions aren’t incompatible, and it seems the common denominator is essentially a form of computation that isn’t otherwise easily encoded just by writing code yourself.
At the same time, what’s kind of interesting—and perhaps evidence that this notion of intelligence is elusive—is there’s this well-known phenomenon that we call the AI effect, which is that it seems very often whenever we reach a new level of AI achievement, of AI performance for a given task, it doesn’t take a whole lot of time before we start saying that this actually wasn’t AI, but this other new problem that we are now interested in is AI. Chess is a little bit like that. For a long time, people would associate chess playing as a form of intelligence. But once we figured out that we can be pretty good by treating it as, essentially, a tree search procedure, then some people would start saying, “Well that’s not really AI.” There’s now this new separation where chess-playing is not AI anymore, somehow. So, it’s a very tough thing to pin down. Currently, I would say, whenever I’m thinking of AI tasks, a lot of it is essentially matching human performance on some particular task.
Such as the Turing Test. It’s much derided, of course, but do you think there’s any value in it as a benchmark of any kind? Or is it just a glorified party trick when we finally do it? And to your point, that’s not really intelligence either.
No, I think there’s value to it, in the sense that, at the very least, if we define a specific Turing Test for which we currently have no solution, it is valuable to try to succeed at that test. I think it does have some value.
There are certainly situations where humans can also do other things. So, arguably, if someone played against AlphaGo but wasn’t told in advance whether it was AlphaGo or not—though, interestingly, some people have argued it uses strategies that the best Go players aren’t necessarily considering naturally—you could argue that right now you would have a hard time determining that this isn’t just some Go expert; at least, many people wouldn’t be able to say that. But, of course, AlphaGo doesn’t classify natural images, and it doesn’t hold a dialog with a person. Still, I would certainly argue that trying to tackle that particular milestone is useful in our scientific endeavor towards more and more intelligent machines.
Isn’t it fascinating that Turing said that—assuming the listeners are familiar with it, it’s basically, “Can you tell if this is a machine or a person you’re talking to over a computer?” And Turing said that if it can fool you thirty percent of the time, we have to say it’s smart. And the first thing you say is, well, why isn’t it fifty percent? Why isn’t it, kind of, indistinguishable? An answer to that would probably be something like, “Well, we’re not saying that it’s as smart as a human, but it’s intelligent. You have to say it’s intelligent if it can fool people regularly.” But the interesting thing is that if it can ever fool people more than fifty percent of the time, the only conclusion you can draw is that it’s better at being human than we are…or at seeming human.
Well, that’s a good point. I definitely think that intelligence isn’t a black-or-white phenomenon, in terms of whether something is intelligent or not; it’s definitely a spectrum. What it means for something to fool a human more often than actual humans do into thinking that it’s human is an interesting thing to think about. I guess I’m not sure we’re really quite there yet, and if we were, this might just be more like a bug in the evaluation itself. In the sense that, presumably, much like we now have adversarial networks or adversarial examples, there could be methods that can fool a particular test. It just might be more a reflection of that. But yeah, intelligence, I think, is a spectrum, and I wouldn’t be comfortable trying to pin it down to a specific frontier or barrier that we have to reach before we can say we have achieved actual AI.
To say we’re not quite there yet, that is an exercise in understatement, right? Because I can’t find a single one of these systems that are trying to pass the test that can answer the following question: “What’s bigger, a nickel or the sun?” You don’t need four seconds; you instantly know. Even the best contests restrict the questions enormously. They try to tilt everything in favor of the machine, and the machine can’t even put in a showing. What do you infer from that, that we are so far away?
I think that’s a very good point. And it’s interesting, I think, to talk about how quickly we are progressing towards something that would be indistinguishable from human intelligence—or any other—in the very complete, Turing Test sense. I think what you’re getting at is that we’re getting pretty good at a surprising number of individual tasks, but for something to solve all of them at once, and be very flexible and capable in a more general way, essentially your example shows that we’re quite far from that. So, I do find myself thinking, “Okay, how far are we, do we think?” And often, if you talk to someone who isn’t in machine learning or AI, that’s the question they ask: “How far away are we from AIs doing pretty much anything we’re able to do?” And it’s a very difficult thing to predict. So usually what I say is that I don’t know, because you would need to predict the future for that.
One bit of information that I feel we don’t often go back to is that, if you look at some of the quotes from AI researchers when people were, like now, very excited about the prospect of AI, a lot of those quotes are actually similar to some of the things we hear today. So, knowing this, and noticing that it’s not hard to think of a particular reasoning task where we don’t really have anything that would solve it as easily as we might have thought, I think it just suggests that we still have a fairly long way to go before a real general AI.
Well let’s talk about that for just a second. Just now you talked about the pitfalls of predicting the future, but if I said, “How long will it be before we get to Mars?” that’s a future question, but it’s answerable. You could say, “Well, rocket technology and…blah, blah, blah…2020 to 2040,” or something like that. But if you ask people who are in this field—at least tangentially in the field—you get answers between five and five hundred years. And so that implies to me that not only do we not know when we’re going to do it, we really don’t know how to build an AGI.  
So, I guess my question is twofold. One, why do you think there is that range? And two, do you think that, whether or not you can predict the time, do you think we have all of the tools in our arsenal that we need to build an AGI? Do you believe that with sufficient advances in algorithms, sufficient advances in processors, with data collection, etcetera, do you think we are on a linear path to achieve an AGI? Or is an AGI going to require some hitherto unimaginable breakthrough? And that’s why you get five to five hundred years because that’s the thing that’s kind of the black swan in the room?
That is my suspicion: that there are at least one and probably many technological breakthroughs—that aren’t just computers getting faster or collecting more data—that are required. One example, which I feel is not so much an issue with compute power, but is much more an issue of, “Okay, we don’t have the right procedure, we don’t have the right algorithms,” is being able to match how, as humans, we’re able to learn certain concepts with very little, quote unquote, data or human experience. An example that’s often given is: if you show me a few pictures of an object, I will probably recognize that same object in many more pictures, from just a few—perhaps just one—photographs of it. If you show me a picture of a family member and you show me other pictures of your family, I will probably identify that person without you having to tell me more than once. And there are many other things that we’re able to learn from very little feedback.
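The simplest baseline for this kind of one-shot recognition is nearest-neighbour matching in a feature space: given a single labeled example per class, classify a new input by whichever example it is closest to. A minimal sketch, where the class names and two-dimensional feature vectors are entirely invented for illustration:

```python
def one_shot_classify(query, support):
    # Classify a query from a single labeled example per class ("one-shot"),
    # by nearest-neighbour (squared Euclidean) distance in feature space.
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(support, key=lambda name: dist(query, support[name]))

# Hypothetical feature vectors: one example each of a "cat" and a "dog"
support = {"cat": [0.9, 0.1], "dog": [0.1, 0.9]}
label = one_shot_classify([0.8, 0.2], support)
```

Modern few-shot methods go further by learning the feature space itself so that this kind of simple comparison works well, rather than relying on hand-made features.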
I don’t think that’s just a matter of throwing existing technology, more computers and more data, at it; I suspect that there are algorithmic components that are missing. One of them might be—and it’s something I’m very interested in right now—learning to learn, or meta-learning. So, essentially, producing learning algorithms from examples of tasks, and, more generally, just having a higher-level perspective of what learning is. Acknowledging that it works on various scales, and that there are a lot of different learning procedures happening in parallel and in intricate ways. And so, determining how these learning processes should act at various scales, I think, is probably a question we’ll need to tackle more and actually find a solution for.
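In its simplest form, the "learning to learn" idea can be sketched as tuning a learner's own settings across a distribution of tasks. The toy example below (all tasks and numbers are invented for illustration) picks the gradient step size that minimizes average loss over several one-dimensional problems, so that one level of learning chooses how another level learns:

```python
def sgd_loss(target, lr, steps=10):
    # One "task": minimize (x - target)^2 starting from x = 0,
    # using gradient descent with a fixed step size lr.
    x = 0.0
    for _ in range(steps):
        x -= lr * 2 * (x - target)
    return (x - target) ** 2

def meta_learn_lr(tasks, candidates):
    # "Learning to learn": choose the learner setting that works
    # best on average across a family of tasks.
    return min(candidates, key=lambda lr: sum(sgd_loss(t, lr) for t in tasks))

best_lr = meta_learn_lr(tasks=[1.0, 2.0, 5.0], candidates=[0.01, 0.1, 0.4, 0.9])
```

Real meta-learning methods learn initializations, update rules, or whole learning algorithms rather than a single hyperparameter, but the two nested levels of optimization are the essential structure.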
There are people who think that we’re not going to build an AGI until we understand consciousness. That consciousness is this unique ability we have to change focus, and to observe the world a certain way and to experience the world a certain way that gives us these insights. So, I would throw that to you. Do you, A), believe that consciousness is somehow key to human intelligence; and, B), do you think we’ll make a conscious computer?
That’s a very interesting question. I haven’t really wrapped my head around what is consciousness relative to the concept of building an artificial intelligence. It’s a very interesting conversation to have, but I really have no clue, no handle on how to think about that.
I would say, however, that clearly notions of attention, for instance, being able to focus attention on various things, or an ability to seek information, are clearly components that remain a very big technical challenge we need to address. For attention, we currently have some fairly mature solutions that work, though in somewhat restrictive ways and not in the more general way; information seeking, I think, is still very much tied to the notion of exploration in reinforcement learning.
So, some of these aspects of our consciousness, I think, are kind of procedural, and we will need to figure out some algorithm to implement these, or learn to extract these behaviors from experience and from data.
You talked a little bit earlier about learning from just a little bit of data, that we’re really good at that. Is that, do you think, an example of humans being good at unsupervised learning? Because obviously as kids you learn, “This is a dog, and this is a cat,” and that’s supervised learning. But what you were talking about, was, “Now I can recognize it in low light, I can recognize it from behind, I can recognize it at a distance.” Is that humans doing a kind of unsupervised learning? Maybe start off by just explaining the concept and the hope about unsupervised learning, that it takes us, maybe, out of the process. And then, do you think humans are good at that?
I guess unsupervised learning is, by definition, something that’s not supervised learning; it’s kind of the extreme of not using supervision. An example of that would be—and this is something I investigated quite a bit when I did my PhD ten years ago—to have a procedure, a learning algorithm, that can, for instance, look at images of hundreds of characters and understand that the pixels in these images are related: that there are higher-level concepts that explain why this is a digit. For instance, there is the concept of pen strokes; a character is really a combination of pen strokes. So, unsupervised learning would try to—just from looking at images, from the fact that there are correlations between these pixels, that they tend to look different from a random image, and that pixels arrange themselves in a very specific way compared to any random combination of pixels—extract these higher-level concepts like pen strokes and handwritten characters. In a more complex, natural scene this would be identifying the different objects without someone having to label each object. Because really, what explains what I’m seeing is that there are a few different objects, with a particular light interacting with the scene, and so on.
That’s something that I’ve looked at quite a bit, and I do think that humans are doing some form of that. But also, as infants, we’re interacting with our world, and we’re exploring it and being curious. And that starts being something a bit further away from pure unsupervised learning and a bit closer to things like reinforcement learning. So, this notion that I can actually manipulate my environment, and from this I can learn what its properties are, what the factors of variation are that characterize this environment.
And there’s yet another type of learning that we see in ourselves as infants that is not really captured by purely supervised learning, which is being able to learn from feedback from another person. So, we might imitate someone, and that would be closer to supervised learning, but we might instead get feedback that’s worded. If a parent says “do this” or “don’t do that,” this isn’t exactly imitation; it’s more like a communication of how you should adjust your behavior. And this is a form of weakly supervised learning. So, if I tell my kid to do his or her homework, or if I give instructions on how to solve a particular problem set, this isn’t a demonstration, so it isn’t supervised learning; it’s more like a weak form of supervision. And even that, I think, we don’t use much in the known systems that currently work well, in object recognition systems or machine translation systems and so on. So, I believe that these various forms of learning that are much less supervised than common supervised learning are a direction in research where we still have a lot of progress to make.
So earlier you were talking about meta-learning, which is learning how to learn, and I think there’s been a wide range of views about how artificial intelligence and an AGI might work. On one side was an early hope that, like the physical universe, which is governed by just a very few laws (magnetism, very few laws; electricity, very few laws), intelligence was governed by just a very few laws that we could learn. And on the other extreme you have people like the late Marvin Minsky, who really saw the brain as a hack of a couple of hundred narrow AIs that all come together and give us, if not a general intelligence, at least a really good substitute for one. I guess a belief in meta-learning is a belief in the former case, or something like it: that there is a way to learn how to learn, a way to build all those hacks. Would you agree? Do you think that?
We can take one example there. I think under a somewhat general definition of what learning to learn, or meta-learning, is, it’s something that we could all agree exists, which is: as humans, we’re the result of years of evolution, and evolution is a form of adaptation, I guess. But then, within our lifespan, each individual will also adapt to his or her specific human experience. So, you can think of evolution as being kind of like the meta-learning to the learning that we do as humans in our individual lives every day. But then, even in our own lives, I think there are clearly ways in which my brain is adapting as I’m growing from a baby to an adult that are not conscious. And there are ways in which I’m adapting rationally, in conscious ways, which rely on the fact that my brain has adapted to be able to perceive my environment—my visual cortex just maturing. So again, there are multiple layers of learning that rely on each other. This is, at a fairly high level, but I think in a meaningful way, a form of meta-learning. For that reason, I think that investigating how to build learning-to-learn systems is a process that’s valuable in informing how to build more intelligent agents and AIs.
There’s a lot of fear wrapped up in the media coverage of artificial intelligence. And not even getting into killer robots, just the effects that it’s going to have on jobs and employment. Do you share that? And what is your prognosis for the future? Is AI in the end going to increase human productivity like all other technologies have done, or is AI something profoundly different that’s going to harm humans?
That’s a good question. What I can say is that what motivates me—and what makes me excited about AI—is that I see it as an opportunity to automate parts of my day-to-day life which I would rather have automated, so I can spend my life doing more creative things, or the things that I’m more passionate about or more interested in. I think largely because of that, I see AI as a wonderful piece of technology for humanity. I see benefits in terms of better machine translation, which will better connect the different parts of the world and allow us to travel and learn about other cultures. Or in how we can automate the work of certain health workers so that they can spend more time on the harder cases that probably don’t receive as much attention as they should.
For that reason—and because I’m personally motivated by automating those aspects of life which we would want to see automated—I am fairly optimistic about the prospects for our society to have more AI. And, potentially, when it comes to jobs, we can even imagine automating part of our ability to progress professionally. Definitely there are a lot of opportunities in automating part of the process of learning in a course. We now have many courses online. Even myself, when I was teaching, I was putting a lot of material on YouTube to allow people to learn.
Essentially, I identified that the day-to-day teaching that I was doing in my job was very repetitive. It was something that I could record once and for all, and instead focus my attention on spending time with the students and making sure that each individual student resolves his or her own misunderstandings about the topic. Because my mental model of students in general is that it’s often unpredictable how they will misunderstand a particular aspect of the course. And so, you actually want to spend some time interacting with that student, and you want to do that with as many students as possible. I think that’s an example where we can think of automating particular aspects of education so as to support our ability to have everyone be educated and be able to have a meaningful professional life. So, I’m overall optimistic, largely because of the way I see myself using AI and developing AI in the future.
Anybody who’s listened to many episodes of the show will know I’m very sympathetic to that position. I think it’s easy to point to history and say that in the last two hundred and fifty years, other than the Depression, which obviously wasn’t caused by technology, unemployment has been between five and nine percent without fail. And yet we’ve had incredibly disruptive technologies, like the mechanization of industry, the replacement of animal power with machine power, electrification, and so forth. And in every case, humans have used those technologies to increase their own productivity and therefore their incomes. And that is the entire story of the rising standard of living for everybody, at least in the Western world.
But I would be remiss not to make the other case, which is that there might be a point, an escape velocity, where a machine can learn a new job faster than a human. And at that point, at that magic moment, every new job, everything we create, a machine would learn it faster than a human. Such that, literally, everything from Michael Crichton down to…everybody—everybody finds themselves replaced. Is that possible? And if that really happened, would that be a bad thing?
That’s a very good question, I think, for society in general. Maybe because my day-to-day is about identifying the current challenges in making progress in AI, I see—and I guess we’ve touched on that a little bit earlier—that there are still many scientific challenges; it doesn’t seem like it’s just a matter of making computers faster and collecting more data. Because I see these many challenges, and because I’ve seen that the scientific community, in previous years, has been wrong and has been overly optimistic, I tend to err on the side of being less gloomy and a bit more conservative about how quickly we’ll get there, if we ever get there.
In terms of what it means for society—if it ever happens that we can automate essentially most things—I unfortunately feel ill-equipped, as a non-economist, to have a really meaningful opinion about this. But I do think it’s good that we have a dialog about it, as long as it’s grounded in facts. Which is why it’s a difficult question to discuss, because we’re talking about a hypothetical future that might not arrive for a very long time. But as long as we otherwise have a rational discussion about what might happen, I don’t see a reason not to have that discussion.
It’s funny. Probably the truest thing that I’ve learned from doing all of these chats is that there is a direct correlation between how much you code and how far away you think an AGI is.
That’s quite possible.
I could even go further to say that the longer you have coded, the further away you think it is. People who are new at it are like, “Yeah. We’ll knock this out.” And the other people who think it’s going to happen really quickly are more observers. So, I want to throw a thought experiment to you.
It’s a thought experiment that I haven’t presented to anybody on the show yet. It’s by a man named Frank Jackson, and it’s the problem of Mary, and the problem goes like this. There’s this hypothetical person, Mary, and Mary knows everything in the world about color. Everything is an understatement. She has a god-like understanding of color, everything down to the basic, most minute detail of light and neurons and everything. And the rub is that she lives in a room that she’s never left, and everything she’s seen is black and white. And one day she goes outside and she sees red for the first time. And the question is, does she learn anything new when that happens that she didn’t know before? Do you have an initial reaction to that?
My initial reaction is that, being colorblind, I might be ill-equipped to answer that question. But seriously, so she has a perfect understanding of color but—just restating the situation—she has only seen in black and white?
Correct. And then one day she sees color. Did she learn anything new about color?
By definition of what understanding means, I would think that she wouldn’t learn anything about color. About red specifically.
Right. That is probably the consistent answer, but it’s one that is intuitively unsatisfying to many people. The question it’s trying to get at is, is experiencing something different than knowing something? And if in fact it is different, then we have to build a machine that can experience things for it to truly be intelligent, as opposed to just knowing something. And to experience things means you return to this thorny issue of consciousness. We are not only the most intelligent creature on the planet, but we’re arguably the most conscious. And that those two things somehow are tied together. And I just keep returning to that because it implies, maybe, you can write all the code in the world, and until the machine can experience something… But the way you just answered the question was, no, if you know everything, experiencing adds nothing.
I guess, unless that experience would somehow contradict what you know about the world, I would think that it wouldn’t affect it. And this is partly, I think, one challenge about developing AI as we move forward. A lot of the AIs that we’ve successfully developed that have to do with performing a series of actions, like playing Go for instance, have really been developed in a simulated environment. In this case, for a board game, it’s pretty easy to simulate it on a computer because you can literally write all the rules of the game so you can put them in the computer and simulate it.
But, for an experience such as being in the real world and manipulating objects, as long as that simulated experience isn’t exactly what the experience is in the real world, touching real objects, I think we will face a challenge in transferring any kind of intelligence that we grow in simulations, and transfer it to the real world. And this partly relates to our inability to have algorithms that learn rapidly. Instead, they require millions of repetitions or examples to really be close to what humans can do. Imagine having a robot go through millions of labeled examples from someone manipulating that robot, and showing it exactly how to do everything. That robot might essentially learn too slowly to really learn any meaningful behavior in a reasonable amount of time.
You used the word transfer three or four times there. Do you think that transfer learning, this idea that humans are really good at taking what we know in one domain space and applying it in another—you know, you walk around one big city and go to a different big city and you kind of map things. Is that a useful thing to work on in artificial intelligence?
Absolutely. In fact, we’re seeing that with all the success that has been enabled by the ImageNet data set and its competition. It turns out that if you train an object recognition system on this large ImageNet data set (the data set really responsible for the revolution of deep neural nets and convolutional neural nets in the field of computer vision), the resulting models transfer really well to a surprising number of tasks. And that has very much enabled a kind of revolution in computer vision. But it’s a fairly simple type of transfer, and I think there are more subtle ways of transferring, where you need to take what you knew before but slightly adjust it. How do you do that without forgetting what you learned before? So, understanding how these different mechanisms need to work together to perform a form of lifelong learning—of being able to accumulate one task after another, and learning each new task with less and less experience—is something I think currently we’re not doing as well as we need to.
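The simple kind of transfer described here, reusing a pretrained network as a frozen feature extractor and training only a small new classifier on top, can be sketched as follows. The "pretrained" feature vectors below are invented stand-ins for what a real network trained on ImageNet would produce; the head is a tiny logistic regression trained by stochastic gradient descent:

```python
import math

def predict(w, b, x):
    # Probability of class 1 from the linear head on frozen features.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, lr=0.5, epochs=500):
    # Train only the small logistic-regression "head"; the feature
    # extractor (in real use, the pretrained network) stays frozen.
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            g = predict(w, b, x) - y          # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Invented stand-ins for pretrained feature vectors of two classes
feats = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labs = [0, 0, 1, 1]
w, b = train_head(feats, labs)
```

Fine-tuning goes one step further by also adjusting the pretrained weights, which is where the problem of forgetting what was learned before comes in.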
What keeps you up at night? You meet a genie and you rub the bottle and the genie comes out and says, “I will give you perfect understanding of something.” What do you wrestle with that maybe you can phrase in a way that would be useful to the listeners?
Let’s see. That’s a very good question. Definitely, in my daily research, how we are able to accumulate knowledge, and how a machine would accumulate knowledge over a very long period, learning a sequence of tasks and abilities cumulatively, is something that I think a whole lot about. And this has led me to think about learning to learn, because I suspect there are ideas there. Effectively, once you have to learn one ability after another after another, the fact that we get better at that process is, perhaps, because we are learning how to learn each task also; there’s this other scale of learning that is going on. How to do this exactly I don’t quite know, and knowing it, I think, would be a pretty big step in our field.
I have three final questions, if I could. You’re in Canada, correct?
As it turns out, I’m currently still in the US because I have four kids, two of them are in school so I wanted them to finish their school year before we move. But the plan is for me to go to Montreal, yes.
I noticed something. There’s a lot of AI activity in Canada, a lot of leading research. How did that come about? Was that a deliberate decision or just a kind of a coincidence that different universities and businesses decided to go into that?
If I speak for Montreal specifically, very clearly at the source of it is Yoshua Bengio deciding to stay in Montreal, staying in academia, and then continuing to train many students, gathering other researchers into his group, and training more PhDs in a field that doesn’t have as much talent as is needed. I think this is essentially the source of it.
And then my second to the last question is, what about science fiction? Do you enjoy it in any form, like movies or TV or books or anything like that? And if so, is there any that you look at it and think, “Ah, the future could happen that way”?
I definitely used to be more into science fiction. Now, maybe due to having kids, I watch many more Disney movies than science fiction. It’s actually a good question. I’m realizing I haven’t watched a sci-fi movie for a bit, but it would be interesting, now that I’ve been in this field for a while, to confront my vision of it with how artists see AI. Maybe not too seriously. A lot of art is essentially philosophy around what could happen, or at least projecting a potential future and seeing how we feel about it. And for that purpose, I’m now tempted to revisit some classics or to see what the recent sci-fi movies are.
I said only one more question, so I’ve got to combine two into one to stick with that. What are you working on, and if a listener is going into college or is presently in college and wants to get into artificial intelligence in a way that is really relevant, what would be a leading edge that you would say somebody entering the field now would do well to invest time in? So first, you, and then what would you recommend for the next generation of AI researchers?
As I’ve mentioned, perhaps not so surprisingly, I am very much interested in learning to learn and meta-learning. I’ve started publishing on the subject, and I’m still very much thinking about various new ideas for meta-learning approaches. And also about learning from weaker signals than in the supervised learning setting: learning from worded feedback from a person, for example, is something I haven’t quite started working on specifically, but I’m thinking a whole lot about these days. Those are directions that I would definitely encourage young researchers to think about and study and research.
And in terms of advice, well, I’m obviously biased, but being in Montreal studying deep learning and AI, currently, is a very, very rich and great experience. There are a lot of people to talk to and interact with, not just in academia but now much more in industry, such as ourselves at Google and other places. And also, be very active online. On Twitter, there’s now a very, very rich community of people sharing the work of others and discussing the latest results. The field is moving very fast, and in large part it’s because the deep learning community has been very open about sharing its latest results, and also making the discussion open about what’s going on. So be connected, whether it be on Twitter or other social networks, and read papers and look at what comes up on arXiv—engage in the global conversation.
Alright. Well that’s a great place to end. I want to thank you so much. This has been a fascinating hour, and I would love to have you come back and talk about your other work in the future if you’d be up for it.
Of course, yeah. Thank you for having me.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.

Voices in AI – Episode 4: A Conversation with Jeff Dean

In this episode, Byron and Jeff talk about AGI, machine learning, and healthcare.
[podcast_player name=”Episode 4: A Conversation with Jeff Dean” artist=”Byron Reese” album=”Voices in AI” url=”https://voicesinai.s3.amazonaws.com/2017-09-28-(00-31-10)-jeff-dean.mp3″ cover_art_url=”https://voicesinai.com/wp-content/uploads/2017/09/voices-headshot-card-3.jpg”]
Byron Reese: Hello, this is Voices in AI brought to you by Gigaom. I am your host, Byron Reese. Today we welcome Jeff Dean onto the show. Jeff is a Google Senior Fellow and he leads the Google Brain project. His work probably touches my life, and maybe yours, about every hour of every day, so I can’t wait to begin the conversation. Welcome to the show Jeff. 
Jeff Dean: Hi Byron, this is Jeff Dean. How are you?
I’m really good, Jeff, thanks for taking the time to chat. You went to work for Google, I believe, in the second millennium. Is that true?
Yes, I did, in 1999.
So the company wasn’t even a year old at that time.
That’s right, yeah it was pretty small. We were all kind of wedged in the second-floor office area, above what is now a T-Mobile store in downtown Palo Alto.
And did it feel like a start-up back then, you know? All the normal trappings that you would associate with one?
We had a ping pong table, I guess. That also doubled as where we served food for lunch. I don’t know—yeah, it felt exciting and vibrant, and we were trying to build a search engine that people would want to use. And so there was a lot of work in that area, which is exciting.
And so, over the last seventeen years… Just touch on, it’s an amazing list of the various things you’ve worked on.
Sure. The first thing I did was put together the initial skeleton of what became our advertising system, and I worked on that for a little while. Then mostly for the next four or five years I spent my time with a handful of other people working on our core search system. That’s everything from the crawling system—which goes out and fetches all the pages on the web that we can get our hands on—to the indexing system that then turns that into a system that we can actually query quickly when users are asking a question.
They type something into Google, and we want to be able to very quickly analyze what pages are going to be relevant to that query, and return the results we return today. And then the serving system that, when a query comes into Google, decides how to distribute that request over lots and lots of computers to have them farm that work out and then combine the results of their individual analyses into something that we can then return back to the user.
And that was kind of a pretty long stretch of time, where I worked on the core search and indexing system.
And now you lead the Google Brain project. What is that?
Right. So, basically we have a fairly large research effort around doing machine learning and artificial intelligence research, and then using the results of that research to make intelligent systems. An intelligent system may be something that goes into a product, it might be something that enables new kinds of products, or it might be, you know, some combination of that.
When we’re working with getting things into existing products, we often collaborate closely with different Google product teams to get the results of our work out into products. And then we also do a lot of research that is sort of pure research, untied to any particular products. It’s just something that we think will advance the capabilities of the kinds of systems we’re able to build, and ultimately will be useful even if they don’t have a particular application in mind at the moment.
“Artificial intelligence” is that phrase that everybody kind of disowns, but what does it mean to you? What is AI? When you think about it, what is it? How would you define it in simple English?
Right, so it’s a term that’s been around since the very beginning of computing. And to me it means essentially trying to build something that appears intelligent. So, the way we distinguish humans from other organisms is that we have these higher-level intelligence capabilities. We can communicate, we can absorb information, and understand it at a very high level.
We can imagine the consequences of doing different things as we decide how we’re going to behave in the world. And so we want to build systems that embody as many aspects of intelligence as we can. And sometimes those aspects are narrowly defined, like we want them to be able to do a particular task that we think is important, and requires a narrow intelligence.
But we also want to build systems that are flexible in their intelligence, and can do many different things. I think the narrow intelligence aspects are working pretty well in some areas today. The broad, really flexible intelligence is clearly an open research problem, and it’s going to consume people for a long time—to actually figure out how to build systems that can behave intelligently across a huge range of conditions.
It’s interesting that you emphasize “behave intelligently” or “appear intelligent.” So, you think artificial intelligence, like artificial turf, isn’t really turf—so the system isn’t really intelligent, it is emulating intelligence. Would you agree with that?
I mean, I would say it exhibits many of the same characteristics that we think of when we think of intelligence. It may be doing things differently, because I think, you know, biology and silicon have very different strengths and weaknesses, but ultimately what you care about is, “Can this system or agent operate in a manner that is useful and can augment what human intelligence can do?”
You mentioned AGI, an artificial general intelligence. The range of estimates on when we would get such a technology are somewhere between five and five hundred years. Why do you think there’s such a disparity in what people think?
I think there’s a huge range there because there’s a lot of uncertainty about what we actually need. We don’t quite know how humans process all the different kinds of information that they receive, and formulate strategies. We have some understanding of that, but we don’t have deep understanding of that, and so that means we don’t really know the scope of work that we need to do to build systems that exhibit similar behaviors.
And that leads to these wildly varying estimates. You know, some people think it’s right around the corner, some think it’s nearly impossible. I’m kind of somewhere in the middle. I think we’ve made a lot of progress in the last five or ten years, building on stuff that was done in the twenty or thirty years before that. And I think we will have systems that exhibit pretty broad kinds of intelligence, maybe in the next twenty or thirty years, but I have high error bars on those estimates.
And the way you describe that, it sounds like you think an AGI is an evolution from the work that we’re doing now, as opposed to it being something completely different we don’t even know. You know, we haven’t really started working on the AGI problem. Would you agree with that or not?
I think some of what we’re doing is starting to touch on the kind of work that we’ll need to build artificial general intelligence systems. I think we have a huge set of things that we don’t know how to solve yet, and that we don’t even know that we need yet, which is why this is an open and exciting research problem. But I do think some of the stuff we’re doing today will be part of the solution.
So you think you’ll live to see an AGI, while you’re still kind of in your prime?
Ah well, the future is unpredictable. I could have a bike accident tomorrow or something, but I think if you look out fifteen or twenty years, there will be things that are not really imaginable, that we don’t have today, that will do impressive things ten, fifteen, twenty years down the road.
Would that put us on our way to an AGI being conscious, or is machine consciousness a completely different thing which may or may not be possible?
I don’t really know. I tend not to get into the philosophical debates of what is consciousness. To my untrained neuroscience eye, consciousness is really just a certain kind of electrical activity in the neurons in a living system—that it can be aware of itself, that it can understand consequences, and so on. And so, from that standpoint consciousness doesn’t seem like a uniquely special thing. It seems like a property that is similar to other properties that intelligent systems exhibit.
So, absent your bicycle crash, what would that world look like, a world twenty years from now where we’ve made incredible strides in what AI can do, and maybe have something that is close to being an AGI? How do you think that plays out in the world? Is that good for humanity?
I think it will almost uniformly be good. Look at major technological improvements in the past, like the shift from an agrarian society to the one the Industrial Revolution fueled: what used to take ninety-nine percent of people working to grow food is now, in many countries, a few percent of people producing the food supply. That has freed people up to do many, many other things, all the other things that we see in our society as a result of that big shift.
So, I think like any technology, there can be uses for it that are not so great, but by-and-large the vast set of things that happen will be improvements. I think the way to view this is, a really intelligent sidekick is something that would really improve humanity.
If I have a question, a very complicated thing—that today I can do via search engine, if I sit down for nine hours or ten hours and really think through and say, “I really want to learn about a particular topic, so I need to find all these papers and then read them and summarize them myself.” If I had an intelligent system that could do that for me, and I could say, “Find me all the papers on reinforcement learning for robotics and summarize them.” And the system could go back, and in twenty seconds do that, that would be hugely useful for humanity.
Oh absolutely. So, what are some of the challenges that you think separate us from that world? Like what are the next obstacles we need to overcome in the field?
One of the things that I think is really important today in the field of machine learning research, that we’ll need to overcome, is… Right now, when we want to build a machine learning system for a particular task we tend to have a human machine learning expert involved in that. So, we have some data, we have some computation capability, and then we have a human machine learning expert sit down and decide: Okay, we want to solve this problem, this is the way we’re going to go about it roughly. And then we have the system that can learn from observations that are provided to it, how to accomplish that task.
That’s sort of what generally works, and that’s driving a huge number of really interesting things in the world today. And you know this is why computer vision has made such great strides in the last five years. This is why speech recognition works much better. This is why machine translation now works much, much better than it did a year or two ago. So that’s hugely important.
But the problem with that is you’re building these narrowly defined systems that can do one thing and do it extremely well, or do a handful of things. And what we really want is a system that can do a hundred thousand things, and then when the hundred thousand-and-first thing comes along that it’s never seen before, we want it to learn from its experience to be able to apply the experience it’s gotten in solving the first hundred thousand things to be able to quickly learn how to do thing hundred thousand-and-one.
And that kind of meta learning, you want that to happen without a human machine learning expert in the loop to teach it how to do the hundred thousand-and-first thing.
And that might actually be your AGI at that point, right?  
I mean it will start to look more like a system that can improve on itself over time, and can add the ability to do new novel tasks by building on what it already knows how to do.
Broadly speaking, that’s transfer learning, right? Where we take something in one space and use that to influence the other one. Is that a new area of study, or is that something that people have thought about for a long time, and we just haven’t gotten around to building a bunch of—
People have thought about that for quite a while, but usually in the context of, I have a few tasks that I want to do, and I’m going to learn to do three of them. And then, use the results of learning to do three, to do the fourth better with less data, maybe. Not so much at the scale of a million tasks… And then completely new ones come along, and without any sort of human involvement, the system can pick up and learn to do that new task.
So I think that’s the main difference. Multitask learning and transfer learning have been done with some success at very small scale, and we need to make it so that we can apply them at very large scales.
And the other thing that’s new is this meta learning work, that is starting to emerge as an important area of machine learning research—essentially learning to learn. And that’s where you’ll be able to have a system that can see a completely novel task and learn to accomplish it based on its experience, and maybe experiments that it conducts itself about what approaches it might want to try to solve this new task.
And that’s where we currently have a human in the loop to try different approaches, and where we think this ‘learning to learn’ research is going to drive faster progress.
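Jeff’s point about reusing what a model learned on earlier tasks can be sketched with a toy experiment. Everything below is a hypothetical illustration, not anything from Google’s systems: invented data, plain NumPy, and a linear model standing in for a deep network’s lower layers. The idea is that a warm start from task A’s solution fits a related task B in far fewer steps than a cold start.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_from(w0, X, y, lr=0.1, steps=20):
    """Gradient descent on squared error, starting from weights w0."""
    w = w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

# Task A: plenty of data; train long enough to learn the task well.
X_a = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y_a = X_a @ true_w
w_a = train_from(np.zeros(10), X_a, y_a, steps=200)

# Task B is *related* to task A (same weights plus a small perturbation),
# but we only get a little data and a small training budget.
X_b = rng.normal(size=(20, 10))
y_b = X_b @ (true_w + 0.1 * rng.normal(size=10))

w_scratch = train_from(np.zeros(10), X_b, y_b)   # cold start, 20 steps
w_transfer = train_from(w_a, X_b, y_b)           # warm start, same 20 steps

err = lambda w: np.mean((X_b @ w - y_b) ** 2)
print(err(w_scratch), err(w_transfer))  # warm start should fit far better
```

Meta-learning, in this cartoon, would be learning *how* to pick the starting point (or the whole training procedure) across many such tasks, rather than hand-picking `w_a` as we did here.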
There are those who worry that the advances in artificial intelligence will have implications for human jobs. That eventually machines can learn new tasks faster than a human can, and then there’s a group of people who are economically locked out of the productive economy. What are your thoughts on that?
So, I mean I think it’s very clear that computers are going to be able to automate some aspects of some kinds of jobs, and that those jobs—the things they’re going to be able to automate—are a growing set over time. And that has happened before, like the shift from agrarian societies to an industrial-based economy happened largely because we were able to automate a lot of the aspects of farm production, and that caused job displacement.
But people found other things to do. And so, I’m a bit of an optimist in general, and I think, you know, politicians and policymakers should be thinking about what the society structures we want to have in place should be if computers can suddenly do a lot more things than they used to be able to. But I think that’s largely a governmental and policy set of issues.
My view is, a lot of the things that computers will be able to automate are these kinds of repetitive tasks that humans currently do because they’re too complicated for our computers to learn how to do.
So am I reading you correctly, that you’re not worried about a large number of workers displaced from their jobs, from the technology?
Well I definitely think that there will be some job displacement, and it’s going to be uneven. Certain kinds of jobs are going to be much more amenable to automation than others. The way I like to think about it is, if you look at the set of things that a person does in their job, if it’s a handful of things that are all repetitive, that’s something that’s more likely to be automatable, than someone whose job involves a thousand different things every day, and you come in tomorrow and your job is pretty different from what you did today.
And within that, what are the things that you’re working on—on a regular basis—in AI right now?
Our group as a whole does a lot of different things, and so I’m leading our group to help provide direction for some of the things we’re doing. Some of the things we’re working on within our group that I’m personally involved in are use of machine learning for various healthcare related problems. I think machine learning has a real opportunity to make a significant difference in how healthcare is provided.
And then I’m personally working on how can we actually build the right kinds of computer hardware and computer software systems that enable us to build machine learning systems which can successfully try out lots of different machine learning ideas quickly—so that you can build machine learning systems that can scale.
So that’s everything from, working with our hardware design team to make sure we build the right kind of machine learning hardware. TensorFlow is an open source package that our group has produced—that we open-sourced about a year and a half ago—that is how we express our machine learning research ideas, and use it for training machine learning systems for our products. And we’ve now released it, so lots of people outside Google are using this system as well, and working collaboratively to improve it over time.
And then we have a number of different kinds of research efforts, and I’m personally following pretty closely our “learning to learn” efforts, because I think that’s going to be a pretty important area.
Many people believe that if we build an AGI, it will come out of a Google. Is that a possibility?
Well, I think there’s enough unknowns in what we need to do that it could come from anywhere. I think we have a fairly broad research effort because we think this is, you know, a pretty important field to push forward, and we certainly are working on building systems that can do more and more. But AGI is a pretty long-term goal, I would say.
It isn’t inconceivable that Google itself reaches some size where it takes on emergent properties which are, well, I guess by definition unforeseeable?
I don’t quite know what that means, I guess.
People are emergent, right? You’re a trillion cells that don’t know who you are, but collectively… You know none of your cells have a sense of humor, but you do. And so at some level the entire system itself acquires characteristics that no parts of it have. I don’t mean it in any ominous way. Just to say that it’s when you start looking at numbers, like the number of connections in the human brain and what not, that we start seeing things of the same sort of orders in the digital world. It just invites one to speculate.
Yeah, I think we’re still a few orders of magnitude off in terms of where a single human brain is, versus what the capabilities of computing systems are. We’re maybe at like newt or something. But, yes, I mean presumably the goal is to build more intelligent systems, and as you add more computational capability, those systems will get more capable.
Is it fair to say that the reason we’ve had such a surge in success with AI in the last decade is this, kind of, perfect storm of GPUs, plus better algorithms, plus better data collection—so better training sets—plus Moore’s Law at your back? Is it nothing more complicated than that? That there have just been a number of factors that have come together? Or did something happen, some watershed event that maybe passed unnoticed, that gave us this AI Renaissance that we’re in now?
So, let me frame it like this: A lot of the algorithms that we’re using today were actually developed twenty, twenty-five years ago during the first upsurge in interest in neural networks, which is a particular kind of machine learning model. One that’s working extremely well today, but twenty or twenty-five years ago showed interesting signs of life on a very small problem… But we lacked the computational capabilities to make them work well on large problems.
So, if you fast-forward twenty years to maybe 2007, 2008, 2009, we started to have enough computational ability, and data sets that were big enough and interesting enough, to make neural networks work on practical interesting problems—things like computer vision problems or speech recognition problems.
And what’s happened is neural networks have become the best way to solve many of these problems, because we now have enough computational ability and big enough data sets. And we’ve done a bunch of work in the last decade, as well, to augment the sort of foundational algorithms that were developed twenty, thirty years ago with new techniques and all of that.
GPUs are one interesting aspect of that, but I think the fundamental thing is the realization that neural nets in particular, and these machine learning models, really have different computational characteristics than most code you run today on computers. And those characteristics are that they essentially mostly do linear algebra kinds of operations—matrix multiply vector operations—and that they are also fairly tolerant of reduced precision. So you don’t need six or seven digits of precision when you’re doing the computations for a neural net—you need many fewer digits of precision.
Those two factors together allow you to build specialized kinds of hardware for very low-precision linear algebra. And that’s what’s kind of augmented the ability of us to apply more computation to some of these problems. GPUs are one thing; Google has developed a new kind of custom chip called the Tensor Processing Unit, a TPU, that uses lower precision than GPUs and offers significant performance advantages, for example. And I think this is an interesting and exploding area. Because when building specialized hardware that’s tailored to a subset of things, as opposed to very general kinds of computations like a CPU does, you run the risk that that specialized subset is only a little bit of what you want to do in a computing system.
But the thing that neural nets and machine learning models have today is that they’re applicable to a really broad range of things. Speech recognition and translation and computer vision and medicine and robotics—all these things can use that same underlying set of primitives, you know, accelerated linear algebra to do vastly different things. So you can build specialized hardware that applies to a lot of different things.
I got you. Alright, well I think we’re at time. Do you have any closing remarks, or any tantalizing things we might look forward to coming out of your work?
Well, I’m very excited about a lot of different things. I’ll just name a few…
So, I think the use of machine learning for medicine and healthcare is going to be really important. It’s going to be a huge aid to physicians and other healthcare workers to be able to give them quick second opinions about what kinds of things might make sense for patients, or to interpret a medical image and give people advice about what kinds of things they should focus on in a medical image.
I’m very excited about robotics. I think machine learning for robotics is going to be an interesting and emerging field in the next five years, ten years. And I think this “learning to learn” work will lead to more flexible systems which can learn to do new things without requiring as much machine learning expertise. I think that’s going to be pretty interesting to watch, as that evolves.
Then, beneath all the machine learning work, this trend toward building customized hardware that is tailored to particular kinds of machine learning models is going to be an interesting one to watch over the next five years, I think.
Alright, well…
One final thought, I guess, is that I think the field of machine learning has the ability to touch not just computer science but lots and lots of fields of human endeavor. And so, I think that it’s a really exciting time as people realize this and want to enter the field, and start to study and do machine learning research, and understand the implications of machine learning for different fields of science or different kinds of application areas.
And so that’s been really exciting to see over the last five or eight years, is more and more people from all different kinds of backgrounds are entering the field and doing really interesting, cool new work in this field.
Excellent. Well I want to thank you for taking the time today. It has been a fantastically interesting hour.
Okay thanks very much. Appreciate it.

How PayPal uses deep learning and detective work to fight fraud

Hui Wang has seen the nature of online fraud change a lot in the 11 years she’s been at PayPal. In fact, a continuous evolution of methods is kind of the nature of cybercrime. As the good guys catch onto one approach, the bad guys try to avoid detection by using another.

Today, said Wang, PayPal’s senior director of global risk sciences, “The fraudsters we’re interacting with are… very unique and very innovative. …Our fraud problem is a lot more complex than anyone can think of.”

In deep learning, though, Wang and her team might have found a way to help level the playing field between PayPal and the criminals who want to exploit the online payment platform.

Deep learning is a somewhat new approach to machine learning and artificial intelligence that has caught fire over the past few years thanks to companies such as [company]Google[/company], [company]Facebook[/company], [company]Microsoft[/company] and Baidu, and a handful of prominent researchers (some of whom now work for those companies). The field draws a lot of comparisons to the workings of the human brain because deep learning systems use artificial neural network algorithms, although “inspired by the brain” might be a more accurate description than “modeled after the brain.”

A visual diagram of a deep neural network for facial recognition. Source: Facebook

Essentially, the stacks of neural networks that comprise deep learning models are very good at recognizing patterns and features of the data they’re trained on, which has led to some huge advances in computer vision, speech recognition, text analysis, machine listening and even video-game playing in the past few years. You can learn more about the field at our Structure Data conference later this month, which includes deep learning and artificial intelligence experts from Facebook, Microsoft, Yahoo, Enlitic and other companies.

It turns out deep learning models are also good at identifying the complex patterns and characteristics of cybercrime and online fraud. Machine-learning-based pattern recognition has long been a major part of fraud detection practices, but Wang said PayPal has seen a “major leap forward” in its abilities since it began investigating precursor (what she calls “non-linear”) techniques to deep learning several years ago. PayPal has been working with deep learning itself for the past two or three years, she said.

Some of these efforts are already running in production as part of the company’s anti-fraud systems, often in conjunction with human experts in what Wang describes as a “detective-like methodology.” The deep learning algorithms are able to analyze potentially tens of thousands of latent features (time signals, actors and geographic location are some easy examples) that might make up a particular type of fraud, and are even able to detect “sub modus operandi,” or different variants of the same scheme, she said.

Some of PayPal’s fraud-management options for developers.

The patterns are much more complex than “If someone does X, then the result is Y,” so it takes artificial intelligence to analyze them at a level much deeper than humans can. “Actually,” Wang said, “that’s the beauty of deep learning.”
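As a toy illustration of why “If X, then Y” rules fall short (all feature names and data here are invented, not PayPal’s): fraud that depends on a *combination* of signals defeats any single-signal rule, while a model that can form feature interactions, as a deep network does internally, separates it cleanly.

```python
import numpy as np

# Two invented binary signals per transaction: [new_device, foreign_ip].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 50, dtype=float)
y = X[:, 0] * X[:, 1]  # fraud only when BOTH signals fire (an interaction)

# A single-feature rule ("flag every new device") gets many cases wrong:
rule = X[:, 0]
rule_acc = np.mean(rule == y)

# Given the interaction term, even a trivial threshold separates the
# classes perfectly. (The product feature stands in for the nonlinear
# combinations a deep network learns on its own from raw features.)
interaction = X[:, 0] * X[:, 1]
model = (interaction > 0.5).astype(float)
model_acc = np.mean(model == y)

print(rule_acc, model_acc)  # the rule misflags every new-device/home-IP case
```

Real fraud involves tens of thousands of latent features rather than two, which is why the combinations have to be learned rather than enumerated by hand.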

Once the models detect possible fraud, human “detectives” can get to work assessing what’s real, what’s not and what to do next.

PayPal uses a champions-and-challengers approach to deciding which fraud-detection models to rely on most heavily, and deep learning is very close to becoming the champion. “We’ve seen roughly a 10 percent delta on top of today’s champion,” Wang said, which is very significant.

And as the fraudulent behavior on PayPal’s platform continues to grow more complex, she’s hopeful deep learning will give her team the ability to adapt to these new patterns faster than before. It’s possible, for example, that PayPal might some day be able to deploy models that take live data from its system and become smarter, by retraining themselves, in real time.

“We’re doing that to a certain degree,” Wang said, “but I think there’s still more to be done.”

IBM acquires deep learning startup AlchemyAPI

So much for AlchemyAPI CEO Elliot Turner’s statement that his company is not for sale. IBM has bought the Denver-based deep learning startup that delivers a wide variety of text analysis and image recognition capabilities via API.

IBM plans to integrate AlchemyAPI’s technology into the core Watson cognitive computing platform. IBM will also use AlchemyAPI’s technology to expand its set of Watson cloud APIs that let developers infuse their web and mobile applications with artificial intelligence. Eventually, the AlchemyAPI service will shut down as the capabilities are folded into the IBM Bluemix platform, said IBM Watson Group vice president and CMO Stephen Gold.

Love at first sight? AlchemyAPI CEO Elliot Turner (left) and IBM Watson vice president Stephen Gold (center) at Structure Data 2014.

Compared with Watson’s primary ability to draw connections and learn from analyzing textual data, AlchemyAPI excels at analyzing text for sentiment, category and keywords, and for recognizing objects and faces in images. Gold called the two platforms “a leather shoe fit” in terms of how well they complement each other. Apart from the APIs, he said AlchemyAPI’s expertise in unsupervised and semi-supervised learning systems (that is, little human oversight over model creation) will be a good addition to the IBM team.

We will discuss the burgeoning field of new artificial intelligence applications at our Structure Data conference later this month in New York, as well as at our inaugural Structure Intelligence event in September.

I have written before that cloud computing will be the key to IBM deriving the profits it wants from Watson, as cloud developers are the new growth area for technology vendors. Cloud developers might not result in multi-million-dollar deals, but they represent a huge user base in aggregate and, more importantly, can demonstrate the capabilities of a platform like Watson probably better than IBM itself can. AlchemyAPI already has more than 40,000 developers on its platform.

Other companies delivering some degree of artificial intelligence and deep learning via the cloud, and sometimes via API, include Microsoft, Google, MetaMind, Clarifai and Expect Labs.

AlchemyAPI’s facial recognition API can distinguish between Will Ferrell and Red Hot Chili Peppers drummer Chad Smith.

AlchemyAPI’s Turner said his company decided to join IBM, after spurning numerous acquisition offers and stating it wasn’t for sale, in part because it represents an opportunity to “throw rocket fuel on” the company’s long-term goals. Had the plan been to buy AlchemyAPI, kill its service and fold the team into research roles — like what happens with so many other acquisitions of deep learning talent — it probably would not have happened.

Gold added that IBM is not only keeping the AlchemyAPI services alive (albeit as part of the Bluemix platform) but also plans to use the company’s Denver headquarters as the starting point of an AI and deep learning hub in the city.

[protected-iframe id=”447ad6f774bfa076438dfe73b2d084db-14960843-6578147″ info=”https://www.youtube.com/embed/iHVeoJBtoIM?list=PLZdSb0bQCA7mpVy–2jQBxfjcbNrp9Fu4″ width=”640″ height=”390″ frameborder=”0″ allowfullscreen=””]

Update: This post was updated at 9:10 a.m. to include quotes and information from Elliot Turner and Stephen Gold.

A look at Zeroth, Qualcomm’s effort to put AI in your smartphone

What if your smartphone camera were smart enough to identify that the plate of clams and black beans appearing in its lens was actually food? What if it then automatically could make the necessary adjustments to take a decent picture of said dish in the low-light conditions of a restaurant? And what if it then, without prompting, uploaded that photo to Foodspotting along with your location, because your camera phone knows from past experience you like to keep an endless record of your culinary conquests for the world to see?

These are just a few of the questions that [company]Qualcomm[/company] is asking of its new cognitive computing technology Zeroth, which aims to bring artificial intelligence out of the cloud and move it – or at least a limited version of it – into your phone. At Mobile World Congress in Barcelona, I sat down with Qualcomm SVP of product management Raj Talluri, who explained what Zeroth was all about.

Zeroth phones aren’t going to beat chess grandmasters or create their own unique culinary recipes, but they will perform basic intuitive tasks and anticipate your actions, thus eliminating many of the rudimentary steps required to operate the increasingly complex smartphone, Talluri explained.

“We wanted to see if we could build deep-learning neural networks on devices you carry with you instead of in the cloud,” Talluri said. Using that approach, Qualcomm could solve certain problems surrounding the everyday use of a device.

One such problem Talluri called the camera problem. The typical smartphone can pick up a lot of images throughout the day, from selfies to landscape shots to receipts for your expense reports. You could load every image into the cloud and sort them there, or figure out what to do with each photo as you snap it, but cognitive computing capabilities in your phone could do much of that work, and do it without you telling it what to do, Talluri said.

Zeroth can train the camera not just to recognize a landscape shot from a close-up. It could distinguish between whole classes of objects, from fruit to mountains to buildings, and it can tell children from adults and cats from dogs, Talluri said. What the camera does with that information depends on the user’s preferences and the application.

The most basic use case would be taking better photos, since the camera can optimize the shot for the types of objects in the frame. It could also populate photos with tons of useful metadata. Then you could build on that foundation with other applications. Your smartphone might recognize, for instance, that you’re taking a bunch of landscape and architecture shots in a foreign locale and automatically upload them to a vacation album on Flickr. A selfie might automatically produce a Facebook post prompt.
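That kind of application layer can be pictured as a simple rule-based dispatcher sitting on top of the on-device classifier. Here is a minimal sketch of the idea; the labels, destinations, and function names are invented for illustration, not part of Zeroth itself.

```python
# Hypothetical application logic layered on top of on-device photo
# classification. Labels and destination names are invented.

def route_photo(labels, at_home=True):
    """Decide what to do with a photo given its classifier labels."""
    if "selfie" in labels:
        return "prompt_facebook_post"
    if {"landscape", "architecture"} & set(labels) and not at_home:
        return "upload_flickr_vacation_album"
    if "receipt" in labels:
        return "file_expense_report"
    return "keep_in_camera_roll"

print(route_photo(["landscape"], at_home=False))  # -> upload_flickr_vacation_album
```

The classifier supplies the labels; everything after that is ordinary application code, which is why the metadata alone is valuable even before any one app exists.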

Zeroth devices would be pre-trained to recognize certain classes of objects – right now Qualcomm has used machine learning to create about 30 categories – but the devices could continue to learn after they’re shipped, Talluri said.

With permission, it could access your contact list and scan your social media accounts, and start recognizing the faces of your friends and family in your contact list, Talluri said. Then if you were taking a picture with a bunch of people in the frame, Zeroth would recognize your friends and focus in on their faces. Zeroth already has the ability to recognize handwriting, but you could train it to recognize the particular characteristics of your script, learning for instance that in my chicken scratch, lower case “A”s often look like “O”s.

Other examples of Zeroth applications include devices that automatically adjust their power consumption to the habits of their owners, or use their sensors to scan their surroundings and determine what a user’s most likely next smartphone action might be.

Zeroth itself isn’t a separate chip or component. It’s a software architecture designed to run across the different elements of Qualcomm’s Snapdragon processors, so as future Snapdragon products get more powerful, Zeroth becomes more intelligent, Talluri said. We’ll discuss the Zeroth capabilities and designing software that’s smarter and based on cognitive computing with a Qualcomm executive at our Structure Data event in New York later this month.

Qualcomm plans to debut the technology in next year’s premium smartphones and tablets that use the forthcoming Snapdragon 820, which uses a new 64-bit CPU architecture called Kryo and was announced at MWC. But Qualcomm was already showing off basic computer vision features like handwriting and object recognition on devices using the Snapdragon 810. Many of those devices were launched at MWC and should appear in markets in the coming months.


Google, Stanford say big data is key to deep learning for drug discovery

A team of researchers from Stanford University and Google has released a paper highlighting a deep learning approach they say shows promise in the field of drug discovery. What they found, essentially, is that more data covering more biological processes seems like a good recipe for uncovering new drugs.

Importantly, the paper doesn’t claim a major breakthrough that will revolutionize the pharmaceutical industry today. It simply shows that analyzing a whole lot of data across a whole lot of different target processes — in this case, 37.8 million data points across 259 tasks — works measurably better for discovering possible drugs than analyzing smaller datasets and/or building models specifically targeting a single task. (Read the Google blog post for a higher-level, but still very condensed, explanation.)
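The core idea of that multitask setup is a shared representation feeding many per-task outputs, so data from every task helps train the layers all tasks have in common. The toy sketch below illustrates the structure, not the paper’s actual model: dimensions, data, and the training loop are invented, and real systems would use a deep learning framework rather than hand-written gradients.

```python
import numpy as np

# Toy multitask network: one shared hidden layer, several task-specific
# output heads. Every task's labels contribute gradient signal to the
# shared weights. Illustrative only; not the Stanford/Google model.

rng = np.random.default_rng(0)
n_features, n_hidden, n_tasks, n_samples = 8, 4, 3, 200

# Synthetic data: each task's label depends on the same shared signal.
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
Y = np.stack([(X @ w_true + 0.1 * t > 0).astype(float)
              for t in range(n_tasks)], axis=1)

W_shared = rng.normal(scale=0.1, size=(n_features, n_hidden))
W_heads = rng.normal(scale=0.1, size=(n_hidden, n_tasks))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss():
    P = sigmoid(np.tanh(X @ W_shared) @ W_heads)
    return -np.mean(Y * np.log(P + 1e-9) + (1 - Y) * np.log(1 - P + 1e-9))

lr = 0.5
initial = loss()
for _ in range(300):
    H = np.tanh(X @ W_shared)            # shared representation
    P = sigmoid(H @ W_heads)             # per-task predictions
    G = (P - Y) / n_samples              # cross-entropy gradient at the logits
    grad_heads = H.T @ G
    grad_shared = X.T @ ((G @ W_heads.T) * (1 - H ** 2))
    W_heads -= lr * grad_heads
    W_shared -= lr * grad_shared
print(initial, loss())
```

The point of the structure is that adding a 260th task would not start from scratch: it would bolt a new head onto shared weights already shaped by every other dataset.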

But when talking about a process in drug discovery that can take years and cost drug companies billions of dollars that ultimately make their way into the prices of prescription drugs, any small improvement helps.

This graph shows a measure of prediction accuracy (ROC AUC is the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes as more datasets are added.
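For readers unfamiliar with the metric: ROC AUC is the probability that a randomly chosen positive example (here, an active compound) is scored higher than a randomly chosen negative one, so 0.5 is chance and 1.0 is a perfect ranking. A minimal pure-Python computation, with ties counted as half a win:

```python
# ROC AUC via the rank-statistic definition: the fraction of
# (positive, negative) pairs where the positive is scored higher.

def roc_auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.1]))  # perfect ranking -> 1.0
```

Production code would use a library routine such as scikit-learn's `roc_auc_score`, which computes the same quantity from the ROC curve.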

Here’s how the researchers explain the reality, and the promise, of their work in the paper:

The efficacy of multitask learning is directly related to the availability of relevant data. Hence, obtaining greater amounts of data is of critical importance for improving the state of the art. Major pharmaceutical companies possess vast private stores of experimental measurements; our work provides a strong argument that increased data sharing could result in benefits for all.

More data will maximize the benefits achievable using current architectures, but in order for algorithmic progress to occur, it must be possible to judge the performance of proposed models against previous work. It is disappointing to note that all published applications of deep learning to virtual screening (that we are aware of) use distinct datasets that are not directly comparable. It remains to future research to establish standard datasets and performance metrics for this field.

. . .

Although deep learning offers interesting possibilities for virtual screening, the full drug discovery process remains immensely complicated. Can deep learning—coupled with large amounts of experimental data—trigger a revolution in this field? Considering the transformational effect that these methods have had on other fields, we are optimistic about the future.

If they’re right, we might look back on this research as part of a handful of efforts that helped spur an artificial intelligence revolution in the health care space. Aside from other research in the field, there are multiple startups, including Butterfly Network and Enlitic (which will be presenting at our Structure Data conference later this month in New York), trying to improve doctors’ ability to diagnose diseases using deep learning. Related efforts include the work IBM is doing with its Watson technology to analyze everything from cancer to PTSD, as well as work from startups like Ayasdi and Lumiata.

There’s no reason that researchers have to stop here, either. Deep learning has proven remarkably good at tackling machine perception tasks such as computer vision and speech recognition, but the approach can technically excel at more general problems involving pattern recognition and feature selection. Given the right datasets, we could soon see deep learning networks identifying environmental factors and other root causes of disease that would help public health officials address certain issues so doctors don’t have to.

Why you can’t program intelligent robots, but you can train them

If it feels like we’re in the midst of a robot renaissance right now, perhaps it’s because we are. There is a new crop of robots under development that we’ll soon be able to buy and install in our factories or interact with in our homes. And while they might look like robots past on the outside, their brains are actually much different.

Today’s robots aren’t rigid automatons built by a manufacturer solely to perform a single task faster than, cheaper than and, ideally, without much input from humans. Rather, today’s robots can be remarkably adaptable machines that not only learn from their experiences, but can even be designed to work hand in hand with human colleagues. Commercially available (or soon to be) technologies such as Jibo, Baxter and Amazon Echo are three well-known examples of what’s now possible, but they’re also just the beginning.

Different technological advances have spurred the development of smarter robots depending on where you look, although they all boil down to training. “It’s not that difficult to build the body of the robot,” said Eugene Izhikevich, founder and CEO of robotics startup Brain Corporation, “but the reason we don’t have that many robots in our homes taking care of us is it’s very difficult to program the robots.”

Essentially, we want robots that can perform more than one function, or perform one function very well. And it’s difficult to program a robot to do multiple things, or at least the things that users might want, and it’s especially difficult to program to do these things in different settings. My house is different than your house, my factory is different than your factory.

A collection of RoboBrain concepts.

“The ability to handle variations is what enables these robots to go out into the world and actually be useful,” said Ashutosh Saxena, a Stanford University visiting professor and head of the RoboBrain project. (Saxena will be presenting on this topic at Gigaom’s Structure Data conference March 18 and 19 in New York, along with Julie Shah of MIT’s Interactive Robotics Group. Our Structure Intelligence conference, which focuses on the cutting edge in artificial intelligence, takes place in September in San Francisco.)

That’s where training comes into play. In some cases, particularly projects residing within universities and research centers, the internet has arguably been a driving force behind advances in creating robots that learn. That’s the case with RoboBrain, a collaboration among Stanford, Cornell and a few other universities that crawls the web with the goal of building a web-accessible knowledge graph for robots. RoboBrain’s researchers aren’t building robots, but rather a database of sorts (technically, more of a representation of concepts — what an egg looks like, how to make coffee or how to speak to humans, for example) that contains information robots might need in order to function within a home, factory or elsewhere.

RoboBrain encompasses a handful of different projects addressing different contexts and different types of knowledge, and the web provides an endless store of pictures, YouTube videos and other content that can teach RoboBrain what’s what and what’s possible. The “brain” is trained with examples of things it should recognize and tasks it should understand, as well as with reinforcement in the form of thumbs up and down when it posits a fact it has learned.

For example, one of its flagship projects, which Saxena started at Cornell, is called Tell Me Dave. In that project, researchers and crowdsourced helpers across the web train a robot to perform certain tasks by walking it through the necessary steps for tasks such as cooking ramen noodles.  In order for it to complete a task, the robot needs to know quite a bit: what each object it sees in the kitchen is, what functions it performs, how it operates and at which step it’s used in any given process. In the real world, it would need to be able to surface this knowledge upon, presumably, a user request spoken in natural language — “Make me ramen noodles.”
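The shape of that knowledge is easy to sketch: crowdsourced step sequences keyed by task, looked up from a natural-language request. The toy code below is purely illustrative; the task names and steps are invented, and the real system represents concepts far more richly than a dictionary.

```python
# Hypothetical sketch of a Tell Me Dave-style knowledge store:
# crowdsourced step sequences, retrieved by matching a task name
# inside a spoken request. All entries are invented examples.

KNOWLEDGE = {
    "ramen noodles": [
        "fill pot with water", "place pot on stove", "turn on burner",
        "wait for boil", "add noodles", "wait 3 minutes", "serve in bowl",
    ],
    "coffee": [
        "fill kettle", "boil water", "add grounds to press", "pour and steep",
    ],
}

def plan_for(request):
    """Return the learned step sequence whose task name appears in the request."""
    for task, steps in KNOWLEDGE.items():
        if task in request.lower():
            return steps
    return None  # task not yet in the shared knowledge base

print(plan_for("Make me ramen noodles")[:2])
```

The hard parts the project is actually working on sit on either side of this lookup: grounding each step in perception ("which object is the pot?") and in motor control ("how do I grasp it?").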

The Tell Me Dave workflow.

Multiply that by any number of tasks someone might actually want a robot to perform, and it’s easy to see why RoboBrain exists. Tell Me Dave can only learn so much, but because it’s accessing that collective knowledge base or “brain,” it should theoretically know things it hasn’t specifically trained on. Maybe how to paint a wall, for example, or that it should give human beings in the same room at least 18 inches of clearance.

There are now plenty of other examples of robots learning by example, often in lab environments or, in the case of some recent DARPA research using the aforementioned Baxter robot, watching YouTube videos about cooking (pictured above).

Advances in deep learning — the artificial intelligence technique du jour for machine-perception tasks such as computer vision, speech recognition and language understanding — also stand to expedite the training of robots. Deep learning algorithms trained on publicly available images, video and other media content can help robots recognize the objects they’re seeing or the words they’re hearing; Saxena said RoboBrain uses deep learning to train robots on proper techniques for moving and grasping objects.

The Brain Corporation platform.

However, there’s a different school of thought that says robots needn’t necessarily be as smart as RoboBrain wants to make them, so long as they can at least be trained to know right from wrong. That’s what Izhikevich and his aforementioned startup, Brain Corporation, are out to prove. It has built a specialized hardware and software platform, based on the idea of spiking neurons, that Izhikevich says can go inside any robot and “you can train your robot on different behaviors like you can train an animal.”

That is to say, for example, that a vacuum robot powered by the company’s operating system (called BrainOS) won’t be able to recognize that a cat is a cat, but it will be able to learn from its training that that object — whatever it is — is something to avoid while vacuuming. Conceivably, as long as they’re trained well enough on what’s normal in a given situation or what’s off limits, BrainOS-powered robots could be trained to follow certain objects or detect new objects or do lots of other things.
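Brain Corporation’s platform is built on spiking neurons, but the train-by-example idea can be shown with a much simpler stand-in: label sensor readings “avoid” or “ok” during training, then classify new readings by the nearest labeled centroid. Everything below, from the two-number feature vectors to the labels, is invented for illustration.

```python
# Simplified stand-in for training a robot by example: no spiking
# neurons, just nearest-centroid classification over labeled
# sensor readings. All numbers are invented.

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def train(examples):
    """examples: list of (feature_vector, label) pairs from training runs."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: centroid(xs) for label, xs in by_label.items()}

def classify(model, x):
    dist = lambda a, b: sum((i - j) ** 2 for i, j in zip(a, b))
    return min(model, key=lambda label: dist(model[label], x))

model = train([((0.9, 0.8), "avoid"), ((0.8, 0.9), "avoid"),
               ((0.1, 0.2), "ok"), ((0.2, 0.1), "ok")])
print(classify(model, (0.85, 0.9)))  # -> avoid
```

The robot never needs to know the obstacle is a cat; it only needs new readings to land closer to the “avoid” examples than the “ok” ones.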

If there’s one big challenge to the notion of training robots versus just programming them, it’s that consumers or companies that use the robots will probably have to do a little work themselves. Izhikevich noted that the easiest model might be for BrainOS robots to be trained in the lab, and then have that knowledge turned into code that’s preinstalled in commercial versions. But if users want to personalize robots for their specific environments or uses, they’re probably going to have to train them.

Part of the training process with Canary. The next step is telling the camera what it’s seeing.

As the internet of things and smart devices in general catch on, consumers are already getting used to the idea — sometimes begrudgingly. Even when it’s something as simple as pressing a few buttons in an app, like training a Nest thermostat or a Canary security camera, training our devices can get tiresome. Even those of us who understand how the algorithms work can get annoyed.

“For most applications, I don’t think consumers want to do anything,” Izhikevich said. “You want to press the ‘on’ button and the robot does everything autonomously.”

But maybe three years from now, by which time Izhikevich predicts robots powered by Brain Corporation’s platform will be commercially available, consumers will have accepted one inherent tradeoff in this new era of artificial intelligence — that smart machines are, to use Izhikevich’s comparison, kind of like animals. Specifically, dogs: They can all bark and lick, but turning them into seeing eye dogs or K-9 cops, much less Lassie, is going to take a little work.

Microsoft is building fast, low-power neural networks with FPGAs

Microsoft on Monday released a white paper explaining a current effort to run convolutional neural networks — the deep learning technique responsible for record-setting computer vision algorithms — on FPGAs rather than GPUs.

Microsoft claims that new FPGA designs provide greatly improved processing speed over earlier versions while consuming a fraction of the power of GPUs. This type of work could represent a big shift in deep learning if it catches on, because for the past few years the field has been largely centered around GPUs as the computing architecture of choice.

If there’s a major caveat to Microsoft’s efforts, it might have to do with performance. While Microsoft’s research shows FPGAs consuming about one-tenth the power of high-end GPUs (25W compared with 235W), GPUs still process images at a much higher rate. Nvidia’s Tesla K40 GPU can do between 500 and 824 images per second on one popular benchmark dataset, the white paper claims, while Microsoft predicts its preferred FPGA chip — the Altera Arria 10 — will be able to process about 233 images per second on the same dataset.

However, the paper’s authors note that performance per processor is relative because a multi-FPGA cluster could match a single GPU while still consuming much less power: “In the future, we anticipate further significant gains when mapping our design to newer FPGAs . . . and when combining a large number of FPGAs together to parallelize both evaluation and training.”
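A quick back-of-the-envelope pass over the figures quoted above shows why the efficiency argument holds even though a single FPGA is slower: per watt, the predicted Arria 10 numbers beat the K40’s best benchmark figure by well over 2x, and a four-FPGA cluster would match the GPU’s peak rate at less than half its power draw.

```python
import math

# Working through the white paper's numbers: throughput per watt for
# each chip, and how many Arria 10 FPGAs it would take to match the
# Tesla K40's upper benchmark figure.

gpu_imgs_per_s, gpu_watts = 824, 235     # Tesla K40, top of 500-824 range
fpga_imgs_per_s, fpga_watts = 233, 25    # predicted Altera Arria 10

gpu_eff = gpu_imgs_per_s / gpu_watts     # ~3.5 images/s per watt
fpga_eff = fpga_imgs_per_s / fpga_watts  # ~9.3 images/s per watt

fpgas_needed = math.ceil(gpu_imgs_per_s / fpga_imgs_per_s)
cluster_watts = fpgas_needed * fpga_watts
print(round(fpga_eff / gpu_eff, 1), fpgas_needed, cluster_watts)
```

This ignores inter-FPGA communication overhead, so treat it as an upper bound on the cluster’s savings rather than a measured result.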

In a Microsoft Research blog post, processor architect Doug Burger wrote, “We expect great performance and efficiency gains from scaling our [convolutional neural network] engine to Arria 10, conservatively estimated at a throughput increase of 70% with comparable energy used.”


This is not Microsoft’s first rodeo when it comes to deploying FPGAs within its data centers, and in fact this effort is a corollary of an earlier project. Last summer, the company detailed a research project called Catapult in which it was able to improve the speed and performance of Bing’s search-ranking algorithms by adding FPGA co-processors to each server in a rack. The company intends to port production Bing workloads onto the Catapult architecture later this year.

There have also been other attempts to port deep learning algorithms onto FPGAs, including one by State University of New York at Stony Brook professors and another by Chinese search giant Baidu. Ironically, Baidu Chief Scientist and deep learning expert Andrew Ng is a big proponent of GPUs, and the company claims a massive GPU-based deep learning system as well as a GPU-based supercomputer designed for computer vision. But this needn’t be an either/or situation: companies could still use GPUs to maximize performance while training their models, and then port them to FPGAs for production workloads.

Expect to hear more about the future of deep learning architectures and applications at Gigaom’s Structure Data conference March 18 and 19 in New York, which features experts from Facebook, Microsoft and elsewhere. Our Structure Intelligence conference, September 22-23 in San Francisco, will dive even deeper into deep learning, as well as the broader field of artificial intelligence algorithms and applications.