Voices in AI – Episode 31: A Conversation with Tasha Nagamine

[voices_in_ai_byline]
In this episode, Byron and Tasha talk about speech recognition, AGI, consciousness, Droice Labs, healthcare, and science fiction.
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today our guest is Tasha Nagamine. She’s a PhD student at Columbia University, and she holds an undergraduate degree from Brown and a master’s in electrical engineering from Columbia. Her research is in neural net processing in speech and language, and in the potential applications of speech processing systems through, here’s the interesting part, biologically-inspired deep neural network models. As if that weren’t enough to fill up a day, Tasha is also the CTO of Droice Labs, an AI healthcare company, which I’m sure we will chat about in a few minutes. Welcome to the show, Tasha.
Tasha Nagamine: Hi.
So, your specialty, it looks like, coming all the way up, is electrical engineering. How do you now find yourself in something which is often regarded as a computer science discipline, which is artificial intelligence and speech recognition?
Yeah, so it’s actually a bit of an interesting meandering journey, how I got here. My undergrad specialty was actually in physics, and when I decided to go to grad school, I was very interested, you know, I took a class and found myself very interested in neuroscience.
So, when I joined Columbia, the reason I’m actually in the electrical engineering department is that my advisor is an EE, but what my research and what my lab focuses on is really in neuroscience and computational neuroscience, as well as neural networks and machine learning. So, in that way, I think what we do is very cross-disciplinary, so that’s why the exact department, I guess, may be a bit misleading.
One of my best friends in college was an EE, and he said that every time he went over to like his grandmother’s house, she would try to get him to fix like the ceiling fan or something. Have you ever had anybody assume you’re proficient with a screwdriver as well?
Yes, that actually happens to me quite frequently. I think one of my friends’ landlords one time, when I said I was doing electrical engineering, thought that that actually meant electrician, and so was asking me if I knew how to fix light bulbs and things like that.
Well, let’s start now talking about your research, if you would. In your introduction, I stressed biologically-inspired deep neural networks. What do you think, do we study the brain and try to do what it does in machines, or are we inspired by it, or do we figure out what the brain’s doing and do something completely different? Like, why do you emphasize “biologically-inspired” DNNs?
That’s actually a good question, and I think the answer to that is that, you know, researchers and people doing machine learning all over the world actually do all of those things. So, on the reason that I was stressing biologically-inspired: well, you could argue, first of all, that all neural networks are in some way biologically-inspired. Now, whether or not they are a good biologically-inspired model is another question altogether. I think a lot of the big, sort of, advancements come from this; a convolutional neural network, for example, was modeled basically directly off of the visual system.
That being said, despite the fact that there are a lot of these biological inspirations, or sources of inspiration, for these models, there are many ways in which these models actually fail to live up to the way that our brains actually work. So, by saying biologically-inspired, I really just mean a different kind of take on a neural network where we try to, basically, find something wrong with a network that, you know, perhaps a human can do a little bit more intelligently, and try to bring this into the artificial neural network.
Specifically, one issue with current neural networks is that, usually, unless you keep training them, they have no way to really change themselves, or adapt to new situations, but that’s not what happens with humans, right? We continuously take inputs, we learn, and we don’t even need supervised labels to do so. So one of the things that I was trying to do was to try to draw from this inspiration, to find a way to kind of learn in an unsupervised way, to improve your performance in a speech recognition task.
So just a minute ago, when you and I were chatting before we started recording, a siren came by where you are, and the interesting thing is, I could still understand everything you were saying, even though that siren was, arguably, as loud as you were. What’s going on there, am I subtracting out the siren? How do I still understand you? I ask this for the obvious reason that computers seem to really struggle with that, right?
Right, yeah. And actually how this works in the brain is a very open question and people don’t really know how it’s done. This is actually an active research area of some of my colleagues, and there are a lot of different models that people have for how this works. And you know, it could be that there’s some sort of filter in your brain that, basically, sorts speech from the noise, for example, or a relevant signal from an irrelevant one. But how this happens, and exactly where this happens is pretty unknown.
But you’re right, that’s an interesting point you make, that machines have a lot of trouble with this. And so that’s one of the inspirations behind this type of research. Because, currently, in machine learning, we don’t really know the best way to do this, and so we tend to rely on large amounts of data: large amounts of labeled data, or parallel data, data intentionally corrupted with noise. However, this is definitely not how our brain is doing it, and how that’s happening, I don’t think anyone really knows.
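As an aside to the transcript, here is a minimal sketch of what "data corrupted with noise intentionally" can look like in practice: a clean recording is mixed with noise at a chosen signal-to-noise ratio, so that one utterance yields several (noisy, clean) training pairs. The tone, the white noise, the SNR values, and the function names `power` and `mix_at_snr` are all illustrative assumptions, not anything described by the speakers.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the noise scale factor.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]

# Toy data (assumed): a 440 Hz tone stands in for speech, white noise for the siren.
rng = random.Random(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# One clean utterance becomes several (noisy, clean) training pairs.
pairs = [(mix_at_snr(clean, noise, snr), clean) for snr in (20, 10, 0)]
```

At 0 dB the noise carries as much power as the signal, which is roughly the "siren as loud as the speaker" situation raised in the question.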
Let me ask you a different question along the same lines. I read these stories all the time that say that, “AI has approached human-quality in transcribing speech,” so I see that. And then I call my airline of choice, I will not name them, and it says, “What is your frequent flyer number?” You know, it’s got Caller ID, it should know that, but anyway. Mine, unfortunately, has an A, an H, and an 8 in it, so you can just imagine “AH8H888H”, right?
It never gets it. So, I have to get up, turn the fan off in my office, take my headset off, hold the phone out, and say it over and over again. So, two questions: what’s the disconnect between what I read and my daily experience? Actually, I’ll give you that question and then I have my follow up in a moment.
Oh, sure, so you’re saying, are you asking why it can’t recognize your—
But I still read these stories that say it can do as good of a job as a human.
Well, so usually—and, for example, I think, recently, there was a story published about Microsoft coming up with a system that had reached human parity in speech recognition—well, usually when you’re saying that, you have it on a somewhat artificial task. So, you’ll have a predefined data set, and then test the machine against humans, but that doesn’t necessarily correspond to a real-world setting, they’re not really doing speech recognition out in the wild.
And, I think, you have an even more difficult problem, because although it’s only frequent flyer numbers, you know, there’s no language model there, there’s no context for what your next number should be, so it’s very hard for that kind of system to self-correct, which is a bit problematic.
So I’m hearing two things. The first thing, it sounds like you’re saying, they’re all cooking the books, as it were. The story is saying something that I interpret one way that isn’t real; if you dig down deep, it’s different. But the other thing you seem to be saying is, even though there are only thirty-six things I could be saying, because there’s no natural flow to that language, it can’t say, “Oh, the first word he said was ‘the’ and the third word was ‘ran;’ was that middle word ‘boy’ or ‘toy’?” It could say, “Well, toys don’t run, but boys do, therefore it must be, ‘The boy ran.’” Is that what I’m hearing you saying, that a good AI system’s going to look contextually and get clues from the word usage in a way that a frequent flyer system doesn’t?
Right, yeah, exactly. I think this is actually one of the fundamental limitations of, at least, acoustic modeling, or, you know, the acoustic part of speech recognition, which is that you are completely limited by what the person has said. So, you know, maybe it could be that you’re not pronouncing your “t” at the end of “eight,” very emphatically. And the issue is that, there’s nothing you can really do to fix that without some sort of language-based information to fix it.
And then, to answer your first question, I wouldn’t necessarily call it “cooking the books,” but it is a fact that, you know, the data that you have to train on and test on and to evaluate your metrics on almost never really matches up with real-world data, and this is a huge problem in the speech domain. It’s a very well-known issue.
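As an aside to the transcript, the "boy" versus "toy" point can be sketched with a toy rescoring step: the acoustic score alone slightly prefers the wrong word, and a language-model score over the neighboring words corrects it. Every probability below is made up for illustration, and `rescore`, `acoustic_logp`, and `bigram_p` are hypothetical names, not part of any real recognizer.

```python
import math

# Hypothetical acoustic log-probabilities: the audio alone barely separates the two words.
acoustic_logp = {"boy": math.log(0.48), "toy": math.log(0.52)}

# Hypothetical bigram language model, P(next_word | word); values are invented.
bigram_p = {
    ("the", "boy"): 0.02,
    ("the", "toy"): 0.01,
    ("boy", "ran"): 0.05,
    ("toy", "ran"): 0.0005,
}

def rescore(prev_word, candidates, next_word, lm_weight=1.0):
    """Pick the candidate with the best combined acoustic + language-model score."""
    def score(word):
        lm = math.log(bigram_p[(prev_word, word)]) + math.log(bigram_p[(word, next_word)])
        return acoustic_logp[word] + lm_weight * lm
    return max(candidates, key=score)

acoustic_only = max(acoustic_logp, key=acoustic_logp.get)
with_context = rescore("the", ["boy", "toy"], "ran")
print(acoustic_only, with_context)
```

With a frequent flyer number, any character is about as likely as any other after a given character, so the language-model term would be nearly flat and the decision would fall back on the acoustics alone.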
You take my 8, H, and A example, which you’re saying is a really tricky problem without context, and, let’s say, you have one hundred English speakers, but one is from Scotland, and one could be Australian, and one could be from the east coast, one could be from the south of the United States; is it possible that the range of how 8 is said in all those different places is so wide that it overlaps with how H is said in some places? So, in other words, it’s a literally insoluble problem.
It is, I would say it is possible. One of the issues is then you should have a separate model for different dialects. I don’t want to dive too far into the weeds with this, but at the root of a speech recognition system, the fundamental linguistic or phonetic unit is often a phoneme, which is the smallest speech sound, and people even argue about whether or not these actually exist, what they actually mean, and whether or not this is a good unit to use when modeling speech.
That being said, there’s a lot of research underway, for example, sequence-to-sequence models or other types of models that are actually trying to bypass this sort of issue. You know, instead of having all of these separate components modeling all of the acoustics separately, can we go directly from someone’s speech and from there get the text exactly? And maybe through this unsupervised approach it’s possible to learn all these different things about dialects, and to try to inherently learn these things, but that is still a very open question, and currently those systems are not quite tractable yet.
I’m only going to ask one more question on these lines—though I could geek out on this stuff all day long, because I think about it a lot—but really quickly, do you think you’re at the very beginning of this field, or do you feel it’s a pretty advanced field? Just the speech recognition part.
Speech recognition, I think we’re nearing the end of speech recognition to be honest. I think that you could say that speech is fundamentally limited; you are limited by the signal that you are provided, and your job is to transcribe that.
Now, where speech recognition stops, that’s where natural language processing begins. As everyone knows, language is infinite, you can do anything with it, any permutation of words, sequences of words. So, I really think that natural language processing is the future of this field, and I know that a lot of people in speech are starting to try to incorporate more advanced language models into their research.
Yeah, that’s a really interesting question. So, I ran an article on Gigaom, where I had an Amazon Alexa device on my desk and I had a Google Assistant on my desk, and what I noticed right away is that they answer questions differently. These were factual questions, like “How many minutes are in a year?” and “Who designed the American flag?” They had different answers. And you can say it’s because of an ambiguity in the language, but if this is an ambiguity, then all language is naturally ambiguous.
So, the minutes in a year answer difference was that one gave you the minutes in 365.24 days, a solar year, and one gave you the minutes in a calendar year. And with regard to the flag, one said Betsy Ross, and one said the person who designed the fifty-star configuration on the current flag.
And so, we’re a long way away from the machines saying, “Well, wait a second, do you mean the current flag or the original flag?” or, “Are you talking about a solar year or a calendar year?” I mean, we’re really far away from that, aren’t we?
Yeah, I think that’s definitely true. You know, people really don’t understand how even humans process language, how we disambiguate different phrases, how we find out what are the relevant questions to ask to disambiguate these things. Obviously, people are working on that, but I think we are quite far from true natural language understanding, but yeah, I think that’s a really, really interesting question.
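As an aside to the transcript, the two "minutes in a year" answers mentioned above are both correct arithmetic for different readings of "year". A quick check, using the 365.24-day figure quoted in the conversation (the variable names are illustrative):

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Calendar-year reading: 365 whole days.
calendar_year_minutes = 365 * MINUTES_PER_DAY

# Solar-year reading: 365.24 days, the figure quoted in the conversation.
solar_year_minutes = 365.24 * MINUTES_PER_DAY

print(calendar_year_minutes)                                  # 525600
print(round(solar_year_minutes, 1))                           # 525945.6
print(round(solar_year_minutes - calendar_year_minutes, 1))   # 345.6
```

The two readings differ by a little under six hours' worth of minutes, which is why neither assistant is wrong without a clarifying question.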
There were a lot of them, “Who invented the light bulb?” and “How many countries are there in the world?” I mean the list was endless. I didn’t have to look around to find them. It was almost everything I asked, well, not literally, “What’s 2+2?” is obviously different, but there were plenty of examples.  
To broaden that question, don’t you think if we were to build an AGI, an artificial general intelligence, an AI as versatile as a human, that’s table stakes, like you have to be able to do that much, right?
Oh, of course. I mean, I think that one of the defining things that makes human intelligence unique, is the ability to understand language and an understanding of grammar and all of this. It’s one of the most fundamental things that makes us human and intelligent. So I think, yeah, to have an artificial general intelligence, it would be completely vital and necessary to be able to do this sort of disambiguation.
Well, let me ratchet it up even another one. There’s a famous thought experiment called the Chinese Room problem. For the benefit of the listener, the setup is that there’s a person in a room who doesn’t speak any Chinese, and the room he’s in is full of this huge number of very specialized books; and people slide messages under the door to him that are written in Chinese. And he has this method where he looks up the first character and finds the book with that on the spine, and goes to the second character and the third and works his way through, until he gets to a book that says, “Write this down.” And he copies these symbols, again, he doesn’t know what the symbols are; he slides the message back out, and the person getting it thinks it’s a perfect Chinese answer, it’s brilliant, it rhymes, it’s great.
So, the thought experiment is this, does the man understand Chinese? And the point of the thought experiment is that this is all a computer does—it runs this deterministic program, and it never understands what it’s talking about. It doesn’t know if it’s about cholera or coffee beans or what have you. So, my question is, for an AGI to exist, does it need to understand the question in a way that’s different than how we’ve been using that word up until now?
That’s a good question. I think that, yeah, to have an artificial general intelligence, I think the computer would have to, in a way, understand the question. Now, that being said, what is the nature of understanding the question? How do we even think, is a question that I don’t think even we know the answer to. So, it’s a little bit difficult to say, exactly, what’s the minimum requirement that you would need for some sort of artificial general intelligence, because as it stands now, I don’t know. Maybe someone smarter than me knows the answer, but I don’t even know if I really understand how I understand things, if that makes sense to you.
So what do you do with that? Do you say, “Well, that’s just par for the course. There’s a lot of things in this universe we don’t understand, but we’re going to figure it out, and then we’ll build an AGI”? Is the question of understanding just a very straightforward scientific question, or is it a metaphysical question that we don’t really even know how to pose or answer?
I mean, I think that this question is a good question, and if we’re going about it the right way, it’s something that remains to be seen. But I think one way that we can try to ensure that we’re not straying off the path, is by going back to these biologically-inspired systems. Because we know that, at the end of the day, our brains are made up of neurons, synapses, connections, and there’s nothing very unique about this, it’s physical matter, there’s no theoretical reason why a computer cannot do the same computations.
So, if we can really understand how our brains are working, what the computations it performs are, how we have consciousness; then I think we can start to get at those questions. Now, that being said, in terms of where neuroscience is today, we really have a very limited idea of how our brains actually work. But I think it’s through this avenue that we stand the highest chance of success of trying to emulate, you know—
Let’s talk about that for a minute, I think that’s a fascinating topic. So, the brain has a hundred billion neurons that somehow come together and do what they do. There’s something called a nematode worm—arguably the most successful animal on the planet, ten percent of all animals on the planet are these little worms—they have I think 302 neurons in their brain. And there’s been an effort underway for twenty years to model that brain—302 neurons—in the computer and make a digitally living nematode worm, and even the people who have worked on that project for twenty years, don’t even know if that’s possible.
What I was hearing you say is, once we figure out what a neuron does—this reductionist view of the brain—we can build artificial neurons, and build a general intelligence, but what if every neuron in your brain has the complexity of a supercomputer? What if they are incredibly complicated things that have things going on at the quantum scale, that we are just so far away from understanding? Is that a tenable hypothesis? And doesn’t that suggest, maybe we should think about intelligence a different way because if a neuron’s as complicated as a supercomputer, we’re never going to get there.
That’s true, I am familiar with that research. So, I think that there’s a couple of ways that you can do this type of study because, for example, trying to model a neuron at the scale of its ion channels and individual connections is one thing, but there are many, many scales upon which your brain or any sort of neural system works.
I think to really get this understanding of how the brain works, it’s great to look at this very microscale, but it also helps to go very macro and instead of modeling every single component, try to, for example, take groups of neurons, and say, “How are they communicating together? How are they communicating with different parts of the brain?” Doing this, for example, is usually how human neuroscience works and humans are the ones with the intelligence. If you can really figure out on a larger scale, to the point where you can simplify some of these computations, and instead of understanding every single spike, perhaps understanding the general behavior or the general computation that’s happening inside the brain, then maybe it will serve to simplify this a little bit.
Where do you come down on all of that? Are we five years, fifty years or five hundred years away from cracking that nut, and really understanding how we understand and understanding how we would build a machine that would understand, all of this nuance? Do you think you’re going to live to see us make that machine?
I would be thrilled if I lived to see that machine, I’m not sure that I will. Exactly saying when this will happen is a bit hard for me to predict, but I know that we would need massive improvements; probably, algorithmically, probably in our hardware as well, because true intelligence is massively computational, and I think it’s going to take a lot of research to get there, but it’s hard to say exactly when that would happen.
Do you keep up with the Human Brain Project, the European initiative to do what you were talking about before, which is to be inspired by human brains and learn everything we can from that and build some kind of a computational equivalent?
A little bit, a little bit.
Do you have any thoughts on—if you were the betting sort—whether that will be successful or not?
I’m not sure if that’s really going to work out that well. Like you said before, given our current hardware, algorithms, and our abilities to probe the human brain, I think it’s very difficult to make these very sweeping claims about, “Yes, we will have X amount of understanding about how these systems work,” so I’m not sure if it’s going to be successful in all the ways it’s supposed to be. But I think it’s a really valuable thing to do, whether or not you really achieve the stated goal, if that makes sense.
You mentioned consciousness earlier. So, consciousness, for the listeners, is something people often say we don’t know what it is; we know exactly what it is, we just don’t know how it is that it happens. What it is, is that we experience things, we feel things, we experience qualia—we know what pineapple tastes like.
Do you have any theories on consciousness? Where do you think it comes from, and, I’m really interested in, do we need consciousness in order to solve some of these AI problems that we all are so eager to solve? Do we need something that can experience, as opposed to just sense?
Interesting question. I think that there’s a lot of open research on how consciousness works, what it really means, how it helps us do this type of cognition. So, we know what it is, but how it works or how this would manifest itself in an artificial intelligence system, is really sort of beyond our grasp right now.
I don’t know how much true consciousness a machine needs, because, you could say, for example, that having a type of memory may be part of your consciousness, you know, being aware, learning things, but I don’t think we have enough real understanding yet of how this works to really say for sure.
All right fair enough. One more question and I’ll pull the clock back thirty years and we’ll talk about the here and now; but my last question is, do you think that a computer could ever feel something? Could a computer ever feel pain? You could build a sensor that tells the computer it’s on fire, but could a computer ever feel something, could we build such a machine?
I think that it’s possible. So, like I said before, our brain is really a very advanced biological computer, so there’s really no reason why a computer shouldn’t be able to feel pain. It is a sensation, but it’s really just a transfer of information, so I think that it is possible. Now, that being said, how this would manifest, or what a computer’s reaction would be to pain or what would happen, I’m not sure what that would be, but I think it’s definitely possible.
Fair enough. I mentioned in your introduction that you’re the CTO of an AI company, Droice Labs, and the only setup I made was that it was a healthcare company. Tell us a little bit more: what challenge is Droice Labs trying to solve, what is the hope, what are your present challenges, and what’s kind of the state of where you’re at?
Sure. Droice is a healthcare company that provides artificial intelligence solutions to hospitals and healthcare providers. So, one of the main things that we’re focusing on right now is to try to help doctors choose the right treatment for their patients. This means things like, for example, you come in, maybe you’re sick, you have a cough, you have pneumonia, let’s say, and you need an antibiotic. What we try to do is, when you’re given an antibiotic, we try to predict whether or not this treatment will be effective for you, and also whether or not it’ll cause any sort of adverse event for you, so we try both to get people healthy and to keep them safe.
And so, this is really what we’re focusing on at the moment, trying to make a sort of artificial brain for healthcare that can, shall we say, augment the intelligence of the doctors and try to make sure that people stay healthy. I think that healthcare’s a really interesting sphere in which to use artificial intelligence because currently the technology is not very widespread because of the difficulty in working with hospital and medical data, so I think it’s a really interesting opportunity.
So, let’s talk about that for a minute, AIs are generally only as good as the data we train them with. Because I know that whenever I have some symptom, I type it into the search engine of choice, and it tells me I have a terminal illness; it just happens all the time. And in reality, of course, whatever that terminal illness is, there is a one-in-five-thousand chance that I have that, and then there’s also a ninety-nine percent chance I have whatever much more common, benign thing. How are you thinking about how you can get enough data so that you can build these statistical models and so forth?
We’re a B2B company, so we have partnerships with around ten hospitals right now, and what we do is get big data dumps from them of actual electronic health records. And so, what we try to do is actually use real patient records, like, millions of patient records that we obtain directly from our hospitals, and that’s how we really are able to get enough data to make these types of predictions.
How accurate does that data need to be? Because it doesn’t have to be perfect, obviously. How accurate does it need to be to be good enough to provide meaningful assistance to the doctor?
That is actually one of the big challenges, especially in this type of space. In healthcare, it’s a bit hard to say which data is good enough, because imperfections are very, very common. I mean, one of the hallmarks of clinical or medical data is that it will, by default, contain many, many missing values; you never have the full story on any given patient.
Additionally, it’s very common to have things like errors, there’s unstructured text in your medical record that very often contains mistakes or just insane sentence fragments that don’t really make sense to anyone but a doctor, and this is one of the things that we work really hard on, where a lot of times traditional AI methods may fail, but we basically spend a lot of time trying to work with this data in different ways, come up with noise-robust pipelines that can really make this work.
I would love to hear more detail about that, because I’m sure it’s full of things like, “Patient says their eyes water whenever they eat potato chips,” and you know, that’s like a data point, and it’s like, what do you do with that. If that is a big problem, can you tell us what some of the ways around it might be?
Sure. I’m sure you’ve seen a lot of crazy stuff in these health records, but what we try to do is—instead of biasing our models by doing anything in a rule-based manner—we use the fact that we have big data, we have a lot of data points, to try to really come up with robust models, so that, essentially, we don’t really have to worry about all that crazy stuff in there about potato chips and eyes watering.
And so, what we actually end up doing is, basically, we take these many, many millions of individual electronic health records, and try to combine that with outside sources of information, and this is one of the ways that we can try to really augment the data on our health record to make sure that we’re getting the correct insights about it.
So, with your example, you said, “My eyes water when I eat potato chips.” What we end up doing is taking that sort of thing, and in an automatic way, searching sources of public information, for example clinical trials information or published medical literature, and we try to find, for example, clinical trials or papers about the side effects of rubbing your eyes while eating potato chips. Now of course, that’s a ridiculous example, but you know what I mean.
And so, by augmenting this public and private data together, we really try to create this setup where we can get the maximum amount of information out of this messy, difficult to work with data.
The kinds of data you have that are solid data points would be: how old is the patient, what’s their gender, do they have a fever, do they have aches and pains; that’s very coarse-level stuff. But (and I’m regretting using the potato chip example because now I’m kind of stuck with it) a potato chip is made of a potato, which is a tuber, which is a nightshade, and there may be some breakthrough, like, “That may be the answer, it’s an allergic reaction to nightshades.” And that answer is so many levels removed.
I guess what I’m saying is, and you said earlier, language is infinite, but health is near that, too, right? There are so many potential things something could be, and yet, so few data points, that we must try to draw from. It would be like, if I said, “I know a person who is 6’ 4” and twenty-seven years old and born in Chicago, what’s their middle name?” It’s like, how do you even narrow it down to a set of middle names?
Right, right. Okay, I think I understand what you’re saying. This is, obviously, a challenge, but one of the ways that we kind of do this is, the first thing is our artificial intelligence is really intended for doctors and not the patients. We were just talking about AGI and when it will happen, but the reality is we’re not there yet, so while our system tries to make these predictions, it’s under the supervision of a doctor. So, they’re really looking at these predictions and trying to pull out relevant things.
Now, you mentioned, the structured data—this is your age, your weight, maybe your sex, your medications; this is structured—but maybe the important thing is in the text, or is in the unstructured data. So, in this case, one of the things that we try to do, and it’s one of the main focuses of what we do, is to try to use natural language processing, NLP, to really make sure that we’re processing this unstructured data, or this text, in a way to really come up with a very robust, numerical representation of the important things.
So, of course, you can mine this information, this text, to try to understand, for example, you have a patient who has some sort of allergy, and it’s only written in this text, right? In that case, you need a system to really go through this text with a fine-tooth comb, and try to really pull out risk factors for this patient, relevant things about their health and their medical history that may be important.
So, is it not the case that diagnosing (if you just said, here is a person who manifests certain symptoms, and I want to diagnose what they have) may be the hardest problem possible? Especially compared to where we’ve seen success, which is, like, here is a chest x-ray, we have a very binary question to ask: does this person have a tumor or do they not? Where the data is: here are ten thousand scans with the tumor, here are a hundred thousand without a tumor.
Like, is it the cold or the flu? That would be an AI kind of thing because an expert system could do that. I’m kind of curious, tell me what you think—and then I’d love to ask, what would an ideal world look like, what would we do to collect data in an ideal world—but just with the here and now, aspirationally, what do you think is as much as we can hope for? Is it something, like, the model produces sixty-four things that this patient may have, rank ordered, like a search engine would do from the most likely to the least likely, and the doctor can kind of skim down it and look for something that catches his or her eye. Is that as far as we can go right now? Or, what do you think, in terms of general diagnosing of ailments?
Sure, well, actually, what we focus on currently is really on the treatment, not on the diagnosis. I think the diagnosis is a more difficult problem, and, of course, we really want to get into that in the future, but that is actually somewhat more of a very challenging sort of thing to do.
That being said, what you mentioned, you know, saying, “Here’s a list of things, let’s make some predictions of it,” is actually a thing that we currently do in terms of treatments for patients. So, one example of a thing that we’ve done is built a system that can predict surgical complications for patients. So, imagine, you have a patient that is sixty years old and is mildly septic, and may need some sort of procedure. What we can do is find that there may be a couple alternative procedures that can be given, or a nonsurgical intervention that can help them manage their condition. So, what we can do is predict what will happen with each of these different treatments, what is the likelihood it will be successful, as well as weighing this against their risk options.
And in this way, we can really help the doctor choose what sort of treatment that they should give this person, and it gives them some sort of actionable insight, that can help them get their patients healthy. Of course, in the future, I think it would be amazing to have some sort of end to end system that, you know, a patient comes in, and you can just get all the information and it can diagnose them, treat them, get them better, but we’re definitely nowhere near that yet.
Recently, IBM made news that Watson had prescribed treatment for cancer patients that was largely identical to what the doctors did, but it had the added benefit that in a third of the cases it found additional treatment options, because it had the virtue of being trained on a quarter million medical journals. Is that the kind of thing that’s like “real, here, today,” that we will expect to see more things like that?
I see. Yeah, that’s definitely a very exciting thing, and I think that’s great to see. One of the things that’s very interesting, is that IBM primarily works on cancer. It’s lacking in these high prescription volume sorts of conditions, like heart disease or diabetes. So, I think that while this is very exciting, this is definitely a sort of technology, and a space for artificial intelligence, where it really needs to be expanded, and there’s a lot of room to grow.
So, we can sequence a genome for $1,000. How far away are we from having enough of that data that we get really good insights into, for example, a person has this combination of genetic markers, and therefore this is more likely to work or not work. I know that in isolated cases we can do that, but when will we see that become just kind of how we do things on a day-to-day basis?
I would say, probably, twenty-five years from the clinic. I mean, it’s great, this information is really interesting, and we can do it, but it’s not widely used. I think there are too many regulations in place right now that keep this from happening, so, I think it’s going to be, like I said, maybe twenty-five years before we really see this very widely used for a good number of patients.
So are there initiatives underway that you think merit support that will allow this information to be collected and used in ways that promote the greater good, and simultaneously, protect the privacy of the patients? How can we start collecting better data?
Yeah, there are a lot of people that are working on this type of thing. For example, Obama had a precision medicine initiative and these types of things where you’re really trying to, basically, get your health records and your genomic data, and everything consolidated and have a very easy flow of information so that doctors can easily integrate information from many sources, and have very complete patient profiles. So, this is a thing that’s currently underway.
To pull out a little bit and look at the larger world, you’re obviously deeply involved in speech, and language processing, and health care, and all of these areas where we’ve seen lots of advances happening on a regular basis, and it’s very exciting. But then there’s a lot of concern from people who have two big worries. One is the effect that all of this technology is going to have on employment. And there’s two views.
One is that technology increases productivity, which increases wages, and that’s what’s happened for two hundred years, or, this technology is somehow different, it replaces people and anything a person can do eventually the technology will do better. Which of those camps, or a third camp, do you fall into? What is your prognosis for the future of work?
Right. I think that technology is a good thing. I know a lot of people have concerns, for example, that if there’s too much artificial intelligence it will replace my job, there won’t be room for me and for what I do, but I think that what’s actually going to happen, is we’re just going to see, shall we say, a shifting employment landscape.
Maybe if we have some sort of general intelligence, then people can start worrying, but, right now, what we’re really doing through artificial intelligence is augmenting human intelligence. So, although some jobs become obsolete, now to maintain these systems, build these systems, I believe that you actually have, now, more opportunities there.
For example, ten to fifteen years ago, there wasn’t such a demand for people with software engineering skills, and now it’s almost becoming something that you’re expected to know, or, like, the internet thirty years back. So, I really think that this is going to be a good thing for society. It may be hard for people who don’t have any sort of computer skills, but I think going forward, that these are going to be much more important.
Do you consume science fiction? Do you watch movies, or read books, or television, and if so, are there science fiction universes that you look at and think, “That’s kind of how I see the future unfolding”?
Have you ever seen the TV show Black Mirror?
Well, yeah that’s dystopian though, you were just saying things are going to be good. I thought you were just saying jobs are good, we’re all good, technology is good. Black Mirror is like dark, black, mirrorish.
Yeah, no, I’m not saying that’s what’s going to happen, but I think that’s presenting the evil side of what can happen. I don’t think that’s necessarily realistic, but I think that show actually does a very good job of portraying the way that technology could really be integrated into our lives. Without all of the dystopian, depressing stories, I think that the way that it shows the technology being integrated into people’s lives, how it affects the way people live—I think it does a very good job of doing things like that.
I wonder though, science fiction movies and TV are notoriously dystopian, because there’s more drama in that than utopian. So, it’s not conspiratorial or anything, I’m not asserting that, but I do think that what it does, perhaps, is causes people—somebody termed it “generalizing from fictional evidence,” that you see enough views of the future like that, you think, “Oh, that’s how it’s going to happen.” And then that therefore becomes self-fulfilling.
Frank Herbert, I think it was, who said, “Sometimes the purpose of science fiction is to keep a world from happening.” So do you think those kinds of views of the world are good, or do you think that they increase this collective worry about technology and losing our humanity, becoming a world that’s blackish and mirrorish, you know?
Right. No, I understand your point and actually, I agree. I think there is a lot of fear, which is quite unwarranted. There is actually a lot more transparency in AI now, so I think that a lot of those fears are just, well, given the media today, as I’m sure we’re all aware, it’s a lot of fear mongering. I think that these fears are really something that—not to say there will be no negative impact—but, I think, every cloud has its silver lining. I think that this is not something that anyone really needs to be worrying about. One thing that I think is really important is to have more education for a general audience, because I think part of the fear comes from not really understanding what AI is, what it does, how it works.
Right, and so, I was just kind of thinking through what you were saying, there’s an initiative in Europe that, AI engines—kind of like the one you’re talking about that’s suggesting things—need to be transparent, in the sense they need to be able to explain why they’re making that suggestion.
But, I read one of your papers on deep neural nets, and it talks about how the results are hard to understand, if not impossible to understand. Which side of that do you come down on? Should we limit the technology to things that can be explained in bulleted points, or do we say, “No, the data is the data and we’re never going to understand it once it starts combining in these ways, and we just need to be okay with that”?
Right, so, one of the most overused phrases in all of AI is that “neural networks are a black box.” I’m sure we’re all sick of hearing that sentence, but it’s kind of true. I think that’s why I was interested in researching this topic. I think, as you were saying before, the why in AI is very, very important.
So, I think, of course we can benefit from AI without knowing. We can continue to use it like a black box, it’ll still be useful, it’ll still be important. But I think it will be far more impactful if you are able to explain why, and to really demystify what’s happening.
One good example from my own company is that in medicine it’s vital for the doctor to know why you’re saying what you’re saying, at Droice. So, if a patient comes in and you say, “I think this person is going to have a very negative reaction to this medicine,” it’s very vital for us to try to analyze the neural network and explain, “Okay, it’s really this feature of this person’s health record, for example, the fact that they’re quite old and on another medication.” That really makes them trust the system, and really eases the adoption, and allows them to integrate into traditionally less technologically focused fields.
So, I think that there’s a lot of research now that’s going into the why in AI, and it’s one of my focuses of research, and I know the field has really been blooming in the last couple of years, because I think people are realizing that this is extremely important and will help us not only make artificial intelligence more translational, but also help us to make better models.
You know, in The Empire Strikes Back, when Luke is training on Dagobah with Yoda, he asked him, “Why, why…” and Yoda was like, “There is no why.” Do you think there are situations where there is no why? There is no explainable reason why it chose what it did?
Well, I think there is always a reason. For example, you like ice cream; well, maybe it’s a silly reason, but the reason is that it tastes good. It might not be, you know, you like pistachio better than caramel flavor—so, let’s just say the reason may not be logical, but there is a reason, right? It’s because it activates the pleasure center in your brain when you eat it. So, I think that if you’re looking for interpretability, in some cases it could be limited but I think there’s always something that you could answer when asking why.
Alright. Well, this has been fascinating. If people want to follow you, keep up with what you’re doing, keep up with Droice, can you just run through the litany of ways to do that?
Yeah, so we have a Twitter account, it’s “DroiceLabs,” and that’s mostly where we post. And we also have a website: www.droicelabs.com, and that’s where we post most of the updates that we have.
Alright. Well, it has been a wonderful and far ranging hour, and I just want to thank you so much for being on the show.
Thank you so much for having me.

Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.

Voices in AI – Episode 13: A Conversation with Bryan Catanzaro

In this episode, Byron and Bryan talk about sentience, transfer learning, speech recognition, autonomous vehicles, and economic growth.
[podcast_player name=”Episode 13: A Conversation with Bryan Catanzaro” artist=”Byron Reese” album=”Voices in AI” url=”https://voicesinai.s3.amazonaws.com/2017-10-16-(00-54-18)-bryan-catanzaro.mp3″ cover_art_url=”https://voicesinai.com/wp-content/uploads/2017/10/voices-headshot-card-5.jpg”]
Byron Reese: This is “Voices in AI” brought to you by Gigaom. I’m Byron Reese. Today, our guest is Bryan Catanzaro. He is the head of Applied AI Research at NVIDIA. He has a BS in computer science and Russian from BYU, an MS in electrical engineering from BYU, and a PhD in both electrical engineering and computer science from UC Berkeley. Welcome to the show, Bryan.
Bryan Catanzaro: Thanks. It’s great to be here.
Let’s start off with my favorite opening question. What is artificial intelligence?
It’s such a great question. I like to think about artificial intelligence as making tools that can perform intellectual work. Hopefully, those are useful tools that can help people be more productive in the things that they need to do. There’s a lot of different ways of thinking about artificial intelligence, and maybe the way that I’m talking about it is a little bit more narrow, but I think it’s also a little bit more connected with why artificial intelligence is changing so many companies and so many things about the way that we do things in the world economy today is because it actually is a practical thing that helps people be more productive in their work. We’ve been able to create industrialized societies with a lot of mechanization that help people do physical work. Artificial intelligence is making tools that help people do intellectual work.
I asked you what artificial intelligence is, and you said it’s doing intellectual work. That’s sort of using the word to define it, isn’t it? What is that? What is intelligence?
Yeah, wow…I’m not a philosopher, so I actually don’t have like a…
Let me try a different tack. Is it artificial in the sense that it isn’t really intelligent and it’s just pretending to be, or is it really smart? Is it actually intelligent and we just call it artificial because we built it?
I really liked this idea from Yuval Harari that I read a while back where he said there’s the difference between intelligence and sentience, where intelligence is more about the capacity to do things and sentience is more about being self-aware and being able to reason in the way that human beings reason. My belief is that we’re building increasingly intelligent systems that can perform what I would call intellectual work. Things about understanding data, understanding the world around us that we can measure with sensors like video cameras or audio or that we can write down in text, or record in some form. The process of interpreting that data and making decisions about what it means, that’s intellectual work, and that’s something that we can create machines to be more and more intelligent at. I think the definitions of artificial intelligence that move more towards consciousness and sentience, I think we’re a lot farther away from that as a community. There are definitely people that are super excited about making generally intelligent machines, but I think that’s farther away and I don’t know how to define what general intelligence is well enough to start working on that problem myself. My work focuses mostly on practical things—helping computers understand data and make decisions about it.
Fair enough. I’ll only ask you one more question along those lines. I guess even down in narrow AI, though, if I had a sprinkler that comes on when my grass gets dry, it’s responding to its environment. Is that an AI?
I’d say it’s a very small form of AI. You could have a very smart sprinkler that was better than any person at figuring out when the grass needed to be watered. It could take into account all sorts of sensor data. It could take into account historical information. It might actually be more intelligent at figuring out how to irrigate than a human would be. And that’s a very narrow form of intelligence, but it’s a useful one. So yeah, I do think that could be considered a form of intelligence. Now it’s not philosophizing about the nature of irrigation and its harm on the planet or the history of human interventions on the world, or anything like that. So it’s very narrow, but it’s useful, and it is intelligent in its own way.
Fair enough. I do want to talk about AGI in a little while. I have some questions around…We’ll come to that in just a moment. Just in the narrow AI world, just in your world of using data and computers to solve problems, if somebody said, “Bryan, what is the state-of-the-art? Where are we at in AI? Is this the beginning and you ‘ain’t seen nothing yet’? Or are we really doing a lot of cool things, and we are well underway to mastering that world?”
I think we’re just at the beginning. We’ve seen so much progress over the past few years. It’s been really quite astonishing, the kind of progress we’ve seen in many different domains. It all started out with image recognition and speech recognition, but it’s gone a long way from there. A lot of the products that we interact with on a daily basis over the internet are using AI, and they are providing value to us. They provide our social media feeds, they provide recommendations and maps, they provide conversational interfaces like Siri or Android Assistant. All of those things are powered by AI and they are definitely providing value, but we’re still just at the beginning. There are so many things we don’t know yet how to do and so many underexplored problems to look at. So I believe we’ll continue to see applications of AI come up in new places for quite a while to come.
If I took a little statuette of a falcon, let’s say it’s a foot tall, and I showed it to you, and then I showed you some photographs, and said, “Spot the falcon.” And half the time it’s sticking halfway behind a tree, half the time it’s underwater; one time it’s got peanut butter smeared on it. A person can do that really well, but computers are far away from that. Is that an example of us being really good at transfer learning? We’re used to knowing what things with peanut butter on them look like? What is it that people are doing that computers are having a hard time to do there?
I believe that people have evolved, over a very long period of time, to operate on planet Earth with the sensors that we have. So we have a lot of built-in knowledge that tells us how to process the sensors that we have and models the world. A lot of it is instinctual, and some of it is learned. I have young children, like a year-old or so. They spend an awful lot of time just repetitively probing the world to see how it’s going to react when they do things, like pushing on a string, or a ball, and they do it over and over again because I think they’re trying to build up their models about the world. We have actually very sophisticated models of the world that maybe we take for granted sometimes because everyone seems to get them so easily. It’s not something that you have to learn in school. But these models are actually quite useful, and they’re more sophisticated than – and more general than – the models that we currently can build with today’s AI technology.
To your question about transfer learning, I feel like we’re really good at transfer learning within the domain of things that our eyes can see on planet Earth. There are probably a lot of situations where an AI would be better at transfer learning. It might actually have fewer assumptions baked in about how the world is structured, how objects look, what kind of composition of objects is actually permissible. I guess I’m just trying to say we shouldn’t forget that we come with a lot of context. That’s instinctual, and we use that, and it’s very sophisticated.
Do you take from that that we ought to learn how to embody an AI and just let it wander around the world, bumping into things and poking at them and all of that? Is that what you’re saying? How do we overcome that?
It’s an interesting question you note. I’m not personally working on trying to build artificial general intelligence, but it will be interesting for those people that are working on it to see what kind of childhood is necessary for an AI. I do think that childhood plays a really important part in developing human intelligence because it helps us build and calibrate these models of how the world works, which then we apply to all sorts of things like your question of the falcon statue. Will computers need things like that? It’s possible. We’ll have to see. I think one of the things that’s different about computers is that they’re a lot better at transmitting information identically, so it may be the kind of thing that we can train once, and then just use repeatedly – as opposed to people, where the process of replicating a person is time-consuming and not exact.
But that transfer learning problem isn’t really an AGI problem at all, though. Right? We’ve taught a computer to recognize a cat, by giving it a gazillion images of a cat. But if we want to teach it how to recognize a bird, we have to start over, don’t we?
I don’t think we generally start over. I think most of the time if people wanted to create a new classifier, they would use transfer learning from an existing classifier that had been trained on a wide variety of different object types. It’s actually not very hard to do that, and people do that successfully all the time. So at least for image recognition, I think transfer learning works pretty well. For other kinds of domains, they can be a little bit more challenging. But at least for image recognition, we’ve been able to find a set of higher-level features that are very useful in discriminating between all sorts of different kinds of objects, even objects that we haven’t seen before.
What about audio? Because I’m talking to you now and I’m snapping my fingers. You don’t have any trouble continuing to hear me, but a computer trips over that. What do you think is going on in people’s minds? Why are we good at that, do you think? To get back to your point about we live on Earth, it’s one of those Earth things we do. But as a general rule, how do we teach that to a computer? Is that the same as teaching it to see something, as to teach it to hear something?
I think it’s similar. The best speech recognition accuracies come from systems that have been trained on huge amounts of data, and there does seem to be a relationship that the more data we can train a model on, the better the accuracy gets. We haven’t seen the end of that yet. I’m pretty excited about the prospects of being able to teach computers to continually understand audio, better and better. However, I wanted to point out, humans, this is kind of our superpower: conversation and communication. You watch birds flying in a flock, and the birds can all change direction instantaneously, and the whole flock just moves, and you’re like, “How do you do that and not run into each other?” They have a lot of built-in machinery that allows them to flock together. Humans have a lot of built-in machinery for conversation and for understanding spoken language. The pathways for speaking and the pathways for hearing evolved together, so they’re really well-matched.
With computers trying to understand audio, we haven’t gotten to that point yet. I remember some of the experiments that I’ve done in the past with speech recognition, that the recognition performance was very sensitive to compression artifacts that were actually not audible to humans. We could actually take a recording, like this one, and recompress it in a way that sounded identical to a person, and observe a measurable difference in the recognition accuracy of our model. That was a little disconcerting because we’re trying to train the model to be invariant to all the things that humans are invariant to, but it’s actually quite hard to do that. We certainly haven’t achieved that yet. Often, our models are still what we would call “overfitting”, where they’re paying attention to a lot of details that help it perform the tasks that we’re asking it to perform, but they’re not actually helpful to solving the fundamental tasks that we’re trying to perform. And we’re continually trying to improve our understanding of the tasks that we’re solving so that we can avoid this, but we’ve still got more work to do.
My standard question when I’m put in front of a chatbot or one of the devices that sits on everybody’s desktop, I can’t say them out loud because they’ll start talking to me right now, but the question I always ask is “What is bigger, a nickel or the sun?” To date, nothing has ever been able to answer that question. It doesn’t know how sun is spelled. “Whose son? The sun? Nickel? That’s actually a coin.” All of that. What all do we have to get good at, for the computer to answer that question? Run me down the litany of all the things we can’t do, or that we’re not doing well yet, because there’s no system I’ve ever tried that answered that correctly.
I think one of the things is that we’re typically not building chat systems to answer trivia questions just like that. I think if we were building a special-purpose trivia system for questions like that, we probably could answer it. IBM Watson did pretty well on Jeopardy, because it was trained to answer questions like that. I think we definitely have the databases, the knowledge bases, to answer questions like that. The problem is that kind of a question is really outside of the domain of most of the personal assistants that are being built as products today because honestly, trivia bots are fun, but they’re not as useful as a thing that can set a timer, or check the weather, or play a song. So those are mostly the things that those systems are focused on.
Fair enough, but I would differ. You can go to Wolfram Alpha and say, “What’s bigger, the Statue of Liberty or the Empire State Building?” and it’ll answer that. And you can ask Amazon’s product that same question, and it’ll answer it. Is that because those are legit questions and my question is not legit, or is it because we haven’t taught systems to disambiguate very well and so they don’t really know what I mean when I say “sun”?
I think that’s probably the issue. There’s a language modeling problem when you say, “What’s bigger, a nickel or the sun?” The sun can mean so many different things, like you were saying. Nickel, actually, can be spelled a couple of different ways and has a couple of different meanings. Dealing with ambiguities like that is a little bit hard. I think when you ask that question to me, I categorize this as a trivia question, and so I’m able to disambiguate all of those things, and look up the answer in my little knowledge base in my head, and answer your question. But I actually don’t think that particular question is impossible to solve. I just think it’s just not been a focus to try to solve stuff like that, and that’s why they’re not good.
AIs have done a really good job playing games: Deep Blue, Watson, AlphaGo, and all of that. I guess those are constrained environments with a fixed set of rules, and it’s easy to understand who wins, and what a point is, and all that. What is going to be the next thing, that’s a watershed event, that happens? Now they can outbluff people in poker. What’s something that’s going to be, in a year, or two years, five years down the road, that one day, it wasn’t like that in the universe, and the next day it was? And the next day, the best Go player in the world was a machine.
The thing that’s on my mind for that right now is autonomous vehicles. I think it’s going to change the world forever to unchain people from the driver’s seat. It’s going to give people hugely increased mobility. I have relatives that their doctors have asked them to stop driving cars because it’s no longer safe for them to be doing that, and it restricts their ability to get around the world, and that frustrates them. It’s going to change the way that we all live. It’s going to change the real estate markets, because we won’t have to park our cars in the same places that we’re going to. It’s going to change some things about the economy, because there’s going to be new delivery mechanisms that will become economically viable. I think intelligence that can help robots essentially drive around the roads, that’s the next thing that I’m most excited about, that I think is really going to change everything.
We’ll come to that in just a minute, but I’m actually asking…We have self-driving cars, and on an evolutionary basis, they’ll get a little better and a little better. You’ll see them more and more, and then someday there’ll be even more of them, and then they’ll be this and this and this. It’s not that surprise moment, though, of AlphaGo just beat Lee Sedol at Go. I’m wondering if there is something else like that—that it’s this binary milestone that we can all keep our eye open for?
I don’t know. As far as we have self-driving cars already, I don’t have a self-driving car that could say, for example, let me sit in it at nighttime, go to sleep and wake up, and it brought me to Disneyland. I would like that kind of self-driving car, but that car doesn’t exist yet. I think self-driving trucks that can go cross country carrying stuff, that’s going to radically change the way that we distribute things. I do think that we have, as you said, we’re on the evolutionary path to self-driving cars, but there’s going to be some discrete moments when people actually start using them to do new things that will feel pretty significant.
As far as games and stuff, and computers being better at games than people, it’s funny because I feel like Silicon Valley has, sometimes, a very linear idea of intelligence. That one person is smarter than another person maybe because of an SAT score, or an IQ test, or something. They use that sort of linearity of an intelligence to where some people feel threatened by artificial intelligence because they extrapolate that artificial intelligence is getting smarter and smarter along this linear scale, and that’s going to lead to all sorts of surprising things, like Lee Sedol losing at Go, but on a much bigger scale for all of us. I feel kind of the opposite. Intelligence is such a multidimensional thing. The fact that a computer is better at Go than I am doesn’t really change my life very much, because I’m not very good at Go. I don’t play Go. I don’t consider Go to be an important part of my intelligence. Same with chess. When Garry Kasparov lost to Deep Blue, that didn’t threaten my intelligence. I am sort of defining the way that I work and how I add value to the world, and what things make me happy on a lot of other axes besides “Can I play chess?” or “Can I play Go?” I think that speaks to the idea that intelligence really is very multifaceted. There’s a lot of different kinds – there’s probably thousands or millions of different kinds of intelligence – and it’s not very linearizable.
Because of that, I feel like, as we watch artificial intelligence develop, we’re going to see increasingly more intelligent machines, but they’re going to be increasingly more intelligent in some very narrow domains like “this is the better Go-playing robot than me”, or “this is the better car driver than me”. That’s going to be incredibly useful, but it’s not going to change the way that I think about myself, or about my work, or about what makes me happy. Because I feel like there are so many more dimensions of intelligence that are going to remain the province of humans. That’s going to take a very long time, if ever, for artificial intelligence to become better at all of them than us. Because, as I said, I don’t believe that intelligence is a linearizable thing.
And you said you weren’t a philosopher. I guess the thing that’s interesting to people, is there was a time when information couldn’t travel faster than a horse. And then the train came along, and information could travel. That’s why in the old Westerns – if they ever made it on the train, that was it, and they were out of range. Nothing traveled faster than the train. Then we had a telegraph and, all of a sudden, that was this amazing thing that information could travel at the speed of light. And then one time they ran these cables under the ocean, and somebody in England could talk to somebody in the United States instantly. Each one of them, I think, is just an opportunity to pause, and reflect, and to mark a milestone, and to think about what it all means. I think that’s why a computer just beat these awesome poker players. It learned to bluff. You just kind of want to think about it.
So let’s talk about jobs for a moment because you’ve been talking around that for just a second. Just to set the question up: Generally speaking, there are three views of what automation and artificial intelligence are going to do to jobs. One of them reflects kind of what you were saying is that there are going to be a certain group of workers who are considered low skilled, and there is going to be automation that takes these low-skilled jobs, and that there’s going to be a sizable part of the population that’s locked out of the labor market, and it’s kind of like the permanent Great Depression over and over and over forever. Then there’s another view that says, “No, you don’t understand. There’s going to be an inflection point where they can do every single thing. They’re going to be a better conductor and a better painter and a better novelist and a better everything than us. Don’t think that you’ve got something that a machine can’t do.” Clearly, that isn’t your viewpoint from what you said. Then there’s a third viewpoint that says, “No, in the past, even when we had these transformative technologies like electricity and mechanization, people take those technologies and they use them to increase their own productivity and, therefore, their own incomes. And you never have unemployment go up because of them, because people just take it and make a new job with it.” Of those three, or maybe a fourth one I didn’t cover, where do you find yourself?
I feel like I’m closer in spirit to number three. I’m optimistic. I believe that the primary way that we should expect economic growth in the future is by increased productivity. If you buy a house or buy some stock and you want to sell it 20 or 30 years from now, who’s going to buy it, and with what money, and why do you expect the price to go up? I think the answer to that question should be the people in the future should have more money than us because they’re more productive, and that’s why we should expect our world economy to continue growing. Because we find more productivity. I actually feel like this is actually necessary. World productivity growth has been slowing for the past several decades, and I feel like artificial intelligence is our way out of this trap where we have been unable to figure out how to grow our economy because our productivity hasn’t been improving. I actually feel like this is a necessary thing for all of us, is to figure out how to improve productivity, and I think AI is the way that we’re going to do that for the next several decades.
The one thing that I disagreed with in your third statement was this idea that unemployment would never go up. I think nothing is ever that simple. I actually am quite concerned about job displacement in the short-term. I think there will be people that suffer and in fact, I think, to a certain extent, this is already happening. The election of Donald Trump was an eye-opener to me that there really exist a lot of people that feel that they have been left behind by the economy, and they come to very different conclusions about the world than I might. I think that it’s possible that, as we continue to digitize our society, and AI becomes a lever that some people will become very good at using to increase their productivity, that we’re going to see increased inequality and that worries me.
The primary challenges that I’m worried about, for our society, with the rise of AI, have to do more with making sure that we give people purpose and meaning in their life that maybe doesn’t necessarily revolve around punching out a timecard, and showing up to work at 8 o’clock in the morning every day. I want to believe that that future exists. There are a lot of people right now that are brilliant people that have a lot that they could be contributing in many different ways – intellectually, artistically – that are currently not given that opportunity, because they maybe grew up in a place that didn’t have the right opportunities for them to get the right education so that they could apply their skills in that way, and many of them are doing jobs that I think don’t allow them to use their full potential.
So I’m hoping that, as we automate many of those jobs, that more people will be able to find work that provides meaning and purpose to them and allows them to actually use their talents and make the world a better place, but I acknowledge that it’s not going to be an easy transition. I do think that there’s going to be a lot of implications for how our government works and how our economy works, and I hope that we can figure out a way to help defray some of the pain that will happen during this transition.
You talked about two things. You mentioned income inequality as a thing, but then you also said, “I think we’re going to have unemployment from these technologies.” Separating those for a minute and just looking at the unemployment one for a minute, you say things are never that simple. But with the exception of the Great Depression, which nobody believes was caused by technology, unemployment has been between 5% and 10% in this country for 250 years and it only moves between 5% and 10% because of the business cycle, but there aren’t counterexamples. Just imagine if your job was you had animals that performed physical labor. They pulled, and pushed, and all of that. And somebody made the steam engine. That was disruptive. But even when we had that, we had electrification of industry. We adopted steam power. We went from 5% to 85% of our power being generated by steam in just 22 years. And even when you had that kind of disruption, you still didn’t have any increases in unemployment. I’m curious, what is the mechanism, in your mind, by which this time is different?
I think that’s a good point that you raise, and I actually haven’t studied all of those other transitions that our society has gone through. I’d like to believe that it’s not different. That would be a great story if we could all come to agreement, that we won’t see increased unemployment from AI. I think the reason why I’m a little bit worried is that I think this transition in some fields will happen quickly, maybe more quickly than some of the transitions in the past did. Just because, as I was saying, AI is easier to replicate than some other technologies, like electrification of a country. It takes a lot of time to build out physical infrastructure that can actually deliver that. Whereas I think for a lot of AI applications, that infrastructure will be cheaper and quicker to build, so the velocity of the change might be faster and that could lead to a little bit more shock. But it’s an interesting point you raise, and I certainly hope that we can find a way through this transition that is less painful than I’m worried it could be.
Do you worry about misuse of AI? I’m an optimist on all of this. And I know that every time we have some new technology come along, people are always looking at the bad cases. You take something like the internet, and the internet has overwhelmingly been a force for good. It connects people in a profound way. There’s a million things. And yeah, some people abuse it. But on net, all technology, I believe, almost all technology on net is used for good because I think, on net, people, on average, are more inclined to build than to destroy. That being said, do you worry about nefarious uses of AI, specifically in warfare?
Yeah. I think that there definitely are going to be some scary killer robots that armies make. Armies love to build machinery that kills things and AI will help them do that, and that will be scary. I think it’s interesting, like, where is the real threat going to come from? Sometimes, I feel like the threat of malevolent AI being deployed against people is going to be more subtle than that. It’s going to be more about things that you can do after compromising cyber systems of some adversary, and things that you can do to manipulate them using AI. There’s been a lot of discussion about Russian involvement in the 2016 election in the US, and that wasn’t about sending evil killer robots. It was more about changing people’s opinions, or attempting to change their opinions, and AI will give entities tools to do that on a scale that maybe we haven’t seen before. I think there may be nefarious uses of AI that are more subtle and harder to see than a full-frontal assault from a movie with evil killer robots. I do worry about all of those things, but I also share your optimism. I think we humans, we make lots of mistakes and we shouldn’t give ourselves too easy of a time here. We should learn from those mistakes, but we also do a lot of things well. And we have used technologies in the past to make the world better, and I hope AI will do so as well.
Pedro Domingos wrote a book called The Master Algorithm where he says there are all of these different tools and techniques that we use in artificial intelligence. And he surmises that there is probably a grandparent algorithm, the master algorithm, that can solve any problem, any range of problems. Does that seem possible to you or likely, or do you have any thoughts on that?
I think it’s a little bit far away, at least from AI as it’s practiced today. Right now, the practical, on-the-ground experience of researchers trying to use AI to do something new is filled with a lot of pain, suffering, blood, sweat, tears, and perseverance if they are to succeed, and I see that in my lab every day. Most of the researchers – and I have brilliant researchers in my lab that are working very hard, and they’re doing amazing work. And most of the things they try fail. And they have to keep trying. I think that’s generally the case right now across all the people that are working on AI. The thing that’s different is we’ve actually started to see some big successes, along with all of those more frustrating everyday occurrences. So I do think that we’re making progress, but I think having a master algorithm that’s pushbutton, that can solve any problem you pose to it, that’s something that’s hard for me to conceive of with today’s state of artificial intelligence.
AI, of course, it’s doubtful we’ll have another AI winter because, like you said, it’s kind of delivering the goods, and there have been three things that have happened that made that possible. One of them is better hardware, and obviously you’re part of that world. The second thing is better algorithms. We’ve learned to do things a lot smarter. And the third thing is we have more data, because we are able to collect it, and store it, and whatnot. Assuming you think the hardware is the biggest of the driving factors, what would you think has been the bigger advance? Is it that we have so much more data, or so much better algorithms?
I think the most important thing is more data. I think the algorithms that we’re using in AI right now are, more or less, clever variations of algorithms that have been around for decades, and used to not work. When I was a PhD student and I was studying AI, all the smart people told me, “Don’t work with deep learning, because it doesn’t work. Use this other algorithm called support vector machines.” Which, at the time, that was the hope that that was going to be the master algorithm. So I stayed away from deep learning back then because, at the time, it didn’t work. I think now we have so much more data, and deep learning models have been so successful at taking advantage of that data, that we’ve been able to make a lot of progress. I wouldn’t characterize deep learning as a master algorithm, though, because deep learning is like a fuzzy cloud of things that have some relationships to each other, but actually finding a space inside that fuzzy cloud to solve a particular problem requires a lot of human ingenuity.
Is there a phrase – it’s such a jargon-loaded industry now – are there any of the words that you just find rub you the wrong way? Because they don’t mean anything and people use them as if they do? Do you have anything like that?
Everybody has pet peeves. I would say that my biggest pet peeve right now is the word neuromorphic. I have almost an allergic reaction every time I hear that word, mostly because I don’t think we know what neurons are or what they do, and I think modeling neurons in a way that actually could lead to brain simulations that actually worked is a very long project that we’re decades away from solving. I could be wrong on that. I’m always waiting for somebody to prove me wrong. Strong opinions, weakly held. But so far, neuromorphic is a word that I just have an allergic reaction to, every time.
Tell me about what you do. You are the head of Applied AI Research at NVIDIA, so what does your day look like? What does your team work on? What’s your biggest challenge right now, and all of that?
NVIDIA sells GPUs which have powered most of the deep learning revolution, so pretty much all of the work that’s going on with deep learning across the entire world right now, runs on NVIDIA GPUs. And that’s been very exciting for NVIDIA, and exciting for me to be involved in building that. The next step, I think, for NVIDIA is to figure out how to use AI to change the way that it does its own work. NVIDIA is incentivized to do this because we see the value that AI is bringing to our customers. Our GPU sales have been going up quite a bit because we’re providing a lot of value to everyone else who’s trying to use AI for their own problems. So the next step is to figure out how to use AI for NVIDIA’s problems directly. Andrew Ng, who I used to work with, has this great quote that “AI is the new electricity,” and I believe that. I think that we’re going to see AI applied in many different ways to many different kinds of problems, and my job at NVIDIA is to figure out how to do that here. So that’s what my team focuses on.
We have projects going on in quite a few different domains, ranging from graphics to audio, and text, and others. We’re trying to change the way that everything at NVIDIA happens: from chip design, to video games, and everything in between. As far as my day-to-day work goes, I lead this team, so that means I spend a lot of time talking with people on the team about the work that they’re doing, and trying to make sure they have the right resources, data, the right hardware, the right ideas, the right connections, so that they can make progress on problems that they’re trying to solve. Then when we have prototypes that we’ve built showing how to apply AI to a particular problem, then I work with people around the company to show them the promise of AI applied to problems that they care about.
I think one of the things that’s really exciting to me about this mission is that we’re really trying to change NVIDIA’s work at the core of the company. So rather than working on applied AI, that could maybe help some peripheral part of the company that maybe could be nice if we did that, we’re actually trying to solve very fundamental problems that the company faces with AI, and hopefully we’ll be able to change the way that the company does business, and transform NVIDIA into an AI company, and not just a company that makes hardware for AI.
You are the head of the Applied AI Research. Is there a Pure AI Research group, as well?
Yes, there is.
So everything you do, you have an internal customer for already?
That’s the idea. To me, the difference between fundamental research and applied research is more a question of emphasis on what’s the fundamental goal of your work. If the goal is academic novelty, that would be fundamental research. Our goal is, we think about applications all the time, and we don’t work on problems unless we have a clear application that we’re trying to build that could use a solution.
In most cases, do other groups come to you and say, “We have this problem we really want to solve. Can you help us?” Or is the science nascent enough that you go and say, “Did you know that we can actually solve this problem for you?”
It kind of works all of those ways. We have a list of projects that people around the company have proposed to us, and we also have a list of projects that we ourselves think are interesting to look at. There’s also a few projects that my management tells me, “I really want you to look at this problem. I think it’s really important.” We get input from all directions, and then prioritize, and go after the ones we think are most feasible, and most important.
And do you find a talent shortage? You’re NVIDIA on the one hand, but on the other hand, you know: it’s AI.
I think the entire field, no matter what company you work at, the entire field has a shortage of qualified scientists that can do AI research, and that’s despite the fact that the number of people jumping into AI is increasing every year. If you go to any of the academic AI conferences, you’ll see how much energy and how much excitement there is, and how many people are there that didn’t use to be there. That’s really wonderful to see. But even with all of that growth and change, it is a big problem for the industry. So, to all of your listeners that are trying to figure out what to do next, come work on AI. We have lots of fun problems to work on, and not nearly enough people doing it.
I know a lot of your projects I’m sure you can’t talk about, but tell me something you have done, that you can talk about, and what the goal was, and what you were able to achieve. Give us a success story.
I’ll give you one that’s relevant to the last question that you asked, which is about how to find talent for AI. We’ve actually built a system that can match candidates to job openings at NVIDIA. Basically, it can predict how well we think a particular candidate is a fit for a particular job. That system is actually performing pretty well. So we’re trialing it with hiring managers around the company to figure out if it can help them be more efficient in their work as they search for people to come join NVIDIA.
That looks like a game, doesn’t it? I assume you have a pool of resumes or LinkedIn profiles or whatever, and then you have a pool of successful employees, and you have a pool of job descriptions and you’re trying to say, “How can I pull from that big pool, based on these job descriptions, and actually pick the people that did well in the end?”
That’s right.
That’s like a game, right? You have points.
That’s right.
Would you ever productize anything, or is everything that you’re doing just for your own use?
We focus primarily on building prototypes, not products, in my team. I think that’s what the research is about. Once we build a prototype that shows promise for a particular problem, then we work with other people in the company to get that actually deployed, and they would be the people that think about business strategy about whether something should be productized, or not.
But you, in theory, might turn “NVIDIA Resume Pro” into something people could use?
Possibly. NVIDIA also works with a lot of other companies. As we enable companies in many different parts of the economy to apply AI to their problems, we work with them to help them do that. So it might make more sense for us, for example, to deliver this prototype to some of our partners that are in a position to deliver products like this more directly, and then they can figure out how to enlarge its capabilities, and make it more general to try to solve bigger problems that address their whole market and not just one company’s needs. Partnering with other companies is good for NVIDIA because it helps us grow AI which is something we want to do because, as AI grows, we grow. Personally, I think, for some of the things that we’re working on, it just doesn’t really make sense. It’s not really in NVIDIA’s DNA to productize them directly because it’s just not the business model that the company has.
I’m sure you’re familiar with the “right to know” legislation in Europe: the idea that if an AI makes a decision about you, you have a right to know why it made that decision. AI researchers are like, “It’s not necessarily that easy to do that.” So in your case, your AI would actually be subject to that. It would say, “Why did you pick that person over this person for that job?” Is that an answerable question?
First of all, I don’t think that this system – or I can’t imagine – using it to actually make hiring decisions. I think that would be irresponsible. This system makes mistakes. What we’re trying to do is improve productivity. If instead of having to sort through 200 resumes to find 3 that I want to talk to—if I can look at 10 instead—then that’s a pretty good improvement in my productivity, but I’m still going to be involved, as a hiring manager, to figure out who is the right fit for my jobs.
But an AI excluded 190 people from that position.
It didn’t exclude them. It sorted them, and then the person decided how to allocate their time in a search.
Let’s look at the problem more abstractly. What do you think, just in general, about the idea that every decision an AI makes, should be, and can be, explained?
I think it’s a little bit utopian. Certainly, I don’t have the ability to explain all of the decisions that I make, and people, generally, are not very good at explaining their decisions, which is why there are significant legal battles going on about factual things, that people see in different ways, and remember in different ways. So asking a person to explain their intent is actually a very complicated thing, and we’re not actually very good at it. So I don’t actually think that we’re going to be able to enforce that AI is able to explain all of its decisions in a way that makes sense to humans. I do think that there are things that we can do to make the results of these systems more interpretable. For example, on the resume job description matching system that I mentioned earlier, we’ve built a prototype that can highlight parts of the resume that were most interesting to the model, both in a positive, and in a negative sense. That’s a baby step towards interpretability so that if you were to pull up that job description and a particular person and you could see how they matched, that might explain to you what the model was paying attention to as it made a ranking.
It’s funny because when you hear reasons why people exclude a resume, I remember one person said, “I’m not going to hire him. He has the same first name as somebody else on the team. That’d just be too confusing.” And somebody else I remember said that the applicant was a vegan, and the place the team liked to order pizza from didn’t have a vegan alternative. Those are anecdotal of course, but people use all kinds of other things when they’re thinking about it.
Yeah. That’s actually one of the reasons why I’m excited about this particular system is that I feel like we should be able to construct it in a way that actually has fewer biases than people do, because we know that people harbor all sorts of biases. We have employment laws that guide us to stay away from making decisions based on protected classes. I don’t know if veganism is a protected class, but it’s verging on that. If you’re making hiring decisions based on people’s personal lifestyle choices, that’s suspect. You could get in trouble for that. Our models, we should be able to train them to be more dispassionate than any human could be.
We’re running out of time. Let’s close up by: do you consume science fiction? Do you ever watch movies or read books or any of that? And if so, is there any of it that you look at, especially any that portrays artificial intelligence, like Ex Machina, or Her, or Westworld or any of that stuff, that you look at and you’re like, “Wow, that’s really interesting,” or “That could happen,” or “That’s fascinating,” or anything like that?
I do consume science fiction. I love science fiction. I don’t actually feel like current science fiction matches my understanding of AI very well. Ex Machina, for example, that was a fun movie. I enjoyed watching that movie, but I felt, from a scientific point of view, it just wasn’t very interesting. I was talking about our built-in models of the world. One of the things that humans, over thousands of years, have drilled into our heads is that there’s somebody out to get you. We have a large part of our brain that’s worrying all the time, like, “Who’s going to come kill me tonight? Who’s going to take away my job? Who’s going to take my food? Who’s going to burn down my house?” There’s all these things that we worry about. So a lot of the depictions of AI in science fiction inflame that part of the brain that is worrying about the future, rather than actually speak to the technology and its potential.
I think probably the part of science fiction that has had the most impact on my thoughts about AI is Isaac Asimov’s Three Laws. Those, I think, are pretty classic, and I hope that some of them can be adapted to the kinds of problems that we’re trying to solve with AI, to make AI safe, and make it possible for people to feel confident that they’re interacting with AI, and not worry about it. But I feel like most of science fiction is, especially movies – maybe books can be a little bit more intellectual and maybe a little bit more interesting – but especially movies, it just sells more movies to make people afraid, than it does to show people a mundane existence where AI is helping people live better lives. It’s just not nearly as compelling of a movie, so I don’t actually feel like popular culture treatment of AI is very realistic.
All right. Well, on that note, I say, we wrap up. I want to thank you for a great hour. We covered a lot of ground, and I appreciate you traveling all that way with me.
It was fun.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.
Byron Reese: This is “Voices in AI” brought to you by Gigaom. I’m Byron Reese. Today, our guest is Bryan Catanzaro. He is the head of Applied AI Research at NVIDIA. He has a BS in computer science and Russian from BYU, an MS in electrical engineering from BYU, and a PhD in both electrical engineering and computer science from UC Berkeley. Welcome to the show, Bryan.
Bryan Catanzaro: Thanks. It’s great to be here.
Let’s start off with my favorite opening question. What is artificial intelligence?
It’s such a great question. I like to think about artificial intelligence as making tools that can perform intellectual work. Hopefully, those are useful tools that can help people be more productive in the things that they need to do. There’s a lot of different ways of thinking about artificial intelligence, and maybe the way that I’m talking about it is a little bit more narrow, but I think it’s also a little bit more connected with why artificial intelligence is changing so many companies and so many things about the way that we do things in the world economy today: it actually is a practical thing that helps people be more productive in their work. We’ve been able to create industrialized societies with a lot of mechanization that help people do physical work. Artificial intelligence is making tools that help people do intellectual work.
I asked you what artificial intelligence is, and you said it’s doing intellectual work. That’s sort of using the word to define it, isn’t it? What is that? What is intelligence?
Yeah, wow…I’m not a philosopher, so I actually don’t have like a…
Let me try a different tack. Is it artificial in the sense that it isn’t really intelligent and it’s just pretending to be, or is it really smart? Is it actually intelligent and we just call it artificial because we built it?
I really liked this idea from Yuval Harari that I read a while back where he said there’s the difference between intelligence and sentience, where intelligence is more about the capacity to do things and sentience is more about being self-aware and being able to reason in the way that human beings reason. My belief is that we’re building increasingly intelligent systems that can perform what I would call intellectual work. Things about understanding data, understanding the world around us that we can measure with sensors like video cameras or audio or that we can write down in text, or record in some form. The process of interpreting that data and making decisions about what it means, that’s intellectual work, and that’s something that we can create machines to be more and more intelligent at. I think the definitions of artificial intelligence that move more towards consciousness and sentience, I think we’re a lot farther away from that as a community. There are definitely people that are super excited about making generally intelligent machines, but I think that’s farther away and I don’t know how to define what general intelligence is well enough to start working on that problem myself. My work focuses mostly on practical things—helping computers understand data and make decisions about it.
Fair enough. I’ll only ask you one more question along those lines. I guess even down in narrow AI, though, if I had a sprinkler that comes on when my grass gets dry, it’s responding to its environment. Is that an AI?
I’d say it’s a very small form of AI. You could have a very smart sprinkler that was better than any person at figuring out when the grass needed to be watered. It could take into account all sorts of sensor data. It could take into account historical information. It might actually be more intelligent at figuring out how to irrigate than a human would be. And that’s a very narrow form of intelligence, but it’s a useful one. So yeah, I do think that could be considered a form of intelligence. Now it’s not philosophizing about the nature of irrigation and its harm on the planet or the history of human interventions on the world, or anything like that. So it’s very narrow, but it’s useful, and it is intelligent in its own way.
Fair enough. I do want to talk about AGI in a little while. I have some questions around…We’ll come to that in just a moment. Just in the narrow AI world, just in your world of using data and computers to solve problems, if somebody said, “Bryan, what is the state-of-the-art? Where are we at in AI? Is this the beginning and you ‘ain’t seen nothing yet’? Or are we really doing a lot of cool things, and we are well underway to mastering that world?”
I think we’re just at the beginning. We’ve seen so much progress over the past few years. It’s been really quite astonishing, the kind of progress we’ve seen in many different domains. It all started out with image recognition and speech recognition, but it’s gone a long way from there. A lot of the products that we interact with on a daily basis over the internet are using AI, and they are providing value to us. They provide our social media feeds, they provide recommendations and maps, they provide conversational interfaces like Siri or Android Assistant. All of those things are powered by AI and they are definitely providing value, but we’re still just at the beginning. There are so many things we don’t know yet how to do and so many underexplored problems to look at. So I believe we’ll continue to see applications of AI come up in new places for quite a while to come.
If I took a little statuette of a falcon, let’s say it’s a foot tall, and I showed it to you, and then I showed you some photographs, and said, “Spot the falcon.” And half the time it’s sticking halfway behind a tree, half the time it’s underwater; one time it’s got peanut butter smeared on it. A person can do that really well, but computers are far away from that. Is that an example of us being really good at transfer learning? We’re used to knowing what things with peanut butter on them look like? What is it that people are doing that computers are having a hard time doing there?
I believe that people have evolved, over a very long period of time, to operate on planet Earth with the sensors that we have. So we have a lot of built-in knowledge that tells us how to process the sensors that we have and model the world. A lot of it is instinctual, and some of it is learned. I have young children, like a year-old or so. They spend an awful lot of time just repetitively probing the world to see how it’s going to react when they do things, like pushing on a string, or a ball, and they do it over and over again because I think they’re trying to build up their models about the world. We have actually very sophisticated models of the world that maybe we take for granted sometimes because everyone seems to get them so easily. It’s not something that you have to learn in school. But these models are actually quite useful, and they’re more sophisticated than – and more general than – the models that we currently can build with today’s AI technology.
To your question about transfer learning, I feel like we’re really good at transfer learning within the domain of things that our eyes can see on planet Earth. There are probably a lot of situations where an AI would be better at transfer learning. It might actually have fewer assumptions baked in about how the world is structured, how objects look, what kind of composition of objects is actually permissible. I guess I’m just trying to say we shouldn’t forget that we come with a lot of context. That’s instinctual, and we use that, and it’s very sophisticated.
Do you take from that that we ought to learn how to embody an AI and just let it wander around the world, bumping into things and poking at them and all of that? Is that what you’re saying? How do we overcome that?
It’s an interesting question, you know. I’m not personally working on trying to build artificial general intelligence, but it will be interesting for those people that are working on it to see what kind of childhood is necessary for an AI. I do think that childhood plays a really important part in developing human intelligence because it helps us build and calibrate these models of how the world works, which then we apply to all sorts of things like your question of the falcon statue. Will computers need things like that? It’s possible. We’ll have to see. I think one of the things that’s different about computers is that they’re a lot better at transmitting information identically, so it may be the kind of thing that we can train once, and then just use repeatedly – as opposed to people, where the process of replicating a person is time-consuming and not exact.
But that transfer learning problem isn’t really an AGI problem at all, though. Right? We’ve taught a computer to recognize a cat, by giving it a gazillion images of a cat. But if we want to teach it how to recognize a bird, we have to start over, don’t we?
I don’t think we generally start over. I think most of the time if people wanted to create a new classifier, they would use transfer learning from an existing classifier that had been trained on a wide variety of different object types. It’s actually not very hard to do that, and people do that successfully all the time. So at least for image recognition, I think transfer learning works pretty well. For other kinds of domains, they can be a little bit more challenging. But at least for image recognition, we’ve been able to find a set of higher-level features that are very useful in discriminating between all sorts of different kinds of objects, even objects that we haven’t seen before.
What about audio? Because I’m talking to you now and I’m snapping my fingers. You don’t have any trouble continuing to hear me, but a computer trips over that. What do you think is going on in people’s minds? Why are we good at that, do you think? To get back to your point about we live on Earth, it’s one of those Earth things we do. But as a general rule, how do we teach that to a computer? Is that the same as teaching it to see something, as to teach it to hear something?
I think it’s similar. The best speech recognition accuracies come from systems that have been trained on huge amounts of data, and there does seem to be a relationship that the more data we can train a model on, the better the accuracy gets. We haven’t seen the end of that yet. I’m pretty excited about the prospects of being able to teach computers to continually understand audio, better and better. However, I wanted to point out, humans, this is kind of our superpower: conversation and communication. You watch birds flying in a flock, and the birds can all change direction instantaneously, and the whole flock just moves, and you’re like, “How do you do that and not run into each other?” They have a lot of built-in machinery that allows them to flock together. Humans have a lot of built-in machinery for conversation and for understanding spoken language. The pathways for speaking and the pathways for hearing evolved together, so they’re really well-matched.
With computers trying to understand audio, we haven’t gotten to that point yet. I remember some of the experiments that I’ve done in the past with speech recognition, that the recognition performance was very sensitive to compression artifacts that were actually not audible to humans. We could actually take a recording, like this one, and recompress it in a way that sounded identical to a person, and observe a measurable difference in the recognition accuracy of our model. That was a little disconcerting because we’re trying to train the model to be invariant to all the things that humans are invariant to, but it’s actually quite hard to do that. We certainly haven’t achieved that yet. Often, our models are still what we would call “overfitting”, where they’re paying attention to a lot of details that help it perform the tasks that we’re asking it to perform, but they’re not actually helpful to solving the fundamental tasks that we’re trying to perform. And we’re continually trying to improve our understanding of the tasks that we’re solving so that we can avoid this, but we’ve still got more work to do.
My standard question when I’m put in front of a chatbot or one of the devices that sits on everybody’s desktop, I can’t say them out loud because they’ll start talking to me right now, but the question I always ask is “What is bigger, a nickel or the sun?” To date, nothing has ever been able to answer that question. It doesn’t know how sun is spelled. “Whose son? The sun? Nickel? That’s actually a coin.” All of that. What all do we have to get good at, for the computer to answer that question? Run me down the litany of all the things we can’t do, or that we’re not doing well yet, because there’s no system I’ve ever tried that answered that correctly.
I think one of the things is that we’re typically not building chat systems to answer trivia questions just like that. I think if we were building a special-purpose trivia system for questions like that, we probably could answer it. IBM Watson did pretty well on Jeopardy, because it was trained to answer questions like that. I think we definitely have the databases, the knowledge bases, to answer questions like that. The problem is that kind of a question is really outside of the domain of most of the personal assistants that are being built as products today because honestly, trivia bots are fun, but they’re not as useful as a thing that can set a timer, or check the weather, or play a song. So those are mostly the things that those systems are focused on.
Fair enough, but I would differ. You can go to Wolfram Alpha and say, “What’s bigger, the Statue of Liberty or the Empire State Building?” and it’ll answer that. And you can ask Amazon’s product that same question, and it’ll answer it. Is that because those are legit questions and my question is not legit, or is it because we haven’t taught systems to disambiguate very well and so they don’t really know what I mean when I say “sun”?
I think that’s probably the issue. There’s a language modeling problem when you say, “What’s bigger, a nickel or the sun?” The sun can mean so many different things, like you were saying. Nickel, actually, can be spelled a couple of different ways and has a couple of different meanings. Dealing with ambiguities like that is a little bit hard. I think when you ask that question to me, I categorize this as a trivia question, and so I’m able to disambiguate all of those things, and look up the answer in my little knowledge base in my head, and answer your question. But I actually don’t think that particular question is impossible to solve. I just think it’s just not been a focus to try to solve stuff like that, and that’s why they’re not good.
AIs have done a really good job playing games: Deep Blue, Watson, AlphaGo, and all of that. I guess those are constrained environments with a fixed set of rules, and it’s easy to understand who wins, and what a point is, and all that. What is going to be the next thing, that’s a watershed event, that happens? Now they can outbluff people in poker. What’s something that’s going to be, in a year, or two years, five years down the road, that one day, it wasn’t like that in the universe, and the next day it was? And the next day, the best Go player in the world was a machine.
The thing that’s on my mind for that right now is autonomous vehicles. I think it’s going to change the world forever to unchain people from the driver’s seat. It’s going to give people hugely increased mobility. I have relatives that their doctors have asked them to stop driving cars because it’s no longer safe for them to be doing that, and it restricts their ability to get around the world, and that frustrates them. It’s going to change the way that we all live. It’s going to change the real estate markets, because we won’t have to park our cars in the same places that we’re going to. It’s going to change some things about the economy, because there’s going to be new delivery mechanisms that will become economically viable. I think intelligence that can help robots essentially drive around the roads, that’s the next thing that I’m most excited about, that I think is really going to change everything.
We’ll come to that in just a minute, but I’m actually asking…We have self-driving cars, and on an evolutionary basis, they’ll get a little better and a little better. You’ll see them more and more, and then someday there’ll be even more of them, and then they’ll be this and this and this. It’s not that surprise moment, though, of AlphaGo just beat Lee Sedol at Go. I’m wondering if there is something else like that—that it’s this binary milestone that we can all keep our eye open for?
I don’t know. As far as we have self-driving cars already, I don’t have a self-driving car that could say, for example, let me sit in it at nighttime, go to sleep and wake up, and it brought me to Disneyland. I would like that kind of self-driving car, but that car doesn’t exist yet. I think self-driving trucks that can go cross country carrying stuff, that’s going to radically change the way that we distribute things. I do think that we have, as you said, we’re on the evolutionary path to self-driving cars, but there’s going to be some discrete moments when people actually start using them to do new things that will feel pretty significant.
As far as games and stuff, and computers being better at games than people, it’s funny because I feel like Silicon Valley has, sometimes, a very linear idea of intelligence. That one person is smarter than another person maybe because of an SAT score, or an IQ test, or something. They use that sort of linearity of an intelligence to where some people feel threatened by artificial intelligence because they extrapolate that artificial intelligence is getting smarter and smarter along this linear scale, and that’s going to lead to all sorts of surprising things, like Lee Sedol losing at Go, but on a much bigger scale for all of us. I feel kind of the opposite. Intelligence is such a multidimensional thing. The fact that a computer is better at Go than I am doesn’t really change my life very much, because I’m not very good at Go. I don’t play Go. I don’t consider Go to be an important part of my intelligence. Same with chess. When Garry Kasparov lost to Deep Blue, that didn’t threaten my intelligence. I am sort of defining the way that I work and how I add value to the world, and what things make me happy on a lot of other axes besides “Can I play chess?” or “Can I play Go?” I think that speaks to the idea that intelligence really is very multifaceted. There’s a lot of different kinds – there’s probably thousands or millions of different kinds of intelligence – and it’s not very linearizable.
Because of that, I feel like, as we watch artificial intelligence develop, we’re going to see increasingly more intelligent machines, but they’re going to be increasingly more intelligent in some very narrow domains like “this is the better Go-playing robot than me”, or “this is the better car driver than me”. That’s going to be incredibly useful, but it’s not going to change the way that I think about myself, or about my work, or about what makes me happy. Because I feel like there are so many more dimensions of intelligence that are going to remain the province of humans. That’s going to take a very long time, if ever, for artificial intelligence to become better at all of them than us. Because, as I said, I don’t believe that intelligence is a linearizable thing.
And you said you weren’t a philosopher. I guess the thing that’s interesting to people, is there was a time when information couldn’t travel faster than a horse. And then the train came along, and information could travel. That’s why in the old Westerns – if they ever made it on the train, that was it, and they were out of range. Nothing traveled faster than the train. Then we had a telegraph and, all of a sudden, that was this amazing thing that information could travel at the speed of light. And then one time they ran these cables under the ocean, and somebody in England could talk to somebody in the United States instantly. Each one of them, I think, is just an opportunity to pause, and reflect, and to mark a milestone, and to think about what it all means. I think that’s why a computer just beat these awesome poker players. It learned to bluff. You just kind of want to think about it.
So let’s talk about jobs for a moment because you’ve been talking around that for just a second. Just to set the question up: Generally speaking, there are three views of what automation and artificial intelligence are going to do to jobs. One of them reflects kind of what you were saying is that there are going to be a certain group of workers who are considered low skilled, and there is going to be automation that takes these low-skilled jobs, and that there’s going to be a sizable part of the population that’s locked out of the labor market, and it’s kind of like the permanent Great Depression over and over and over forever. Then there’s another view that says, “No, you don’t understand. There’s going to be an inflection point where they can do every single thing. They’re going to be a better conductor and a better painter and a better novelist and a better everything than us. Don’t think that you’ve got something that a machine can’t do.” Clearly, that isn’t your viewpoint from what you said. Then there’s a third viewpoint that says, “No, in the past, even when we had these transformative technologies like electricity and mechanization, people take those technologies and they use them to increase their own productivity and, therefore, their own incomes. And you never have unemployment go up because of them, because people just take it and make a new job with it.” Of those three, or maybe a fourth one I didn’t cover, where do you find yourself?
I feel like I’m closer in spirit to number three. I’m optimistic. I believe that the primary way that we should expect economic growth in the future is by increased productivity. If you buy a house or buy some stock and you want to sell it 20 or 30 years from now, who’s going to buy it, and with what money, and why do you expect the price to go up? I think the answer to that question should be the people in the future should have more money than us because they’re more productive, and that’s why we should expect our world economy to continue growing. Because we find more productivity. I actually feel like this is actually necessary. World productivity growth has been slowing for the past several decades, and I feel like artificial intelligence is our way out of this trap where we have been unable to figure out how to grow our economy because our productivity hasn’t been improving. I actually feel like this is a necessary thing for all of us, is to figure out how to improve productivity, and I think AI is the way that we’re going to do that for the next several decades.
The one thing that I disagreed with in your third statement was this idea that unemployment would never go up. I think nothing is ever that simple. I actually am quite concerned about job displacement in the short-term. I think there will be people that suffer and in fact, I think, to a certain extent, this is already happening. The election of Donald Trump was an eye-opener to me that there really exists a lot of people that feel that they have been left behind by the economy, and they come to very different conclusions about the world than I might. I think that it’s possible that, as we continue to digitize our society, and AI becomes a lever that some people will become very good at using to increase their productivity, that we’re going to see increased inequality and that worries me.
The primary challenges that I’m worried about, for our society, with the rise of AI, have to do more with making sure that we give people purpose and meaning in their life that maybe doesn’t necessarily revolve around punching out a timecard, and showing up to work at 8 o’clock in the morning every day. I want to believe that that future exists. There are a lot of people right now that are brilliant people that have a lot that they could be contributing in many different ways – intellectually, artistically – that are currently not given that opportunity, because they maybe grew up in a place that didn’t have the right opportunities for them to get the right education so that they could apply their skills in that way, and many of them are doing jobs that I think don’t allow them to use their full potential.
So I’m hoping that, as we automate many of those jobs, that more people will be able to find work that provides meaning and purpose to them and allows them to actually use their talents and make the world a better place, but I acknowledge that it’s not going to be an easy transition. I do think that there’s going to be a lot of implications for how our government works and how our economy works, and I hope that we can figure out a way to help defray some of the pain that will happen during this transition.
You talked about two things. You mentioned income inequality as a thing, but then you also said, “I think we’re going to have unemployment from these technologies.” Separating those for a minute and just looking at the unemployment one for a minute, you say things are never that simple. But with the exception of the Great Depression, which nobody believes was caused by technology, unemployment has been between 5% and 10% in this country for 250 years and it only moves between 5% and 10% because of the business cycle, but there aren’t counterexamples. Just imagine if your job was you had animals that performed physical labor. They pulled, and pushed, and all of that. And somebody made the steam engine. That was disruptive. But even when we had that, we had electrification of industry. We adopted steam power. We went from 5% to 85% of our power being generated by steam in just 22 years. And even when you had that kind of disruption, you still didn’t have any increases in unemployment. I’m curious, what is the mechanism, in your mind, by which this time is different?
I think that’s a good point that you raise, and I actually haven’t studied all of those other transitions that our society has gone through. I’d like to believe that it’s not different. That would be a great story if we could all come to agreement, that we won’t see increased unemployment from AI. I think the reason why I’m a little bit worried is that I think this transition in some fields will happen quickly, maybe more quickly than some of the transitions in the past did. Just because, as I was saying, AI is easier to replicate than some other technologies, like electrification of a country. It takes a lot of time to build out physical infrastructure that can actually deliver that. Whereas I think for a lot of AI applications, that infrastructure will be cheaper and quicker to build, so the velocity of the change might be faster and that could lead to a little bit more shock. But it’s an interesting point you raise, and I certainly hope that we can find a way through this transition that is less painful than I’m worried it could be.
Do you worry about misuse of AI? I’m an optimist on all of this. And I know that every time we have some new technology come along, people are always looking at the bad cases. You take something like the internet, and the internet has overwhelmingly been a force for good. It connects people in a profound way. There’s a million things. And yeah, some people abuse it. But on net, all technology, I believe, almost all technology on net is used for good because I think, on net, people, on average, are more inclined to build than to destroy. That being said, do you worry about nefarious uses of AI, specifically in warfare?
Yeah. I think that there definitely are going to be some scary killer robots that armies make. Armies love to build machinery that kills things and AI will help them do that, and that will be scary. I think it’s interesting, like, where is the real threat going to come from? Sometimes, I feel like the threat of malevolent AI being deployed against people is going to be more subtle than that. It’s going to be more about things that you can do after compromising cyber systems of some adversary, and things that you can do to manipulate them using AI. There’s been a lot of discussion about Russian involvement in the 2016 election in the US, and that wasn’t about sending evil killer robots. It was more about changing people’s opinions, or attempting to change their opinions, and AI will give entities tools to do that on a scale that maybe we haven’t seen before. I think there may be nefarious uses of AI that are more subtle and harder to see than a full-frontal assault from a movie with evil killer robots. I do worry about all of those things, but I also share your optimism. I think we humans, we make lots of mistakes and we shouldn’t give ourselves too easy of a time here. We should learn from those mistakes, but we also do a lot of things well. And we have used technologies in the past to make the world better, and I hope AI will do so as well.
Pedro Domingos wrote a book called The Master Algorithm where he says there are all of these different tools and techniques that we use in artificial intelligence. And he surmises that there is probably a grandparent algorithm, the master algorithm, that can solve any problem, any range of problems. Does that seem possible to you or likely, or do you have any thoughts on that?
I think it’s a little bit far away, at least from AI as it’s practiced today. Right now, the practical, on-the-ground experience of researchers trying to use AI to do something new is filled with a lot of pain, suffering, blood, sweat, tears, and perseverance if they are to succeed, and I see that in my lab every day. Most of the researchers – and I have brilliant researchers in my lab that are working very hard, and they’re doing amazing work. And most of the things they try fail. And they have to keep trying. I think that’s generally the case right now across all the people that are working on AI. The thing that’s different is we’ve actually started to see some big successes, along with all of those more frustrating everyday occurrences. So I do think that we’re making the progress, but I think having a master algorithm that’s pushbutton that can solve any problem you pose to it that’s something that’s hard for me to conceive of with today’s state of artificial intelligence.
AI, of course, it’s doubtful we’ll have another AI winter because, like you said, it’s kind of delivering the goods, and there have been three things that have happened that made that possible. One of them is better hardware, and obviously you’re part of that world. The second thing is better algorithms. We’ve learned to do things a lot smarter. And the third thing is we have more data, because we are able to collect it, and store it, and whatnot. Assuming you think the hardware is the biggest of the driving factors, what would you think has been the bigger advance? Is it that we have so much more data, or so much better algorithms?
I think the most important thing is more data. I think the algorithms that we’re using in AI right now are, more or less, clever variations of algorithms that have been around for decades, and used to not work. When I was a PhD student and I was studying AI, all the smart people told me, “Don’t work with deep learning, because it doesn’t work. Use this other algorithm called support vector machines.” Which, at the time, that was the hope that that was going to be the master algorithm. So I stayed away from deep learning back then because, at the time, it didn’t work. I think now we have so much more data, and deep learning models have been so successful at taking advantage of that data, that we’ve been able to make a lot of progress. I wouldn’t characterize deep learning as a master algorithm, though, because deep learning is like a fuzzy cloud of things that have some relationships to each other, but actually finding a space inside that fuzzy cloud to solve a particular problem requires a lot of human ingenuity.
Is there a phrase – it’s such a jargon-loaded industry now – are there any of the words that you just find rub you the wrong way? Because they don’t mean anything and people use them as if they do? Do you have anything like that?
Everybody has pet peeves. I would say that my biggest pet peeve right now is the word neuromorphic. I have almost an allergic reaction every time I hear that word, mostly because I don’t think we know what neurons are or what they do, and I think modeling neurons in a way that actually could lead to brain simulations that actually worked is a very long project that we’re decades away from solving. I could be wrong on that. I’m always waiting for somebody to prove me wrong. Strong opinions, weakly held. But so far, neuromorphic is a word that I just have an allergic reaction to, every time.
Tell me about what you do. You are the head of Applied AI Research at NVIDIA, so what does your day look like? What does your team work on? What’s your biggest challenge right now, and all of that?
NVIDIA sells GPUs which have powered most of the deep learning revolution, so pretty much all of the work that’s going on with deep learning across the entire world right now, runs on NVIDIA GPUs. And that’s been very exciting for NVIDIA, and exciting for me to be involved in building that. The next step, I think, for NVIDIA is to figure out how to use AI to change the way that it does its own work. NVIDIA is incentivized to do this because we see the value that AI is bringing to our customers. Our GPU sales have been going up quite a bit because we’re providing a lot of value to everyone else who’s trying to use AI for their own problems. So the next step is to figure out how to use AI for NVIDIA’s problems directly. Andrew Ng, who I used to work with, has this great quote that “AI is the new electricity,” and I believe that. I think that we’re going to see AI applied in many different ways to many different kinds of problems, and my job at NVIDIA is to figure out how to do that here. So that’s what my team focuses on.
We have projects going on in quite a few different domains, ranging from graphics to audio, and text, and others. We’re trying to change the way that everything at NVIDIA happens: from chip design, to video games, and everything in between. As far as my day-to-day work goes, I lead this team, so that means I spend a lot of time talking with people on the team about the work that they’re doing, and trying to make sure they have the right resources, data, the right hardware, the right ideas, the right connections, so that they can make progress on problems that they’re trying to solve. Then when we have prototypes that we’ve built showing how to apply AI to a particular problem, then I work with people around the company to show them the promise of AI applied to problems that they care about.
I think one of the things that’s really exciting to me about this mission is that we’re really trying to change NVIDIA’s work at the core of the company. So rather than working on applied AI, that could maybe help some peripheral part of the company that maybe could be nice if we did that, we’re actually trying to solve very fundamental problems that the company faces with AI, and hopefully we’ll be able to change the way that the company does business, and transform NVIDIA into an AI company, and not just a company that makes hardware for AI.
You are the head of the Applied AI Research. Is there a Pure AI Research group, as well?
Yes, there is.
So everything you do, you have an internal customer for already?
That’s the idea. To me, the difference between fundamental research and applied research is more a question of emphasis on what’s the fundamental goal of your work. If the goal is academic novelty, that would be fundamental research. Our goal is, we think about applications all the time, and we don’t work on problems unless we have a clear application that we’re trying to build that could use a solution.
In most cases, do other groups come to you and say, “We have this problem we really want to solve. Can you help us?” Or is the science nascent enough that you go and say, “Did you know that we can actually solve this problem for you?”
It kind of works all of those ways. We have a list of projects that people around the company have proposed to us, and we also have a list of projects that we ourselves think are interesting to look at. There’s also a few projects that my management tells me, “I really want you to look at this problem. I think it’s really important.” We get input from all directions, and then prioritize, and go after the ones we think are most feasible, and most important.
And do you find a talent shortage? You’re NVIDIA on the one hand, but on the other hand, you know: it’s AI.
I think the entire field, no matter what company you work at, the entire field has a shortage of qualified scientists that can do AI research, and that’s despite the fact that the number of people jumping into AI is increasing every year. If you go to any of the academic AI conferences, you’ll see how much energy and how much excitement, and how many people are there that didn’t used to be there. That’s really wonderful to see. But even with all of that growth and change, it is a big problem for the industry. So, to all of your listeners that are trying to figure out what to do next, come work on AI. We have lots of fun problems to work on, and not nearly enough people doing it.
I know a lot of your projects I’m sure you can’t talk about, but tell me something you have done, that you can talk about, and what the goal was, and what you were able to achieve. Give us a success story.
I’ll give you one that’s relevant to the last question that you asked, which is about how to find talent for AI. We’ve actually built a system that can match candidates to job openings at NVIDIA. Basically, it can predict how well we think a particular candidate is a fit for a particular job. That system is actually performing pretty well. So we’re trialing it with hiring managers around the company to figure out if it can help them be more efficient in their work as they search for people to come join NVIDIA.
That looks like a game, doesn’t it? I assume you have a pool of resumes or LinkedIn profiles or whatever, and then you have a pool of successful employees, and you have a pool of job descriptions and you’re trying to say, “How can I pull from that big pool, based on these job descriptions, and actually pick the people that did well in the end?”
That’s right.
That’s like a game, right? You have points.
That’s right.
Would you ever productize anything, or is everything that you’re doing just for your own use?
We focus primarily on building prototypes, not products, in my team. I think that’s what the research is about. Once we build a prototype that shows promise for a particular problem, then we work with other people in the company to get that actually deployed, and they would be the people that think about business strategy about whether something should be productized, or not.
But you, in theory, might turn “NVIDIA Resume Pro” into something people could use?
Possibly. NVIDIA also works with a lot of other companies. As we enable companies in many different parts of the economy to apply AI to their problems, we work with them to help them do that. So it might make more sense for us, for example, to deliver this prototype to some of our partners that are in a position to deliver products like this more directly, and then they can figure out how to enlarge its capabilities, and make it more general to try to solve bigger problems that address their whole market and not just one company’s needs. Partnering with other companies is good for NVIDIA because it helps us grow AI, which is something we want to do because, as AI grows, we grow. Personally, I think that for some of the things we’re working on, it just doesn’t really make sense to productize them directly. It’s not really in NVIDIA’s DNA, because it’s just not the business model that the company has.
I’m sure you’re familiar with the “right to know” legislation in Europe: the idea that if an AI makes a decision about you, you have a right to know why it made that decision. AI researchers are like, “It’s not necessarily that easy to do that.” So in your case, your AI would actually be subject to that. It would say, “Why did you pick that person over this person for that job?” Is that an answerable question?
First of all, I don’t think that this system – or I can’t imagine – using it to actually make hiring decisions. I think that would be irresponsible. This system makes mistakes. What we’re trying to do is improve productivity. If instead of having to sort through 200 resumes to find 3 that I want to talk to—if I can look at 10 instead—then that’s a pretty good improvement in my productivity, but I’m still going to be involved, as a hiring manager, to figure out who is the right fit for my jobs.
But an AI excluded 190 people from that position.
It didn’t exclude them. It sorted them, and then the person decided how to allocate their time in a search.
Let’s look at the problem more abstractly. What do you think, just in general, about the idea that every decision an AI makes, should be, and can be, explained?
I think it’s a little bit utopian. Certainly, I don’t have the ability to explain all of the decisions that I make, and people, generally, are not very good at explaining their decisions, which is why there are significant legal battles going on about factual things, that people see in different ways, and remember in different ways. So asking a person to explain their intent is actually a very complicated thing, and we’re not actually very good at it. So I don’t actually think that we’re going to be able to enforce that AI is able to explain all of its decisions in a way that makes sense to humans. I do think that there are things that we can do to make the results of these systems more interpretable. For example, on the resume job description matching system that I mentioned earlier, we’ve built a prototype that can highlight parts of the resume that were most interesting to the model, both in a positive, and in a negative sense. That’s a baby step towards interpretability so that if you were to pull up that job description and a particular person and you could see how they matched, that might explain to you what the model was paying attention to as it made a ranking.
It’s funny because when you hear reasons why people exclude a resume, I remember one person said, “I’m not going to hire him. He has the same first name as somebody else on the team. That’d just be too confusing.” And somebody else I remember said that the applicant was a vegan and the place they like to order pizza from didn’t have a vegan alternative that the team liked to order from. Those are anecdotal of course, but people use all kinds of other things when they’re thinking about it.
Yeah. That’s actually one of the reasons why I’m excited about this particular system is that I feel like we should be able to construct it in a way that actually has fewer biases than people do, because we know that people harbor all sorts of biases. We have employment laws that guide us to stay away from making decisions based on protected classes. I don’t know if veganism is a protected class, but it’s verging on that. If you’re making hiring decisions based on people’s personal lifestyle choices, that’s suspect. You could get in trouble for that. Our models, we should be able to train them to be more dispassionate than any human could be.
We’re running out of time. Let’s close up by: do you consume science fiction? Do you ever watch movies or read books or any of that? And if so, is there any of it that you look at, especially any that portrays artificial intelligence, like Ex Machina, or Her, or Westworld or any of that stuff, that you look at and you’re like, “Wow, that’s really interesting,” or “That could happen,” or “That’s fascinating,” or anything like that?
I do consume science fiction. I love science fiction. I don’t actually feel like current science fiction matches my understanding of AI very well. Ex Machina, for example, that was a fun movie. I enjoyed watching that movie, but I felt, from a scientific point of view, it just wasn’t very interesting. I was talking about our built-in models of the world. One of the things that humans, over thousands of years, have drilled into our heads is that there’s somebody out to get you. We have a large part of our brain that’s worrying all the time, like, “Who’s going to come kill me tonight? Who’s going to take away my job? Who’s going to take my food? Who’s going to burn down my house?” There’s all these things that we worry about. So a lot of the depictions of AI in science fiction inflame that part of the brain that is worrying about the future, rather than actually speak to the technology and its potential.
I think probably the part of science fiction that has had the most impact on my thoughts about AI is Isaac Asimov’s Three Laws. Those, I think, are pretty classic, and I hope that some of them can be adapted to the kinds of problems that we’re trying to solve with AI, to make AI safe, and make it possible for people to feel confident that they’re interacting with AI, and not worry about it. But I feel like most of science fiction is, especially movies – maybe books can be a little bit more intellectual and maybe a little bit more interesting – but especially movies, it just sells more movies to make people afraid, than it does to show people a mundane existence where AI is helping people live better lives. It’s just not nearly as compelling of a movie, so I don’t actually feel like popular culture treatment of AI is very realistic.
All right. Well, on that note, I say, we wrap up. I want to thank you for a great hour. We covered a lot of ground, and I appreciate you traveling all that way with me.
It was fun.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here

Watson-powered toy blows past Kickstarter goal in a day

First it was Jeopardy!, then it was cancer, e-commerce and cooking. Now, IBM’s Watson artificial intelligence system is powering a line of connected toys.

And it looks as if people are impressed with the idea: A company called Elemental Path launched a Kickstarter campaign on Monday for a line of toy dinosaurs, called CogniToys, and had surpassed its initial goal as of Tuesday morning. The company was aiming for $50,000 and had raised more than $70,000 as of 11:40 a.m. Tuesday.

Essentially, the dinosaurs are connected toys that speak to IBM’s Watson cloud APIs, which the company began rolling out last year. According to the Kickstarter page, the CogniToys will allow children to engage with them by talking: asking questions, telling jokes, sharing stories and the like. In addition, the page states, “The technology allows toys to listen, speak and simultaneously evolve, learn and grow with your child; bringing a new element of personalized, educational play to children.”

Elemental Path is not the first company focused on building natural language and artificial intelligence into toys. Possibly the best-known example so far is a startup called ToyTalk, which is building natural language iPad apps and was founded by former Pixar CTO Oren Jacob.

The evolution of artificial intelligence, and the ability to easily train toys, robots, apps or anything, really, is going to be a major focus of Gigaom’s Structure Intelligence conference September 22–23 in San Francisco. We’ll also talk a lot about machine learning and AI at our Structure Data conference March 18–19 in New York, where speakers from Facebook, Yahoo, Spotify and elsewhere will discuss how data in the form of images, text, and even sounds are allowing them to build new products and discover new insights about their users.

New to deep learning? Here are 4 easy lessons from Google

Google employs some of the world’s smartest researchers in deep learning and artificial intelligence, so it’s not a bad idea to listen to what they have to say about the space. One of those researchers, senior research scientist Greg Corrado, spoke at RE:WORK’s Deep Learning Summit on Thursday in San Francisco and gave some advice on when, why and how to use deep learning.

His talk was pragmatic and potentially very useful for folks who have heard about deep learning and how great it is — well, at computer vision, language understanding and speech recognition, at least — and are now wondering whether they should try using it for something. The TL;DR version is “maybe,” but here’s a little more nuanced advice from Corrado’s talk.

(And, of course, if you want to learn even more about deep learning, you can attend Gigaom’s Structure Data conference in March and our inaugural Structure Intelligence conference in September. You can also watch the presentations from our Future of AI meetup, which was held in late 2014.)

1. It’s not always necessary, even if it would work

Probably the most-useful piece of advice Corrado gave is that deep learning isn’t necessarily the best approach to solving a problem, even if it would offer the best results. Presently, it’s computationally expensive (in all meanings of the word), it often requires a lot of data (more on that later) and probably requires some in-house expertise if you’re building systems yourself.

So while deep learning might ultimately work well on pattern-recognition tasks on structured data — fraud detection, stock-market prediction or analyzing sales pipelines, for example — Corrado said it’s easier to justify in the areas where it’s already widely used. “In machine perception, deep learning is so much better than the second-best approach that it’s hard to argue with,” he explained, while the gap between deep learning and other options is not so great in other applications.

That being said, I found myself in multiple conversations at the event centered around the opportunity to soup up existing enterprise software markets with deep learning and met a few startups trying to do it. In an on-stage interview I did with Baidu’s Andrew Ng (who worked alongside Corrado on the Google Brain project) earlier in the day, he noted how deep learning is currently powering some ad serving at Baidu and suggested that data center operations (something Google is actually exploring) might be a good fit.

Greg Corrado

2. You don’t have to be Google to do it

Even when companies do decide to take on deep learning work, they don’t need to aim for systems as big as those at Google or Facebook or Baidu, Corrado said. “The answer is definitely not,” he reiterated. “. . . You only need an engine big enough for the rocket fuel available.”

The rocket analogy is a reference to something Ng said in our interview, explaining the tight relationship between systems design and data volume in deep learning environments. Corrado explained that Google needs a huge system because it’s working with huge volumes of data and needs to be able to move quickly as its research evolves. But if you know what you want to do or don’t have major time constraints, he said, smaller systems could work just fine.

For getting started, he added later, a desktop computer could actually work provided it has a sufficiently capable GPU.

3. But you probably need a lot of data

However, Corrado cautioned, it’s no joke that training deep learning models really does take a lot of data. Ideally as much as you can get your hands on. If he’s advising executives on when they should consider deep learning, it pretty much comes down to (a) whether they’re trying to solve a machine perception problem and/or (b) whether they have “a mountain of data.”

If they don’t have a mountain of data, he might suggest they get one. At least 100 trainable observations per feature you want to train is a good start, he said, adding that it’s conceivable to waste months of effort trying to optimize a model that would have been solved a lot quicker if you had just spent some time gathering training data early on.
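
As a rough illustration of that rule of thumb, the arithmetic can be sketched in a few lines of Python. The function names and the example feature count are hypothetical; only the figure of 100 observations per feature comes from Corrado's talk, and he offered it as a starting point rather than a guarantee.

```python
def min_observations(num_features: int, per_feature: int = 100) -> int:
    """Return a rough minimum number of training observations.

    per_feature=100 is the starting point cited in the talk; treat it
    as a rule of thumb, not a hard threshold.
    """
    if num_features < 0:
        raise ValueError("num_features must be non-negative")
    return num_features * per_feature


def has_enough_data(num_observations: int, num_features: int) -> bool:
    """Check a dataset size against the rule of thumb."""
    return num_observations >= min_observations(num_features)


if __name__ == "__main__":
    # Hypothetical example: a model with 250 input features.
    print(min_observations(250))         # 25000
    print(has_enough_data(10_000, 250))  # False
```

A check like this is cheap to run before any modeling work starts, which is the point of the advice: find out early whether gathering more data should come first.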

Corrado said he views his job not as building intelligent computers (artificial intelligence) or building computers that can learn (machine learning), but as building computers that can learn to be intelligent. And, he said, “You have to have a lot of data in order for that to work.”

Source: Google

Training a system that can do this takes a lot of data.

4. It’s not really based on the brain

Corrado received his Ph.D. in neuroscience and worked on IBM’s SyNAPSE neurosynaptic chip before coming to Google, and says he feels confident in saying that deep learning is only loosely based on how the brain works. And that’s based on what little we know about the brain to begin with.

Earlier in the day, Ng said about the same thing. To drive the point home, he noted that while many researchers believe we learn in an unsupervised manner, most production deep learning models today are still trained in a supervised manner. That is, they analyze lots of labeled images, speech samples or whatever in order to learn what each one is.

And comparisons to the brain, while easier than nuanced explanations, tend to lead to overinflated connotations about what deep learning is or might be capable of. “This analogy,” Corrado said, “is now officially overhyped.”

Update: This post was updated on Feb. 2 to correct a statement about Corrado’s tenure at Google. He was with the company before Andrew Ng and the Google Brain project, and was not recruited by Ng to work on it, as originally reported.

Facebook acquires speech-recognition IoT startup Wit.AI

Facebook has acquired Wit.AI, a San Francisco-based startup building a speech-recognition platform for the internet of things. The company launched early in 2014 and raised a $3 million seed round in October from a group of investors including Andreessen Horowitz, Ignition Partners and NEA.

Wit.AI has about 6,000 developers on its platform, which allows users to program speech-recognition controls into their devices and deliver the capabilities via API. When I spoke with co-founder Alex Lebrun in May, he explained that his ultimate goal is to power artificially intelligent personalities like those in the movie Her, but the company’s present focus is on helping power devices that can respond to simple voice commands. At the time, he said, Wit.AI was working with SmartThings on its line of connected devices, and was in talks with Nest before the Google acquisition.

Here’s how Wit.AI characterized its decision to join Facebook in a blog post announcing the deal:

[blockquote person=”” attribution=””]Facebook has the resources and talent to help us take the next step. Facebook’s mission is to connect everyone and build amazing experiences for the over 1.3 billion people on the platform – technology that understands natural language is a big part of that, and we think we can help.

The platform will remain open and become entirely free for everyone.[/blockquote]

For Facebook, acquiring Wit.AI gives it another opportunity to expand its platform into the world of connected devices and even smart homes without relying on speech-recognition technology developed by often-competitive companies. Much like Amazon has its Echo device, and Google has both the Android ecosystem and the Nest division, Facebook, too, likely wants a way to let users touch it when neither a keyboard nor a conventional computing device is around.

And, as a reader pointed out to me on Twitter, that probably happens a lot more in the developing world where Facebook expects to grow a lot in the years to come.

AI startup Expect Labs raises $13M as voice search API takes off

There’s more to speech recognition apps than Siri, Cortana or Google voice search, and a San Francisco startup called Expect Labs aims to prove it. On Thursday, the company announced it has raised a $13 million Series A round of venture capital led by IDG Ventures and USAA, with participation from strategic investors including Samsung, Intel and Telefonica. The company has now raised $15.5 million since launching in late 2012.

Expect Labs started out by building an application called MindMeld that lets users carry on voice conversations and automatically surfaces related content from around the web as they speak. However, that was just a proving ground for what is now the company’s primary business — its MindMeld API. The company released the API in February 2014, and has since rolled out specific modules for media and ecommerce recommendations.

Here’s how the API works, as I described at its launch:

[blockquote person=”” attribution=””]The key to the MindMeld API is its ability (well, the ability of the system behind it) to account for context. The API will index and make a knowledge graph from a website, database or content collection, but then it also collects contextual clues from an application’s users about where they are, what they’re doing or what they’re typing, for example. It’s that context that lets the API decide which search results to display or content to recommend, and when.[/blockquote]

Tim Tuttle (left) at Structure Data 2014.

API users don’t actually have to incorporate speech recognition into their apps, and initially many didn’t, but that’s starting to change, said Expect Labs co-founder and CEO Tim Tuttle. There are about a thousand developers building on the API right now, and the vast improvements in speech recognition over the past several months alone have helped pique their interest in voice.

Around the second quarter of next year, he said, “You’re going to see some very cool, very accurate voice apps start to appear.”

He doesn’t think every application is ideal for a voice interface, but he does think it’s ideal for those situations where people need to sort through a large number of choices. “If you get voice right … it can actually be much, much faster to help users find what they need,” he explained, because it’s easier and faster to refine searches when you don’t have to think about what to type and actually type it.

A demo of MindMeld voice search, in which I learned Loren Avedon plays a kickboxer in more than one movie.

Of course, that type of experience requires more than just speech recognition; it also requires the natural language processing and indexing capabilities that are Expect Labs’ bread and butter. Tuttle cited some big breakthroughs in those areas over the past couple of years, as well, and said one of his company’s big challenges is keeping up with those advances as they scale from words up to paragraphs of text. It needs to understand the state of the art, and also be able to home in on the sweet spot for voice interfaces that probably lies somewhere between them.

“People are still trying to figure out what the logical unit of the human brain is and replicate that,” he said.

Check out Tuttle’s session at Structure Data 2014 below. Structure Data 2015 takes place March 18-19 in New York, and covers all things data, from Hadoop to quantum computing, and from BuzzFeed to crime prediction.

[youtube=http://www.youtube.com/watch?v=5qcAOkNOX5c&w=640&h=390]

Baidu claims deep learning breakthrough with Deep Speech

Chinese search engine giant Baidu says it has developed a speech recognition system, called Deep Speech, the likes of which has never been seen, especially in noisy environments. In restaurant settings and other loud places where other commercial speech recognition systems fail, the deep learning model proved accurate nearly 81 percent of the time.

That might not sound too great, but consider the alternative: commercial speech-recognition APIs against which Deep Speech was tested, including those for Microsoft Bing, Google and Wit.AI, topped out at nearly 65 percent accuracy in noisy environments. Those results probably underestimate the difference in accuracy, said Baidu Chief Scientist Andrew Ng, who worked on Deep Speech along with colleagues at the company’s artificial intelligence lab in Palo Alto, California, because his team could only compare accuracy where the other systems all returned results rather than empty strings.

Ng said that while the research is still just research for now, Baidu is definitely considering integrating it into its speech-recognition software for smartphones and connected devices such as Baidu Eye. The company is also working on an Amazon Echo-like home appliance called CoolBox, and even a smart bike.

“Some of the applications we already know about would be much more awesome if speech worked in noisy environments,” Ng said.

Deep Speech also outperformed, by about 9 percent, top academic speech-recognition models on a popular dataset called Hub5’00. The system is based on a type of recurrent neural network, an architecture often used for speech recognition and text analysis. Ng credits much of the success to Baidu’s massive GPU-based deep learning infrastructure, as well as to the novel way the team built up a training set of 100,000 hours of speech data with which to train the system for noisy situations.

Baidu gathered about 7,000 hours of data on people speaking conversationally, and then synthesized a total of roughly 100,000 hours by fusing those files with files containing background noise. That was noise from a restaurant, a television, a cafeteria, and the inside of a car and a train. By contrast, the Hub5’00 dataset includes a total of 2,300 hours.
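
To make the idea of fusing clean speech with background noise concrete, here is a minimal Python sketch of that kind of data augmentation. It is not Baidu's pipeline: the function names, the use of a signal-to-noise ratio to set the noise level, and the random choice of noise clip are all assumptions made for illustration, and audio is represented simply as lists of float samples.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_with_noise(speech, noise, snr_db):
    """Overlay noise on speech at a chosen signal-to-noise ratio (dB).

    The noise clip is looped or trimmed to the length of the speech,
    then scaled so that speech RMS / noise RMS matches snr_db.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(looped)
    if noise_rms == 0:
        return list(speech)  # silent noise clip: nothing to add
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, looped)]


def augment(speech_clips, noise_clips, snr_choices_db, copies_per_clip, seed=0):
    """Create copies_per_clip noisy variants of every clean clip."""
    rng = random.Random(seed)
    noisy = []
    for clip in speech_clips:
        for _ in range(copies_per_clip):
            noise = rng.choice(noise_clips)
            snr_db = rng.choice(snr_choices_db)
            noisy.append(mix_with_noise(clip, noise, snr_db))
    return noisy
```

Going from about 7,000 recorded hours to roughly 100,000 synthesized hours works out to something like 14 noisy variants per clean clip, which is the role `copies_per_clip` plays in this sketch.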

“This is a vast amount of data,” said Ng. “… Most systems wouldn’t know what to do with that much speech data.”

Another big improvement, he said, came from using an end-to-end deep learning model on that huge dataset rather than using a standard, and computationally expensive, type of acoustic model. Traditional approaches will break recognition down into multiple steps, including one called speaker adaptation, Ng explained, but “we just feed our algorithm a lot of data” and rely on it to learn everything it needs to. Accuracy aside, the Baidu approach also resulted in a dramatically reduced code base, he added.

You can hear Ng talk more about Baidu’s work in deep learning in this Gigaom Future of AI talk embedded below. That event also included a talk from Google speech recognition engineer Johan Schalkwyk. Deep learning will also play a prominent role at our upcoming Structure Data conference, where speakers from Facebook, Yahoo and elsewhere will discuss how they do it and how it impacts their businesses.

[youtube=http://www.youtube.com/watch?v=MPiUQrJ9tGc&w=640&h=360]

Speech-recognition platform Wit.AI raises a $3M seed round

Wit.AI, a startup building an API platform for speech recognition, has raised a $3 million seed round, led by Andreessen Horowitz. Ignition Partners, NEA, A-Grade, SVAngel, Eric Hahn, Alven Capital, and TenOneTen also contributed. We covered Wit.AI in May, detailing its plans to build a machine-learning-powered API service that developers can use to bring voice commands to their applications or connected devices. We will, of course, be talking all about the devices that could benefit from such a platform at our Structure Connect conference next week in San Francisco.