Voices in AI – Episode 72: A Conversation with Irving Wladawsky-Berger


About this Episode

Episode 72 of Voices in AI features host Byron Reese and Irving Wladawsky-Berger discussing the complexity of the human brain, the possibility of AGI and its origins, the implications of AI in weapons, and where else AI has taken and could take us. Irving has a PhD in Physics from the University of Chicago. He is a research affiliate with the MIT Sloan School of Management, a guest columnist for the Wall Street Journal and CIO Journal, an adjunct professor at Imperial College London, and a fellow of the Center for Global Enterprise.
Visit www.VoicesinAI.com to listen to this one-hour podcast or read the full transcript.

Transcript Excerpt

Byron Reese: This is Voices in AI, brought to you by GigaOm, and I’m Byron Reese. Today our guest is Irving Wladawsky-Berger. He is a bunch of things. He is a research affiliate with the MIT Sloan School of Management. He is a guest columnist for the Wall Street Journal and CIO Journal. He is an adjunct professor of the Imperial College of London. He is a fellow for the Center for Global Enterprise, and I think a whole lot more things. Welcome to the show, Irving.
Irving Wladawsky-Berger: Byron it’s a pleasure to be here with you.
So, that’s a lot of things you do. What do you spend most of your time doing?
Well, I spend most of my time these days either in MIT-oriented activities or writing my weekly columns, [which] take quite a bit of time. So, those two are a combination, and then, of course, doing activities like this – talking to you about AI and related topics.
So, you have an M.S. and a Ph.D. in Physics from the University of Chicago. Tell me… how does artificial intelligence play into the stuff you do on a regular basis?
Well, first of all, I got my Ph.D. in Physics in Chicago in 1970. I then joined IBM research in Computer Science. I switched fields from Physics to Computer Science because as I was getting my degree in the ‘60s, I spent most of my time computing.
And then you spent 37 years at IBM, right?
Yeah, then I spent 37 years at IBM working full time, and another three and a half years as a consultant. So, I joined IBM research in 1970, and then about four years later my first management job was to organize an AI group. Now, Byron, AI in 1974 was very very very different from AI in 2018. I’m sure you’re familiar with the whole history of AI. If not, I can just briefly tell you about the evolution. I’ve seen it, having been involved with it in one way or another for all these years.
So, back then did you ever have occasion to meet [John] McCarthy or any of the people at the Dartmouth [Summer Research Project]?
Yeah, yeah.
So, tell me about that. Tell me about the early early days in AI, before we jump into today.
I knew people at the MIT AI lab… Marvin Minsky, McCarthy, and there were a number of other people. You know, what’s interesting is at the time the approach to AI was to try to program intelligence, writing it in Lisp, which John McCarthy invented as a special programming language; writing in rules-based languages; writing in Prolog. At the time – remember this was years ago – they all thought that you could get AI done that way and it was just a matter of time before computers got fast enough for this to work. Clearly that approach toward artificial intelligence didn’t work at all. You couldn’t program something like intelligence when we didn’t understand at all how it worked…
Well, to pause right there for just a second… The reason they believed that – and it was a reasonable assumption – the reason they believed it is because they looked at things like Isaac Newton coming up with three laws that covered planetary motion, and Maxwell and different physical systems that only were governed by two or three simple laws and they hoped intelligence was. Do you think there’s any aspect of intelligence that’s really simple and we just haven’t stumbled across it, that you just iterate something over and over again? Any aspect of intelligence that’s like that?
I don’t think so, and in fact my analogy… and I’m glad you brought up Isaac Newton. This goes back to physics, which is what I got my degrees in. This is like comparing classical mechanics, which is deterministic. You know, you can tell precisely, based on classical mechanics, the motion of planets. If you throw a baseball, where is it going to go, etc. And as we know, classical mechanics does not work at the atomic and subatomic level.
We have something called quantum mechanics, and in quantum mechanics, nothing is deterministic. You can only tell what things are going to do based on something called a wave function, which gives you probability. I really believe that AI is like that, that it is so complicated, so emergent, so chaotic; etc., that the way to deal with AI is in a more probabilistic way. That has worked extremely well, and the previous approach where we try to write things down in a sort of deterministic way like classical mechanics, that just didn’t work.
Byron, imagine if I asked you to write down specifically how you learned to ride a bicycle. I bet you won’t be able to do it. I mean, you can write a poem about it. But if I say, “No, no, I want a computer program that tells me precisely…” If I say, “Byron I know you know how to recognize a cat. Tell me how you do it.” I don’t think you’ll be able to tell me, and that’s why that approach didn’t work.
And then, lo and behold, in the ‘90s we discovered that there was a whole different approach to AI based on getting lots and lots of data in very fast computers, analyzing the data, and then something like intelligence starts coming out of all that. I don’t know if it’s intelligence, but it doesn’t matter.
I really think that to a lot of people the real point where that hit home is when, in the late ‘90s, IBM’s Deep Blue supercomputer beat Garry Kasparov in a very famous [chess] match. I don’t know, Byron, if you remember that.
Listen to this one-hour episode or read the full transcript at www.VoicesinAI.com
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.

Voices in AI – Episode 70: A Conversation with Jakob Uszkoreit


About this Episode

Episode 70 of Voices in AI features host Byron Reese and Jakob Uszkoreit discussing machine learning, deep learning, AGI, and what this could mean for the future of humanity. Jakob has a master’s degree in Computer Science and Mathematics from Technische Universität Berlin. He has worked at Google for the past 10 years, currently in deep learning research with Google Brain.
Visit www.VoicesinAI.com to listen to this one-hour podcast or read the full transcript.

Transcript Excerpt

Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today our guest is Jakob Uszkoreit, he is a researcher at Google Brain, and that’s kind of all you have to say at this point. Welcome to the show, Jakob.
Let’s start with my standard question which is: What is artificial intelligence, and what is intelligence, if you want to start there, and why is it artificial?
Jakob Uszkoreit: Hi, thanks for having me. Let’s start with artificial intelligence specifically. I don’t think I’m necessarily the best person to answer the question what intelligence is in general, but I think for artificial intelligence, there’s possibly two different kind of ideas that we might be referring to with that phrase.
One is kind of the scientific or the group of directions of scientific research, including things like machine learning, but also other related disciplines that people commonly refer to with the term ‘artificial intelligence.’ But I think there’s this other maybe more important use of the phrase that has become much more common in this age of the rise of AI if you want to call it that, and that is what society interprets that term to mean. I think largely what society might think when they hear the term artificial intelligence, is actually automation, in a very general way, and maybe more specifically, automation where the process of automating [something] requires the machine or the machines doing so to make decisions that are highly dynamic in response to their environment and in our ideas or in our conceptualization of those processes, require something like human intelligence.
So, I really think it’s actually something that doesn’t necessarily, in the eyes of the public, have that much to do with intelligence, per se. It’s more the idea of automating things that at least so far, only humans could do, and the hypothesized reason for that is that only humans possess this ephemeral thing of intelligence.
Do you think it’s a problem that a cat food dish that refills itself when it’s empty, you could say has a rudimentary AI, and you can say Westworld is populated with AIs, and those things are so vastly different, and they’re not even really on a continuum, are they? A general intelligence isn’t just a better narrow intelligence, or is it?
So I think that’s a very interesting question. Whether basically improving and slowly generalizing or expanding the capabilities of narrow intelligences, will eventually get us there, and if I had to venture a guess, I would say that’s quite likely actually. That said, I’m definitely not the right person to answer that. I do think that guesses, that aspects of things are today still in the realms of philosophy and extremely hypothetical.
But the one trick that we have gotten good at recently that’s given us things like AlphaZero, is machine learning, right? And it is itself a very narrow thing. It basically has one core assumption, which is the future is like the past. And for many things it is: what a dog looks like in the future, is what a dog looked like yesterday. But, one has to ask the question, “How much of life is actually like that?” Do you have an opinion on that?
Yeah, so I think that machine learning is actually evolving rapidly from the initial classic idea of basically trying to predict the future just from the past, and not even the past itself but a kind of encapsulated version of the past. So it’s basically a snapshot captured in this fixed static data set. You expose machines to that, you allow them to learn from that, train on that, whatever you want to call it, and then you evaluate how the resulting model or machine or network does in the wild or on some evaluation tasks and tests that you’ve prepared for it.
It’s evolving from that classic definition towards something that is quite a bit more dynamic, that is starting to incorporate learning in situ, learning kind of “on the job,” learning from very different kinds of supervision, where some of it might be encapsulated by data sets, but some might be given to the machine through somewhat more high level interactions, maybe even through language. There is at least a bunch of lines of research attempting that. Also quite importantly, we’re starting slowly but surely to employ machine learning in ways where the machine’s actions actually have an impact on the world, from which the machine then keeps learning. I think that that’s actually something [for which] all of these parts are necessary ingredients, if we ever want to have narrow intelligences, that maybe have a chance of getting more general. Maybe then in the more distant future, might even be bolted together into somewhat more general artificial intelligence.

Voices in AI – Episode 69: A Conversation with Raj Minhas


About this Episode

Episode 69 of Voices in AI features host Byron Reese and Dr. Raj Minhas talking about AI, AGI, and machine learning. They also delve into explainability and other quandaries AI is presenting. Raj Minhas has a PhD and MS in Electrical and Computer Engineering from the University of Toronto, and a BE from Delhi University. Raj is also the Vice President and Director of Interactive and Analytics Laboratory at PARC.
Visit www.VoicesinAI.com to listen to this one-hour podcast or read the full transcript.

Transcript Excerpt

Byron Reese: This is Voices in AI, brought to you by GigaOm, I’m Byron Reese. Today I’m excited that our guest is Raj Minhas, who is Vice President and the Director of Interactive and Analytics Laboratory at PARC, which we used to call Xerox PARC. Raj earned his PhD and MS in Electrical and Computer Engineering from the University of Toronto, and his BE from Delhi University. He has eight patents and six patent-pending applications. Welcome to the show, Raj!
Raj Minhas: Thank you for having me.
I like to start off, just asking a really simple question, or what seems like a very simple question: what is artificial intelligence?
Okay, I’ll try to give you two answers. One is a flip response, which is if you tell me what is intelligence, I’ll tell you what is artificial intelligence, but that’s not very useful, so I’ll try to give you my functional definition. I think of artificial intelligence as the ability to automate cognitive tasks that we humans do, so that includes the ability to process information, make decisions based on that, learn from that information, at a high level. That functional definition is useful enough for me.
Well I’ll engage on each of those, if you’ll just permit me. I think even given a definition of intelligence which everyone agreed on, which doesn’t exist, artificial is still ambiguous. Do you think of it as artificial in the sense that artificial turf really isn’t grass, so it’s not really intelligence, it just looks like intelligence? Or, is it simply artificial because we made it, but it really is intelligent?
It’s the latter. So if we can agree on what intelligence is, then artificial intelligence to me would be the classical definition of artificial intelligence, which is re-creating that outside the human body. So re-creating that by ourselves, it may not be re-created in the way it is created in our minds, in the way humans or other animals do it, but it’s re-created in that it achieves the same purpose, it’s able to reason in the same way, it’s able to perceive the world, it’s able to do problem solving in that way. So without getting necessarily bogged down by what is the mechanism by which we have intelligence, and does that mechanism need to be the same, artificial intelligence to me would be re-creating that – the ability of that.
Fair enough, so I’ll just ask you one more question along these lines. So, using your ability to automate cognitive tasks, let me give you four or five things, and you tell me if they’re AI. AlphaGo?
And then a step down from that, a calculator?
Sure, a primitive form of AI.
A step down from that: an abacus?
Abacus, sure, but it involves humans in the operation of it, but maybe it’s on that boundary where it’s partially automated, but yes.
What about an assembly line?
Sure, so I think…
And then I would say my last one which is a cat food dish that refills itself when it’s empty? And if you say yes to that…
All of those things to me are intelligent, but some of those are very rudimentary, and not, so, for example, you look at animals. On one end of the scale are humans, they can do a variety of tasks that other animals cannot, and on the other end of the spectrum, you may have very simple organisms, single-celled or mammals, they may do things that I would find intelligent, they may be simply responding to stimuli, and that intelligence may be very much encoded. They may not have the ability to learn, so they may not have all aspects of intelligence, but I think this is where it gets really hard to say what is intelligence. Which is my flip response.
If you say: what is intelligence? I can say I’m trying to automate that by artificial intelligence, so, if you were to include in your definition of intelligence, which I do, that ability to do math implies intelligence, then by automating that with an abacus is a way of artificially doing that, right? You have been doing it in your head using whatever mechanism is in there, you’re trying to do that artificially. So it is a very hard question that seems so simple, but, at some point, in order to be logically consistent, you have to say yes, if that’s what I mean, that’s what I mean, even though the examples can get very trivial.
Well I guess then, and this really is the last question along those lines: what, if everything falls under your definition, then what’s different now? What’s changed? I mean a word that means everything means nothing, right?
That is part of the problem, but I think what is becoming more and more different is, the kinds of things you’re able to do, right? So we are able to reason now artificially in ways that we were not able to before. Even if you take the narrower definition that people tend to use which is around machine learning, they’re able to use that to perceive the world in ways in which we were not able to before, and so, what is changing is that ability to do more and more of those things, without relying on a person necessarily at the point of doing them. We still rely on people to build those systems to teach them how to do those things, but we are able to automate a lot of that.
Obviously artificial intelligence to me is more than machine learning where you show something a lot of data and it learns just for a function, because it includes the ability to reason about things, to be able to say, “I want to create a system that does X, and how do I do it?” So can you reason about models, and come to some way of putting them together and composing them to achieve that task?

5 Common Misconceptions about AI

In recent years I have run into a number of misconceptions regarding AI, and sometimes when discussing AI with people from outside the field, I feel like we are talking about two different topics. This article is an attempt at clarifying what AI practitioners mean by AI, and where it is in its current state.
The first misconception has to do with Artificial General Intelligence, or AGI:

  1. Applied AI systems are just limited versions of AGI

Despite what many think, the state of the art in AI is still far behind human intelligence. Artificial General Intelligence, i.e. AGI, has been the motivating fuel for all AI scientists from Turing to today. Somewhat analogous to Alchemy, the eternal quest for AGI that replicates and exceeds human intelligence has resulted in the creation of many techniques and scientific breakthroughs. AGI has helped us understand facets of human and natural intelligence, and as a result, we’ve built effective algorithms inspired by our understanding and models of them.
However, when it comes to practical applications of AI, AI practitioners do not necessarily restrict themselves to pure models of human decision making, learning, and problem solving. Rather, in the interest of solving the problem and achieving acceptable performance, AI practitioners often do what it takes to build practical systems. At the heart of the algorithmic breakthroughs that resulted in Deep Learning systems, for instance, is a technique called back-propagation. This technique, however, is not how the brain builds models of the world. This brings us to the next misconception:

  2. There is a one-size-fits-all AI solution.

A common misconception is that AI can be used to solve every problem out there–i.e. the state of the art AI has reached a level such that minor configurations of ‘the AI’ allow us to tackle different problems. I’ve even heard people assume that moving from one problem to the next makes the AI system smarter, as if the same AI system is now solving both problems at the same time. The reality is much different: AI systems need to be engineered, sometimes heavily, and require specifically trained models in order to be applied to a problem. And while similar tasks, especially those involving sensing the world (e.g., speech recognition, image or video processing) now have a library of available reference models, these models need to be specifically engineered to meet deployment requirements and may not be useful out of the box. Furthermore, AI systems are seldom the only component of AI-based solutions. It often takes many tailor-made, classically programmed components coming together to augment the one or more AI techniques used within a system. And yes, there are a multitude of different AI techniques out there, used alone or in hybrid solutions in conjunction with others. Therefore it is incorrect to say:

  3. AI is the same as Deep Learning

Back in the day, we thought the term artificial neural networks (ANNs) was really cool. Until, that is, the initial euphoria around its potential backfired due to its lack of scaling and aptitude towards over-fitting. Now that those problems have, for the most part, been resolved, we’ve avoided the stigma of the old name by “rebranding” artificial neural networks as “Deep Learning”. Deep Learning or Deep Networks are ANNs at scale, and the ‘deep’ refers not to deep thinking, but to the number of hidden layers we can now afford within our ANNs (previously it was a handful at most, and now they can be in the hundreds). Deep Learning is used to generate models off of labeled data sets. The ‘learning’ in Deep Learning methods refers to the generation of the models, not to the models being able to learn in real time as new data becomes available. The ‘learning’ phase of Deep Learning models actually happens offline, needs many iterations, is time and process intensive, and is difficult to parallelize.
Recently, Deep Learning models have been used in online learning applications. The online learning in such systems is achieved using different AI techniques such as Reinforcement Learning, or online Neuro-evolution. A limitation of such systems is the fact that the contribution from the Deep Learning model can only be achieved if the domain of use can be mostly experienced during the off-line learning period. Once the model is generated, it remains static and not entirely robust to changes in the application domain. A good example of this is in ecommerce applications–seasonal changes or short sales periods on ecommerce websites would require a deep learning model to be taken offline and retrained on sale items or new stock. However, platforms like Sentient Ascend that use evolutionary algorithms to power website optimization no longer need large amounts of historical data to be effective; rather, they use neuro-evolution to shift and adjust the website in real time based on the site’s current environment.
For the most part, though, Deep Learning systems are fueled by large data sets, and so the prospect of new and useful models being generated from large and unique datasets has fueled the misconception that…

  4. It’s all about BIG data

It’s not. It’s actually about good data. Large, imbalanced datasets can be deceptive, especially if they only partially capture the data most relevant to the domain. Furthermore, in many domains, historical data can become irrelevant quickly. In high-frequency trading on the New York Stock Exchange, for instance, recent data is of much more relevance and value than, for example, data from before 2001, when the exchange had not yet adopted decimalization.
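To make the point about imbalanced data concrete, here is a minimal sketch in Python. The dataset (10 positive cases among 1,000) and the helper names `accuracy` and `recall` are illustrative assumptions, not taken from any real system. It shows how a "model" that simply predicts the majority class can look excellent on accuracy while finding none of the cases that matter.

```python
from collections import Counter

def accuracy(predictions, labels):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def recall(predictions, labels, positive=1):
    """Fraction of truly positive examples that were predicted positive."""
    predicted_for_positives = [p for p, y in zip(predictions, labels) if y == positive]
    if not predicted_for_positives:
        return 0.0
    hits = sum(p == positive for p in predicted_for_positives)
    return hits / len(predicted_for_positives)

# Hypothetical, heavily imbalanced dataset: 10 rare events, 990 ordinary ones.
labels = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the most common label.
majority_class = Counter(labels).most_common(1)[0][0]
predictions = [majority_class] * len(labels)

print("accuracy:", accuracy(predictions, labels))  # high, because most labels are 0
print("recall:", recall(predictions, labels))      # zero, no rare event is ever found
```

More data of the same kind would not help here; what is missing is better coverage of the rare class, which is the sense in which good data matters more than big data.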
Finally, a general misconception I run into quite often:

  5. If a system solves a problem that we think requires intelligence, that means it is using AI

This one is a bit philosophical in nature, and it does depend on your definition of intelligence. Indeed, Turing’s definition would not refute this. However, as far as mainstream AI is concerned, a fully engineered system, say to enable self-driving cars, which does not use any AI techniques, is not considered an AI system. If the behavior of the system is not the result of the emergent behavior of AI techniques used under the hood, if programmers write the code from start to finish, in a deterministic and engineered fashion, then the system is not considered an AI-based system, even if it seems so.
AI paves the way for a better future
Despite the common misconceptions around AI, the one correct assumption is that AI is here to stay and is, indeed, the window to the future. AI still has a long way to go before it can be used to solve every problem out there and be industrialized for wide-scale use. Deep Learning models, for instance, take many expert PhD-hours to design effectively, often requiring elaborately engineered parameter settings and architectural choices depending on the use case. Currently, AI scientists are hard at work on simplifying this task and are even using other AI techniques such as reinforcement learning and population-based or evolutionary architecture search to reduce this effort. The next big step for AI is to make it creative and adaptive while, at the same time, powerful enough to exceed human capacity to build models.
by Babak Hodjat, co-founder & CEO Sentient Technologies

Voices in AI – Episode 52: A Conversation with Rao Kambhampati


About this Episode

Sponsored by Dell and Intel, Episode 52 of Voices in AI features host Byron Reese and Rao Kambhampati discussing creativity, military AI, jobs and more. Subbarao Kambhampati is a professor at ASU with teaching and research interests in Artificial Intelligence. He serves as the president of AAAI, the Association for the Advancement of Artificial Intelligence.
Visit www.VoicesinAI.com to listen to this one-hour podcast or read the full transcript.

Transcript Excerpt

Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today my guest is Rao Kambhampati. He has spent the last quarter-century at Arizona State University, where he researches AI. In fact, he’s been involved in artificial intelligence research for thirty years. He’s also the President of the AAAI, the Association for the Advancement of Artificial Intelligence. He holds a Ph.D. in computer science from the University of Maryland, College Park. Welcome to the show, Rao.
Rao Kambhampati: Thank you, thank you for having me.
I always like to start with the same basic question, which is, what is artificial intelligence? And so far, no two people have given me the same answer. So you’ve been in this for a long time, so what is artificial intelligence?
Well, I guess the textbook definition is, artificial intelligence is the quest to make machines show behavior, that when shown by humans would be considered a sign of intelligence. So intelligent behavior, of course, that right away begs the question, what is intelligence? And you know, one of the reasons we don’t agree on the definitions of AI is partly because we all have very different notions of what intelligence is. This much is for sure; intelligence is quite multi-faceted. You know we have the perceptual intelligence—the ability to see the world, you know the ability to manipulate the world physically—and then we have social, emotional intelligence, and of course you have cognitive intelligence. And pretty much any of these aspects of intelligent behavior, when a computer can show those, we would consider that it is showing artificial intelligence. So that’s basically the practical definition I use.
But to say, “while there are different kinds of intelligences, therefore, you can’t define it,” is akin to saying there are different kinds of cars, therefore, we can’t define what a car is. I mean that’s very unsatisfying. I mean, isn’t there, this word ‘intelligent’ has to mean something?
I guess there are very formal definitions. For example, you can essentially consider an artificial agent, working in some sort of environment, and the real question is, how does it improve its long-term reward that it gets from the environment, while it’s behaving in that environment? And whatever it does to increase its long-term reward is seen, essentially as—I mean the more reward it’s able to get in the environment, the more important it is. I think that is the sort of definition that we use in introductory AI sorts of courses, and we talk about these notions of rational agency, and how rational agents try to optimize their long-term reward. But that sort of gets into more technical definitions. So when I talk to people, especially outside of computer science, I appeal to their intuitions of what intelligence is, and to the extent we have disagreements there, that sort of seeps into the definitions of AI.
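The rational-agent idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the discount factor `gamma`, the function names, and the two toy reward sequences are all hypothetical and do not come from the conversation. The sketch scores each action by its discounted long-term reward and picks the best one.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward t steps ahead is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

def best_action(action_rewards, gamma=0.9):
    """Pick the action whose reward sequence has the highest discounted return."""
    return max(action_rewards, key=lambda a: discounted_return(action_rewards[a], gamma))

# Hypothetical environment: one action pays off immediately, the other pays more later.
action_rewards = {
    "grab_now": [5.0, 0.0, 0.0],
    "invest": [0.0, 0.0, 10.0],
}

print(best_action(action_rewards))             # a patient agent (gamma=0.9)
print(best_action(action_rewards, gamma=0.5))  # a short-sighted agent (gamma=0.5)
```

How much the agent values the future is itself a design choice here: with a high `gamma` the delayed payoff wins, and with a low `gamma` the immediate reward does.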

Voices in AI – Episode 47: A Conversation with Ira Cohen

In this episode, Byron and Ira discuss transfer learning and AI ethics.
Byron Reese: This is Voices in AI, brought to you by GigaOm, and I’m Byron Reese. Today our guest is Ira Cohen, he is the cofounder and chief data scientist at Anodot, which has created an AI-based anomaly detection system. Before that he was chief data scientist over at HP. He has a BS in electrical engineering and computer engineering, as well as an MS and a PhD in the same disciplines from The University of Illinois. Welcome to the show, Ira.
Ira Cohen: Thank you very much for having me.
So I’d love to start with the simple question, what is artificial intelligence?
Well there is the definition of artificial intelligence of machines being able to perform cognitive tasks, that we as humans can do very easily. What I like to think about in artificial intelligence, is machines taking on tasks for us that do require intelligence, but leave us time to do more thinking and more imagination, in the real world. So autonomous cars, I would love to have one, that requires artificial intelligence, and I hate driving, I hate the fact that I have to drive for 30 minutes to an hour every day, and waste a lot of time, my cognitive time, thinking about the road. So when I think about AI, I think how it improves my life to give me more time to think about even higher level things.
Well, let me ask the question a different way, what is intelligence?
That’s a very philosophical question, yes, so it has a lot of layers in it. So, when I think about intelligence for humans, it’s the ability to imagine something new, so imagine, have a problem and imagine a solution and think about what it will look like without actually having to build it yet, and then going in and implementing it. That’s what I think about [as] intelligence.
But a computer can’t do that, right?
That’s right, so when I think about artificial intelligence, personally at least, I don’t think that, at least in our lifetime, computers will be able to solve those kind of problems, but, there is a lower level of intelligence of understanding the context of where you are, and being able to take actions on it, and that’s where I think that machines can do a good task. So understanding a context of the environment and taking immediate actions based on that, that are not new, but are already… people know how to do them, and therefore we can code them into machines to do them.
I’m only going to ask you one more question along these lines and then we’ll move on, but you keep using the word “understand.” Can a computer understand anything?
So, yeah, the word understanding is another hard word to say. I think it can understand, well, at least it can recognize concepts. Understanding maybe requires a higher level of thinking, but understanding context and being able to take an action on it, is what I think understanding is. So if I see a kid going into the road while I’m driving, I understand that this is a kid, I understand that I need to hit the brake, and I think machines can do these types of understanding tasks.
Fair enough, so, if someone said what is the state of the art like, they said, where are we at with this, because it’s in the news all the time and people read about it all the time, so where are we at?
So, I think we’re at the point where machines can now recognize a lot of images and audio or various types of data, recognize with sensors, recognize that there are objects, recognize that there are words being spoken, and identify them. That’s really where we’re at today, we’re not… we’re getting to the point where they’re starting to also act on these recognition tasks, but most of the research, most of what AI is today, is the recognition tasks. That’s the first step.
And so let’s just talk about one of those. Give me something, some kind of recognition that you’ve worked on and have deep knowledge of, teaching a computer how to do…
All right, so, when I did my PhD, I worked on affective computing, so, part of the PhD was to have machines recognize emotions from facial expressions. So, it’s not really recognizing emotion, it’s recognizing a facial expression and what it may express. So there are 6 universal facial expressions that we as humans exhibit, so, smiling is associated with happiness, there is surprise, anger, disgust, and those are actually universal. So, the task that I worked on was to build classifiers, that given an image or a sequence of a video of a person, a person’s face, would recognize whether they’re happy or sad or disgusted or surprised or afraid…
So how do you do that? Like do you start with biology and you say “well how do people do it?” Or do you start by saying “it doesn’t really matter how people are doing it, I’m just going to brute force, show enough labeled data, that it can figure it out, that it just learns without ever having a deep understanding of it?”
All right so this was in the early 2000s, and we didn’t have deep learning yet, so we had neural networks, but we weren’t able to train them with huge amounts of data. There wasn’t a huge amount of data, so the brute force approach was not the way to go. What I actually worked on is based on research by a psychologist, that actually mapped facial movements to known expressions, and therefore to known emotions. So it started out in the 70s, by people in the psychology field, [such as] Paul Ekman, in San Francisco, who mapped out actual… he created a map of facial movements into facial expressions, and so that was the basis of what are the type of features I need to extract from video and then feed that to a classifier, and then you go through the regular process of machine learning of collecting a lot of data, but the data is transformed, so these videos were transformed into known features of facial movements, and then, you can feed that into a classifier that learns in a supervised way. So I think a lot of the tasks around intelligence are that way. It’s being changed a little bit by deep learning, which supposedly takes away the need to know the features a priori, and do the feature engineering for the machine learning task…
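[Illustration, not from the interview: the pipeline described above, hand-engineered facial-movement features fed to a supervised classifier, might be sketched roughly as follows. The feature names, the numbers, and the nearest-centroid classifier are all assumptions chosen to keep the example small; the interview does not say which features or which classifier were actually used.]

```python
from math import dist

# Hypothetical training set. Each sample is (features, label), where the
# features are made-up facial-movement measurements scaled to 0..1:
# (mouth_corner_raise, brow_raise, eye_opening).
TRAINING_DATA = [
    ((0.9, 0.2, 0.4), "happy"),
    ((0.8, 0.3, 0.5), "happy"),
    ((0.1, 0.9, 0.9), "surprised"),
    ((0.2, 0.8, 1.0), "surprised"),
    ((0.1, 0.1, 0.3), "sad"),
    ((0.0, 0.2, 0.2), "sad"),
]


def train_centroids(samples):
    """Supervised step: average the feature vectors of each labeled class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def classify(centroids, features):
    """Predict the label whose class centroid is nearest in feature space."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train_centroids(TRAINING_DATA)
print(classify(centroids, (0.85, 0.25, 0.45)))
```

[The point of the sketch is the division of labor: a person decides which facial movements to measure, and the learning step only has to separate the classes in that small feature space.]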
Why do you say “supposedly”?
Because it’s not completely true. You still have to do, even in speech, even in images, you still have to do some transformations of the raw data, it’s not just take it as is, and it will work magically and do everything for you. There is some… you do have to, for example in speech, you do have to do various transformations of the speech into all sorts of short term Fourier transform or other types of transformations, without which, the methods afterwards will not produce results.
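[Illustration, not from the interview: a short-term Fourier transform of the kind mentioned above can be sketched in plain Python. The signal is cut into overlapping frames, each frame is windowed, and a discrete Fourier transform is taken per frame. The function name, the Hann window, the frame sizes, and the synthetic test tone are all assumptions for the example; real speech systems use optimized FFT libraries rather than this naive loop.]

```python
import cmath
import math


def stft_magnitudes(signal, frame_size, hop):
    """Return one magnitude spectrum per frame, using a naive DFT per frame."""
    # A Hann window tapers each frame to reduce edge effects from slicing.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        # Only bins 0..frame_size/2 are needed for a real-valued signal.
        for k in range(frame_size // 2 + 1):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for audio: an 8 Hz tone sampled at 64 Hz.
SAMPLE_RATE = 64
tone = [math.sin(2 * math.pi * 8 * t / SAMPLE_RATE) for t in range(128)]

spectrogram = stft_magnitudes(tone, frame_size=32, hop=16)
# Each bin spans SAMPLE_RATE / frame_size = 2 Hz, so the tone lands in bin 4.
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectrogram]
print(peak_bins)
```

[The resulting grid of frames by frequency bins, rather than the raw waveform, is the kind of transformed input that would then be handed to a learning method.]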
So, if I look at a photo of a cat, that somebody’s posted online or a dog, that’s in surprise, you know, it’s kind of comical, the look of surprise, say, but a human can recognize that in something as simple as a stick figure… What are we doing there do you think? Is that a kind of transferred learning, or how is it that you can show me an alien and I would say, “Ah, he’s happy…” What do you think we’re doing there…?
Yeah, we’re doing transferred learning. Those are really examples of us taking one concept that we were trained on from the day we were born, with our visual cortex and also then in the brain, because our brain is designed to identify emotions, just out of the need to survive, and then when we see something else, we try to map it onto a concept that we already know, and then if something happens that is different from what we expected, then we start training to that new concept. So if we see an alien smiling, and all of a sudden when he smiles, he shoots at you, you would quickly understand that smiling for an alien, is not associated with happiness, but you will start off by thinking, “this could be happy”.
Yeah, I think that I remember reading that, hours after birth, children who haven’t even been trained on it, can recognize the difference between a happy and sad face. I think they got sticks and put drawings on them and try to see the baby’s reactions. It may even be even something deeper than something we learn, something that’s encoded in our DNA.
Yeah, and that may be true because we need to survive.
So why do you think we’re so good at it and machines aren’t, right, like, machines are terrible right now at transfer learning. We don’t really know how it works do we, because we can’t really code that abstraction that a human gets, so…
I think that from what I see first, it’s being changed. I see work coming out of Google AI labs that is starting to show how they are able to train single models, very large models, that are able to do some transfer learning on some tasks, and, so it is starting to change. So machines have a very different… they don’t have to survive – they don’t have this notion of danger, and surviving, and I think until we are able to somehow encode that in them, we would always have to, ourselves, code the new concepts or understand how to code for them, how to learn new concepts using transfer learning…
You know the roboticist Rodney Brooks, talks about “the juice”, he talks about how, if you put an animal in a box, it feels trapped, it just tries and tries to get out and it clearly has a deep desire to get out, but you put in a robot to do it, the robot doesn’t have what he calls “the juice,” and he of course doesn’t think it’s anything spiritual or metaphysical or anything like that. But what do you think that is? What do you think is the juice? Because that’s what you just alluded to, machines don’t have to survive, so what do you think that is?
So I think he’s right, they don’t have the juice. Actually in my lab, during my PhD, we had some students working on teaching robots to move around, and actually, the way they did it was rewards and punishments. So they would get… they actually coded—just like you have in reinforcement learning—if you hit a wall, you get a negative reward. If the robot moved and did something he wasn’t supposed to, the PhD student would yell at them, and that would be encoded into a negative reward, and if he did something right, they had actions that gave them positive rewards. Now it was all kind of fun and games, but potentially if you do this for long enough, with enough feedback, the robot would learn what to do and what not to do, the main thing that’s different is that it still lives in the small world of where they were, in the lab or in the hallways of our labs. It didn’t have the intelligence to then take it and transfer it to somewhere else…
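[Illustration, not from the interview: the reward-and-punishment scheme described above can be sketched with tabular Q-learning on a toy corridor. Bumping the wall earns a negative reward, reaching the goal earns a positive one. The corridor layout, the reward values, the learning rate, and the purely random exploration are all assumptions for the example, not details of the lab’s robots.]

```python
import random

N_CELLS = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)   # step left or step right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    target = state + action
    if target < 0:               # bumped the left wall: punished, no movement
        return state, -1.0, False
    if target == N_CELLS - 1:    # reached the goal: rewarded, episode ends
        return target, 1.0, True
    return target, 0.0, False


def train(episodes=500, max_steps=50, seed=0):
    """Learn action values from rewards while exploring at random."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = 2
        for _ in range(max_steps):
            # Q-learning is off-policy, so random exploration still
            # teaches the values of the best (greedy) behavior.
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the higher value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_CELLS - 1)}
print(policy)
```

[What the table learns is valid only for this corridor, which matches the limitation noted above: nothing in it transfers to a different hallway without more training.]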
But the computer can never… I mean the inherent limit in that is that the computer can never be afraid, be ashamed, be motivated, be happy…
Yes. It doesn’t have the long term reward or the urge to survive, I guess.
You may be familiar with this, but I’d like to set it up anyway. There was a robot in Japan, it was released in a mall, and it was basically being taught how to get around and if it ran into a person, if it came up to a person, it would politely ask the person to move, and if the person didn’t, it would just zoom around them. And what happened was children would just kind of mess with it, maybe jump in front of it when it tried to go around them again and again and again, but the more kids there were, the more likely they were to get brutal. They would hit it with things, they would yell at it and all of that, and the programmers ended up having to program it, that if it had a bunch of short people around it, like children, it needed to find a tall person, an adult, and zip towards it, but the distressing thing about it is when they later asked those children who had done that, they said, “Did you cause the robot distress?” 75% of them said yes, and then they asked if it behaved human-like or machine-like, and only 15% said machine-like, and so they thought that they were actually causing distress and it was behaving like a humanoid. What do you think that says? Does that concern you in any way?
Personally, it doesn’t, because I know that, as long as machines don’t have real affect in them, then, we might be transferring what we think stress is onto a machine that doesn’t really feel that stress… it’s really about codes…
I guess the concern is that if you get in the habit of treating something that you regard as being in distress, if you get into the habit of treating it callously, this is what Weizenbaum said, he thought that it would have a dampening effect on human empathy, which would not be good… Let me ask you this, what do you think about embodying artificial intelligence? Because you think about the different devices: Amazon has theirs, it’s right next to me, so I can’t say its name, but it’s a person’s name… Apple has Siri, Microsoft has Cortana… But Google just has the Google system, it doesn’t have a name. Do you think there’s anything about that… why do you think it is? Why would we want to name it or not name it, why would we decide not to name it? Do you think we’re going to want to interact with these devices as if they’re other people? Or are we always going to want them to be obviously mechanistic?
My personal feeling is that we want them to be mechanistic, they’re there not to exist on their own accord, and reproduce and create a new world. They’re there to help us, that’s the way I think AI should be, to help us in our tasks. Therefore when you start humanizing it, then you’re going to either have the danger of mistreating it, treating it like basically slaves, or you’re going to give it other attributes that are not what they are, thinking that they are human, and then going the other route, and they’re there to help us, just like robots, or just like the industrial revolution brought machines that help humans manufacture things better… So they’re there to help us, I mean we’re creating them, not as beings, but rather as machines that help us improve humanity, and if we start humanizing them and then, either mistreating them, like you mentioned with the Japanese example, then it’s going to get muddled and strange things can happen…
But isn’t that really what is going to happen? Your PhD alone, which is how do you spot emotions? Presumably would be used in a robot, so it could spot your emotions, and then presumably it would be programmed to empathize with you, like “don’t be worried, it’s okay, don’t be worried,” and then to the degree it has empathy with you, you have emotional attachment to it, don’t you go down that path?
It might, but I think we can stop it. So the reason to identify the emotion is because it’s going to help me do something, so, for example, our research project was around creating assistance for kids to learn, so in order to help the kid learn better, we need to empathize with the state of mind of the child, so it can help them learn better. So that was the goal of the task, and I think as long as we encapsulate it in well-defined goals that help humans, then, we won’t have the danger of creating… the other way around. Now, of course maybe in 20 years, what I’m saying now will be completely wrong and we will have a new world where we do have a world of robots that we have to think about how do we protect them from us. But I think we’re not there yet, I think it’s a bit science fiction, this one.
So I’m still referring back to your earlier “supposedly” comment about neural nets, what do you think are other misconceptions that you run across about artificial intelligence? What do you think are, like your own pet peeves, like “that’s not true, or that’s not how it works?” Does anything come to mind?
People think, because of the hype, that it does a lot more than it really does. We know that it’s really good at classification tasks, it’s not yet very good at anything that’s not classification, unsupervised tasks, it’s not being able to learn new concepts all by itself, you really have to code it, and it’s really hard. You need a lot of good people that know the art of applying neural nets to different problems. It doesn’t happen just magically, the way people think.
I mean you’re of course aware of high profile people: Elon Musk, Stephen Hawking, Bill Gates, and so forth who [have been] worried about what a general intelligence would do, they use terms like “existential threat” and all that, and they also, not to put words in their mouth, believe that it will happen sooner rather than later… Because you get Andrew Ng, who says, “worry about overpopulation of Mars,” maybe in a couple hundred years you have to give it some thought, but you don’t really right now… So where do you think their concern comes from?
So, I’m not really sure and I don’t want to put any words in their mouth either, but, I mean the way I see it, we’re still far off from it being an existential threat. The main concern is you might have people who will try to abuse AI, to actually fool other people, that I think is the biggest danger, I mean, I don’t know if you saw the South Park episode last week, they had their first episode where Cartman actually bought an Alexa and started talking to his Alexa, and I hope your Alexa doesn’t start working now…. So it basically activated a lot of Alexas around the country, so he was adding stuff to the shopping cart, really disgusting stuff, he was setting alarm clocks, he was doing all sorts of things, and I think the danger of the AI today is really getting abused by other people, for bad purposes, in this case it was just funny… But you can have cases where people will control autonomous cars, other people’s autonomous cars by putting pictures by the side of the road and causing them to swerve or stop, or do things they’re not supposed to, or building AI that will attack other types of AI machines. So I think the danger comes from the misuse of the technology, just like any other technology that came out into the world… And we have to… I think that’s where the worry comes from, and making sure that we put some sort of ethical code of how to do that…
What would that look like? I mean that’s a vexing problem…
Yes, I don’t know, I don’t have the answer to that…
So there are a number of countries, maybe as many as twenty, that are working on weaponizing, building AI-based weapons systems, that can make autonomous kill decisions. Does that worry you? Because that sounds like where you’re going with this… if they put a plastic deer on the side of the road and make the car swerve, that’s one thing, but if you literally make a killer robot that goes around killing people, that’s a whole different thing. Does that concern you, or would you call that a legitimate use of the technology…?
I mean this kind of use will happen, I think it will happen no matter what, it’s already happening with drones that are not completely autonomous, but they will be autonomous probably in the future. I think that I don’t know how it can be… this kind of progress can be stopped, the question is, I mean, the danger I think is, do these robots start having their own decision-making and intelligence that decides, just like in the movies, to attack all humankind, and not just the side they’re fighting on… Because technology in [the] military is something that… I don’t know how it can be stopped, because it’s driven by humans… Our need to wage war against each other… The real danger is, do they turn on us? And if there is real intelligence in the artificial intelligence, and real understanding and need to survive as a being, that’s where it becomes really scary…
So it sounds like you don’t necessarily think we’re anywhere near close to an AGI, and I’m going to ask you how far away you think we are… I want to set the question up as saying that, there are people who think we’re 5-10 years away from a general intelligence and then there are people who think we’re 500 years [away]. Oren Etzioni was on the show, and he said he would give anyone 1000:1 odds that we wouldn’t have it in 5 years, so if you want to send him $10 he’ll put $10,000 against that. So why do you think there’s such a gap, and where are you in that continuum?
Well, because the methods we’re using are still so… as smart as they got, they’re still doing rudimentary tasks. They’re still recognizing images—the agents that are doing automated things for us, they’re still doing very rudimentary tasks. General intelligence requires a lot more than that, that requires a lot more understanding of context. I mean the example of Alexa last week, that’s a perfect example of not understanding context, for us as humans, we would never react to something on TV like that and add something to our shopping cart, just because Cartman said it, where even the very, very smart Alexa with amazing speech understanding, and taking actions based on that, it still doesn’t understand the context of the world, so I think prophecy is for fools, but I think it’s at least 20 years out…
You know, we often look at artificial intelligence and its progress based on games where it beats the best player, that goes back to [Garry] Kasparov in ’97, you have of course Jeopardy, you have AlphaGo, you had… an AI beat some world rated poker players, what do you think… And those are all kind of… they create a stir, you want to reflect on it, what do you think is the next thing like that, that one day, snap your fingers and all of a sudden an AI just did… what?
Okay, I haven’t thought about that… All these games, what makes them unique is that they are a very closed world; the world of the game, is finite and the rules are very clear, even if there’s a lot of probability going on, the rules are very clear, and if you think in the real world—and this may be going back to the questions why it will take time—for artificial intelligence to really be general intelligence, the real world is almost infinite in possibilities and the way things can go, and even for us, it’s really hard.
Now trying to think of a game that machines would beat us next in. I wonder if we were able to build robots that can do lots of sports, I think they could beat us easily in a lot of games, because if you take any sports game like football or basketball, they require intelligence, they require a lot of thinking, very fast thinking and path finding by the players, and if we were able to build the body of the robot that can do the motions just like humans, I think they can easily beat us at all these games.
Do you, as a practitioner… I’m intrigued by it, on the topic of general intelligence, intrigued by the idea that, human DNA isn’t really that much code, and if you look at how much code that we are different than say a chimp, it’s very small, I mean it’s a few megabytes. That would be, how we are programmatically different, and yet, that little bit of code, makes us have a general intelligence and a chimp not. Does that persuade you or suggest to you that general intelligence is a simple thing, that we just haven’t discovered, or do you think that general intelligence is a hack of a hundred thousand different… like it’s going to be a long slog and then we finally get it together…?
So, I think [it’s] the latter, just because the way you see human progress, and it’s not just about one person’s intelligence. I think what makes us unique is the ability to combine intelligence of a lot of different people to solve tasks, and that’s another thing that makes us very different. So you do have some people that are geniuses that can solve really really hard tasks by themselves, but if you look at human progress, it’s always been around combined intelligence of getting one person’s contribution, then another person’s contribution, and thinking about how it comes together to solve that, and sometimes you have breakthroughs that come from an individual, but more often than not, it’s the combined intelligence that creates the drive forward, and that’s the part that I think is hard to put into a computer…
You know there are people that have, amazing savant-like abilities. I remember reading about a man named [George] Dantzig, and he was a graduate student in statistics, and his professor put two famous unsolvable/unsolved problems on the blackboard, and Dantzig arrived late that day. He saw them and just assumed that they were the homework, so he copied them down and went home, and later he said he thought they were a little harder than normal, but he solved them both and turned them in… and that like really happened. It’s not one of those urban legend kind of things, you have people who can read the left and right page of a book at the same exact time, you have… you just have people that are these extraordinarily edge cases of human ability. Does that suggest that our intellects are actually far more robust than they are? Does that suggest anything to you as an artificial intelligence guy?
Right, so coming from the probability space, it just means that our intelligence has wide distribution, and there are always exceptions in the tails, right? And these kind of people are in the tails, and often when they are discovered, they can create monumental breakthroughs in our understanding of the world, and that’s what makes us so unique. You have a lot of people in the center of the distribution, that are still contributing a lot, and making advances to the world and to our understanding of it, and not just understanding, but actually creating new things. So I’m not a genius, most people are not geniuses, but we still create new things, and are able to advance things, and then, every once in a while you get these tails of a distribution intelligence, that could solve the really hard problems that nobody else can solve, and that’s a… so the combination of all that actually makes us push things forward in the world, and I think that kind of combined intelligence, I think that artificial intelligence is way, way off. It’s not anywhere near, because we don’t understand how it works, I think it would be hard for us to even code that into machines. That’s one of the reasons I think AI, the way people are afraid of it, it’s still way off…
But by that analysis, that sounds like, to circle that back, there will be somebody that comes along that has some big breakthrough in a general intelligence, and ta-da, it turns out all along it was, you know, bubble sort or….
I don’t think it’s that simple, that’s the thing, and solving a statistical problem that’s really, really tough, it’s not like… I don’t think it’s a well-defined enough problem, that some will take a genius just to understand… “Oh, it’s that neuron going right to left,” and that’s it… so I don’t think it’s that simple… there might be breakthroughs in mathematics, that help you understand the computation better, maybe quantum computers that will help you do faster computation, so you can train much, much faster than machines so they can do the task much better, but, it’s not about understanding the concept of what makes a genius. I think that’s more complicated, but maybe it’s my limited way of thinking, maybe I’m not intelligent enough with it…
So to stay on that point for a minute… it’s interesting and I think perhaps, telling, that we don’t really understand how human intelligence works, like if you knew that… like we don’t know how a thought is encoded in the brain… like if I said… Ira, what color was your first bicycle, can you answer that question?
I don’t remember… probably blue…
Let’s assume for a minute that you did remember. It makes my example bad, but there’s no bicycle location in your brain that stored the first “bicycle”… like an icon, or database lookup… like nobody knows how that happens… not only how it’s encoded, but how it’s retrieved… And then, you were talking earlier about synthesis and how we use it all together, we don’t know any of that… Does that suggest to you that, on the other end, maybe we can’t make a general intelligence… or at the very least, we cannot make a general intelligence until we understand how it is that people are intelligent…?
That may be, but yeah. First of all even if we made it, if we don’t understand it, then how would we know that we made it? Circling back to that… I think the way we… it’s just like the kids, they were thinking that they were causing stress to the robot, because they were giving it… they thought they understood stress and the affect of it, and they were transferring it onto the robot. So maybe when we create something very intelligent that looks to be like us, we would think we created intelligence, but we wouldn’t know that for sure until we know what is… general intelligence really is…
So do you believe that general intelligence is an evolutionary invention that will come along if, in 20 years, 50 years, 1,000 years… whatever it is, that it is something that will come along out of the techniques we use today from the early AI, like, are we building really, really, really primitive general intelligences, or do you have a feeling that a real AGI is going to be a whole different kind of approach in technology?
I think it’s going to be a whole different approach. I think what we’re building today are just machines that do tasks that we humans do, in a much, much better way, and just like we built machines in the industrial revolution that did what people did with their hands, but did it in a much faster way, and better way… that’s the way I see what we’re doing today… And maybe I’m wrong, maybe I’m totally wrong, and we’re giving them a lot more general intelligence than we’re thinking, but the way I see it, it’s driven by economic powers, it’s driven by the need of companies to advance, and take away tasks that cost too much money to do by humans, or are too slow to do by humans… And, revolutionizing that way, and I’m not sure that we’re really giving them general intelligence yet, still we’re giving them ways to solve specific tasks that we want them to solve, and not something very very general that can just live by itself, and create new things by itself.
Let’s take up this thread, that you just touched on, about, we build them to do jobs we don’t want to do, and you analogize it to the Industrial Revolution… so as you know, just to set the problem up, there are 3 different narratives about the effect this technology, combined with robotics, or we’ll call it automation, in general, are going to have on jobs. And the three scenarios are: one is that, it’s going to destroy an enormous number of quote, low-skill jobs, and that, there will, by definition, be fewer low skilled jobs, and more and more people competing for them and you will have this permanent class of unemployable… it’s like the Great Depression in the US, just forever. And then you have people who say, no, it’s different than that, what it really is, is, they’re going to be able to do everything we can do, they’re going to have escape… Once a machine can learn a new task faster than a person, they’ll take every job, even the creative ones, they’ll take everything. And the third one says no, for 250 years we’ve had 5-10% of unemployment, it’s never really gotten out of that range other than the anomalous depression, and in that time we had electricity, we had mechanization, we had steam power, we had the assembly line… we had all these things come along that sure looked like job eaters, but what people did is they used the new technology to increase their own productivity and drive their own wages higher, and that’s the story of progress, that we have experienced… So which of those three theories, or maybe a fourth one, do you think is the correct narrative?
I think the third theory is probably the more correct narrative. It just gives us more time to use our imagination and be more productive at doing more things, improve things, so, all of a sudden we’ll have time to think about going and conquering the stars, and living in the stars, or improving our lives here in various ways… The only thing that scares me is the speed of it, if it happens too quickly, too fast… So, we’re humans, it takes, as a human race, some time to adapt. If the change happens so fast and people lose their jobs too quickly, before they’re able to retrain for the new economy, the new way of [work], the fact that some positions will not be available anymore, that’s the real danger and I think if it happens too fast around the world, then, there could be a backlash.
I think what will happen is that the progress will stop because some backlash will happen in the form of wars, or all sorts of uprisings, because, at the end, people need to live, people need to eat, and if they don’t have that, they don’t have anything to live for, they’re going to rise up, they’re not just going to disappear and die by themselves. So, that’s the real danger, if the change happens too rapidly, you can have a depression that will actually cause the progress to slow down, and I hope we don’t reach that because I would not want us, as a world, to reach that stage where we have to slow down, with all the weapons we have today, this could actually be catastrophic too…
What do you mean by that last sentence?
So I mean we have nuclear weapons…
Oh, I see, I see, I see.
We have actual weapons that can, not just… could actually annihilate us completely…
You know, I hear you. Like… what would “too fast” be? First of all, we had that when the Industrial Revolution came along… you had the Luddite movement, when Ludd broke two spinning wheels, you had the thresher riots [or Swing riots] in England in the 1820s, when the automated threat, you had the… the first day the London Times was printed using steam power instead of people. They were going to go find the guy who invented that, and string him up, you had a deep-rooted fear of labor-changing technology, that’s a whole current that constantly runs, but what would too fast look like? The electrification of industry just happened lightning fast, we went from generating 5% of our power from steam to 85% in just 22 years… Give me a “too fast” scenario. Are you thinking about the truck drivers, or… tell me how it could “be too fast,” because you seem to be very cautious, like, “man, these technologies are hard and they take a long time and there’s a lot of work and a lot of slog,” and then, so what would too fast look like to you?
If it’s less than a generation, let’s say in 5 years, really, all taxi drivers and truck drivers lose their job because everything becomes automated, that seems to be too fast. If it happens in 20 years, that’s probably enough time to adjust, and I think… the transition is starting, it will start in the next 5 years, but it will still take some time for it to really take hold, because if people lose those jobs today, and you have thousands or hundreds of thousands, or even millions of people doing that, what are they going to do?
Well, presumably, I mean, classic economics says that, if that happened, the cost of taking a cab goes way down, right? And if that happens, that frees up money that I no longer have to spend on an expensive cab, and therefore I spend that money elsewhere, which generates demand for more jobs, but, is the 5-year scenario… it may be a technical possibility, like we may “technically” do it, if we don’t have a legislative hurdle.
I read this article in India, which said they’re not going to allow self-driving cars in India because that would put people out of work, then you have the retrofit problem, then every city’s going to want to regulate it and say well, you can have a self-driving car, but it needs to have a person behind the wheel just in case. I mean like you would say, look, we’ve been able to fly airplanes without a pilot for decades, yet no airline in the world would touch that, in this plane, we have no pilot… even though that’s probably a better way to do it… So, do you really think we can have all the taxi drivers gone in 5 years?
No, and exactly for that reason, even if our technology really allows it. First of all, I don’t think it will totally allow it, because for it to really take hold you have to have a majority of cars on the road to be autonomous. Just yesterday I was in San Francisco, and I heard a guy say he was driving behind one of those self-driving cars in San Francisco, and he got stuck behind it, because it wouldn’t take a left turn when it was green, and it just forever wouldn’t take a left turn that humans would… The reason why it wouldn’t take a left turn was there were other cars that are human-driven on the road, and it was coded to be very, very careful about it, and he was 15 minutes late to our meeting just because of that self-driving car…
Now, so I think there will be a long transition partly because legislation will regulate it, and slow it down a bit, which is a good thing. You don’t want to change too fast, too quickly without making sure that it really works well in the world, and as long as there is a mixture of humans driving and machines driving, the machines will be a little bit “lame,” because they will be coded to be a lot more careful than us, and we’re impatient, so, that will slow things down which is a good thing, I think making a change too fast can lead to all sorts of economic problems as well…
You know in Europe they had… I could be wrong on this, I think it was first passed in France, but I think it was being considered by the entire EU, and it’s the right to know why the AI decided what it did. If an AI made the decision to deny you a loan, or what have you, you have the right to know why it did that… I had a simple question which was, is that possible? Could Google ever say, I’m number four for this search and my competitor’s number three, why am I number four and they’re number three? Is Google big and complicated enough, and you don’t have to talk specifically about Google, but, are systems big and complicated enough that we don’t know… there are so many thousands of factors that go into this thing, that many people never even look at, it’s just a whole lot of training…
Right, so in principle, the methods could tell you why they made that decision. I mean, even if there are thousands of factors, you can go through all of them and have not just the output of their recognition, but also highlight what were the attributes that caused it to decide it’s one thing or another. So from the technology point of view, it’s possible, from the practical point of view, I think for a lot of problems, you don’t, you won’t really care. I mean, if it recognized that there’s a cat in the image, and you know it’s right, you won’t care why it’s recognized that cat. I guess for some problems where the system made a decision that you don’t necessarily know why it made the decision, or you have to take action based on that recognition, you would want to know. So if I predicted for you that your revenue is going to increase by 20% in the next week, you would probably want that system to tell you, why do you think that’s happened, because there isn’t a clear reason for it that you would imagine yourself, but, if the system told you there is a face in this image, and you just look at the image, and you can see that there’s a face in that image, then you won’t have a problem with it, so I think it really depends on the problem that you’re trying to solve…
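[Editor's illustration: the idea of returning not just a decision but also the attributes that drove it can be sketched for the simplest case, a linear score. This is a minimal sketch under stated assumptions and is not a description of any system discussed in the interview. The function `explain_score`, the feature names, and the weights are all hypothetical.]

```python
def explain_score(weights, features, top_n=3):
    """Return a linear score plus the features that contributed most to it.

    Assumption: the model is a plain weighted sum, so each feature's
    contribution is simply weight * value. Real systems are usually more
    complex and need other attribution methods.
    """
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = sum(contributions.values())
    # Rank by absolute size so strong negative factors surface as well.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked[:top_n]


# Hypothetical loan-scoring weights and one hypothetical applicant.
weights = {"income": 0.5, "debt_ratio": -2.0, "late_payments": -1.25, "years_employed": 0.25}
applicant = {"income": 4.0, "debt_ratio": 1.5, "late_payments": 2.0, "years_employed": 2.0}

score, reasons = explain_score(weights, applicant)
print("score:", score)
for name, contribution in reasons:
    print(f"{name}: {contribution:+.2f}")
```

[For a model this simple, the ranked contributions are the explanation. With thousands of interacting factors the same question is still answerable in principle, as the answer above suggests, but it takes more elaborate attribution techniques.]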
We talked about games earlier and you pointed out that they were closed environments and that’s really a place with explicit rules, a place that an AI can excel, and I’ll add to that, there’s a clear cut idea of what winning looks like, and what a point is. I think somebody on the show said, “Who’s winning this conversation right now?” There’s no way to do that, so my question to you is, if you walk around an enterprise and you say “where can I apply artificial intelligence to my business?” would you look for things that looked like games? Like, okay, HR you have all these successful employees that get high performance ratings, and then you have all these people you had to fire because they didn’t, and then you get all these resumes in. Which ones more look like the good people as opposed to the bad people? Are there lots of things like that in life that look like games… or is the whole game thing really a distraction from solving real world problems, nothing really is a game in the real world…
Yeah, I think it’d be wrong to look at it as a game, because the rules… first there is no real clear notion of winning. What you want is progress, you have goals that you want to progress towards, you want, for example, in business, you want your company to grow. That could be your goal, or you want the profits to grow, you want your revenue to grow, so you make these goals, because that’s how you want things to progress and then you can look at all the factors that help it grow. The world of how to “make it grow” is very large, there are so many factors, so if I look at my employees, there might be a low-performing employee in one aspect of my business, but maybe that employee brings to the team, you know, a lot of humor that causes them to be productive, and I can’t measure that. Those kind of things are really, really hard to measure and, so looking at it from a very analytic point of view of just a “game,” would probably miss a lot of important factors.
So tell me about the company you co-founded, Anodot, because you make an anomaly detection system using AIs. So first of all, explain what that is and what that looks like, but how did you approach that problem? If it’s not a game, instead of… you looked at it this way…
So, what are anomalies? Anomalies are anything that’s unexpected, so our approach was: you’re a business and you’re collecting lots and lots and lots of data related to your business. At the end, you want to know what’s going on with the business, that’s the reason you collect a lot of data. Now, today, people have a lot of different tools that help them kind of slice and dice the data, ask questions about what’s happening there, so you can make informed decisions about the future or react to things that are happening right now, that could affect your business.
The problem with that, is that basically… why isn’t it AI? It’s not AI because you’re basically asking a question and letting the computers compute something for you and giving you an answer; whereas anomalies, by nature, are things that happen that are unexpected, so you don’t necessarily know to ask the question in advance, and unexpected things could happen. In businesses for example, you see a certain revenue for a product you’re selling going down in a certain city, why’s that happening? If you don’t look at it, and if you don’t ask the question in advance, you’re not even aware that that is happening… so, the great thing about AI, and machine learning algorithms, is they can process a lot of data, and if you can encode into a machine, an algorithm that identifies what are anomalies, you can find them in very, very large scale, and that helps the companies actually detect that things are going wrong, or detect the opportunities that they have, that they might miss otherwise. Where the endgame is very simple, to help you improve your business constantly and maintain it and avoid the risks of doing business, so, it’s not a “game,” it’s actually bringing immediate value to a company, highlighting, putting light on the data that they really need to look at with respect to their business, and the great thing about machine-learning algorithms, [is] they can process all of this data much better than we could, because what do humans do? We graph them, we visualize the data in various ways, you know, we create queries from database about questions that we think might be relevant, but we can’t really process all the data, all the time in an economical way. You would have to hire armies of people to do that, and machines are very good at that, so, that’s why we built Anodot…
Give me an example, like tell me a use case or a real world example of something that Anodot, well that you were able to spot that a person might not have been able to…?
So, we have various customers that are in the e-commerce business, and if you’re in e-commerce and you’re selling a lot of different products, various things could go wrong or opportunities might be missed. For example, if I’m selling coats, and I’m selling a thousand other products, I’m selling coats, and now in a certain area of the country, there is an anomalous weather condition that became cold, all of a sudden I’ll see, I won’t be able to see it because it’s hiding in my data, but people will start buying… in that state will start buying more coats. Now it’s not like if… if somebody actually looked at it, they would probably be able to spot it, but because there is so much data, so many things, so many moving parts, nobody actually notices it. Now our AI system finds… “Oh, there is an anomalous weather condition and there is an uptick in selling that coat, you better do something to seize that opportunity to sell more coats,” so either you have to send more inventory to that region to make sure that if somebody really wants a coat, you’re not out of stock. If you’re out of stock, you’re losing revenue, potential revenue, or you can even offer discounts for that region because you want to bring more people to your e-commerce site, rather than the competition, so, that’s one example…
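[Editor's illustration: the coat example can be made concrete with a very simple detector. The sketch below flags a day whose sales sit far from the recent average, using a trailing-window z-score. It is a minimal sketch under stated assumptions, not Anodot's method, which the interview does not describe. The function `find_anomalies`, the window size, the threshold, and the sales figures are all hypothetical.]

```python
from statistics import mean, stdev


def find_anomalies(series, window=7, threshold=3.0):
    """Flag points that deviate strongly from the trailing window's mean.

    Returns a list of (index, value, z_score) tuples. Assumption: recent
    history is a fair baseline, which ignores seasonality and trends that
    a production system would have to model.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, so skip the point
        z = (series[i] - mu) / sigma
        if abs(z) >= threshold:
            anomalies.append((i, series[i], round(z, 2)))
    return anomalies


# Hypothetical daily coat sales for one region; the final day spikes.
daily_coat_sales = [20, 22, 19, 21, 20, 23, 21, 22, 20, 58]
print(find_anomalies(daily_coat_sales))
```

[Run across every product and region at once, a check like this surfaces the one series worth a human's attention, which is the point made above about data that nobody has time to look at.]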
And I assume it’s also used in security or fraud and what not, or are you really focused on an e-commerce-use case?
So we built a fairly generic platform that can handle a wide variety of use cases. We don’t focus on security as-is, but we do have customers that, in part of their data, we’re able to detect all sort of security-related breaches, like bot activity happening on a site or fraud rings—not the individual fraud of an individual person doing a transaction—but, it’s a lot of the time, frauds are not just one credit card, but somebody actually doing it over time, and then you can create or you can identify those fraud rings.
Most of our use cases have been around more the business-related data, either in ecommerce, ad tech companies, online services. And so online services, anybody that is really data-dependent to run their business, and very data-driven in running their business, and most businesses are transforming into that, even the old-fashioned businesses are transforming into that, because that data has competitive advantage, and being able to process that data to find all the anomalies, gives you an even larger competitive advantage.
So, last question: You made a comment earlier about freeing up people so we can focus on living in the stars. People who say that are generally science fiction fans I’ve noticed. If that is true, what view of the future, as expressed in science fiction, do you think is compelling or interesting or could happen?
That’s a great question. I think that what’s compelling to me about the future, really, is not whether we live in the stars or not in the stars, but really about having to free up our time to think about stars, to think about the next big things that progress humanity to the next levels, to be able to explore new dimensions and solve new problems, that…
Seek out new life and new civilizations…
Could be, and it could be in the stars, it could be on Earth, it could be just having more time, having more time on your hands, gives you more time to think about “What’s next?” When you’re busy surviving, then you don’t have any time to think about art, and think about music, and advancing it, or think about the stars, or think about the oceans, so, that’s the way I see AI and technology helping us—really freeing up our time to do more, and to use our collective intelligence and individual intelligence to imagine places that we haven’t thought about before… Or we don’t have time to think about before because we’re busy doing the mundane tasks. That’s really for me, what it’s all about…
Well that is a great place to end it, Ira. I want to thank you for taking the time and going on that journey with me of talking about all these different topics. It’s such an exciting time we live in and your reflections on them are fascinating, so thank you again.
Thank you very much, bye-bye.
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.

Voices in AI – Episode 46: A Conversation with Peter Cahill

In this episode, Byron and Peter discuss AI use in consumer and retail businesses.
Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese, and today, our guest is Peter Cahill. He is the CEO over at Voysis. He holds an undergraduate degree in computer science from the Dublin Institute of Technology and a PhD in the field of computer science text-to-speech from University College, Dublin. Welcome to the show, Peter.
Peter Cahill: Thanks. Looking forward to it.
Well, I always like to start with the question, what is artificial intelligence?
It’s a tough question. I think, as time passes, it’s getting increasingly more difficult to define it. I think some years ago, people would use ‘artificial intelligence’ and essentially pattern matching as if they kind of meant the same thing. I think in more recent years, as technologies have progressed, sizes of data sets are many times bigger, computer power is obviously a whole lot better as well, as are the technologies being developed, so I think these days, it can be really hard to draw that line. Some time ago, maybe a year ago, I think, I was chairing a panel on speech synthesis. One of the questions I had for the panelists in general was, in theory, could computers ever speak in a more human way, or in a way better than humans? We’ve seen many of these.
Over time, we’ve seen that computers can do computer vision better than people. Computers can do speech recognition better than people, and it’s always in a certain context and on a certain data set. But still, we’re starting to see computers outperforming people in various cases. I asked this question to the panel, could computers speak better than people? I think one of the panelists, as far as I recall, said that he believed they could, and what would be realized would be that, if a computer could not just sound perfectly human but also could be more convincing than your average person would be, then the computer would speak better than the person. I think on the back of that, to ask what’s the artificial part of artificial intelligence, it does seem that, as time passes and these technologies continue to progress, that really having a good definition on that just becomes increasingly difficult. I’m afraid I don’t have a good definition for you for it. But I think eventually people will just start referring to it as intelligence.
You know, it’s interesting. When Turing put out the Turing test, he was trying to answer the question, “Can a machine think?” Everybody knows what the Turing test is, can you tell whether you’re talking to a person or a computer? He said something interesting. He said that “if the computer can ever get you to pick it 30% or 40% of the time, you have to say it’s thinking.” You have to ask why wasn’t that 50-50? Of course, the question he was asking is not whether or not a computer could think better than a person, but whether they can think at all. But the interesting question is what you just touched on, which is if the computer ever gets picked 51% of the time, then the conclusion is what you just alluded to. It’s better at seeming human than we are. So, do you think in the context of artificial intelligence – and I don’t want to belabor it. But do you think it’s artificial like artificial turf isn’t grass? Is it really intelligent, or is it able to fake it so well that it seems intelligent? Or, do you find anything meaningful in that distinction?
I think there is a chance, as our understanding of how the human brain works and develops, in addition to what people currently call artificial intelligence – as that develops, eventually there may be some overlap. I think even myself and a lot of others don’t really like the term “artificial neural networks,” or neural networks, because they’re quite different to the human brain, even though they may be inspired by how the human brain works. But I wouldn’t be surprised if eventually we ended up at a point of understanding how the human brain works, to the extent that it no longer seems as magically intelligent as it does to us today. I think probably what we will see happening is, as machines get better and better at artificial intelligence, that it may become almost like if something seems too natural or too good, then people would assume maybe that it came from a machine and not a person. Probably a really good example is if you consider video games today, that we have this artificial intelligence in video games, which is really not intelligent at all. For example, if you take a random first-person shooter type of game, where the artificial intelligence is trying to seem very – they make lots of mistakes, they move very slowly. If you really tried to power a modern video game with really state of the art artificial intelligence, the human player wouldn’t stand a chance, just because the AI would be so accurate and so much faster and so much more strategic in what it was doing. I think we’ll see stuff like that across the spectrum of AI, where machines can be really, really good at what they’re doing and, as time passes, they’ll just continuously get better, whereas people are always starting from scratch.
So, working up the chain from the brain – which you said we may get to a point where we understand it well enough that our intelligence looks like artificial intelligence, if I’m understanding you correctly. There’s a notion above it, which is the mind, and then consciousness. But just talking about the mind for a minute, the mind, there’s all this stuff your brain can do that doesn’t seem like something an organ should be able to do. You have a sense of humor, but your liver does not have a sense of humor. Where does that come from? What do you think? Where do you think these amazing abilities of the brain – and I’m not even talking about consciousness. I’m just talking about things we can do. Where do you think they come from, and do you have even a gut instinct? Are they emergent? What are they?
Yes, obviously it would just be a guess, really. But I would think that, if we end up with AIs that are as complex or even more complex and more capable than the human brain, then we’re going to probably see various artifacts on the side of that, which may resemble these types of things you’re talking about right now. I think maybe to some extent, right now, people draw this distinction between AI and intelligence, because the human brain still has so many unknowns about it. It appears to be almost magic in that way, whereas AI is very well-understood, exactly what it’s doing and why. Even if, say, models are too big to really be able to understand exactly why they’re making certain decisions, the algorithms of them are very well understood.
Let me ask a different question. You know, a lot of people I have on the show – there’s a lot of disagreement about how soon we’re going to get a general intelligence. So, let me just ask a really straightforward question, which is some people think we’re going to get a general intelligence soon – 5/10/15 years. Some people think an AGI is as far out as 500 years. Do you have an opinion on that?
Yes, I think as soon as we can put a time on it, it’ll happen incredibly quickly. Right now, today’s technologies are not sufficient to be generally intelligent. But what we’ve seen even in general in AI in recent years is, as now, pretty much every company out there is trying to develop their AI strategy, building out AI teams or working with a lot of other companies that work in AI. I think the number of people working in AI as a field has increased dramatically, and that will cause progress to happen far quicker than it would have otherwise happened.
So, let me ask a different variant of the question which is, do you think we’re on an evolutionary path to build… is the technology evolving where it gets a little better, a little better, a little better, and then one day it’s an AGI? Or like the guest I had on the show yesterday said, “No, what we’re doing today isn’t really anything like an AGI. That’s a whole different piece of technology. We haven’t even started working on that yet?”
Yes, I’d say that’s correct. But the leap – it’s not going to be an iteration of what we currently have. But it may just be a very small piece of technology that we don’t currently have, when combined with everything that we do currently have makes it possible.
Let’s talk about that. People who think that we’re going to get an AGI relatively soon often think that there is a master algorithm, that there is a generalized unsupervised learner we can build. We can just point it at the internet and it’s going to know all there is to know. Then other people say, “No, intelligence is a kludge. Our brains are only intelligent because we do a thousand different things and they’re all cognitive biased. All this messy spaghetti code is all we really are.” You have an opinion on that?
I think currently there’s no algorithms out there that even suggest it could be generally intelligent. I think as it is, even if there was one minor breakthrough in that space, it would have a very dramatic knock-on effect in the world. Then people would start believing it was only a number of years away. As it is right now, if it happened in 5 years, I honestly would not be surprised. If it happened in 15, I wouldn’t be surprised, or if it happened in 50. Right now, we’re at least one major breakthrough away from that happening. But that could happen at any point.
Could it never happen?
In theory, yes. But in practice, I would guess that it will.
One argument says that it may be, just like you’re suggesting, a straightforward one breakthrough away. It says that the human genome, which is the formula for building a general intelligence – and it does a whole lot of other stuff – is, say, 700MB. But the part that is different than, say, a chimp, is just one percent of that, 7MB-ish. The logical leap is that there might just be a small little thing that’s a small amount of code, because even in that 7MB, a bunch of it’s not expressing proteins and all of that. It might just be something really simple. But do you think that that is anything more than an analogy? Is that actually a proof point?
I would expect it to be something along that line. Even today, I think you could take the vast majority of deep learning algorithms and you could represent them all in less than a MB of data. Many of these algorithms are fairly straightforward formula, when they’re implemented in the right way. They do what we currently call deep learning or whatever. I don’t think we’re that many major leaps away from having an artificial general intelligence. Right now, we’re just missing the first step on that path, and once something does emerge, there’s going to be thousands, tens of thousands of people all around the globe who will start working on it immediately, so we’ll see a very quick rate of progress as a result, in addition to it just learning by itself, anyway.
Okay, so just a couple more questions along these lines and then we’ll get back to the here and now. There’s a group of people – and you know all the names – high profile individuals who say that such a thing is a scary prospect, an existential threat, summoning the demon, the last invention. You know all of it. Then you get the other people, Andrew Ng, where it’s worrying about overpopulation on Mars, Zuckerberg who says flat out it’s not a threat. Two questions. Where are you on the fear spectrum, and two, why do you think these people – all very intelligent people – have such wildly different opinions about whether this is a good or bad thing?
I think eventually it will get to a point where it has to become – or at least certain applications of it will have to become a threat or dangerous in some way. There’s nothing on the horizon that – that’s really, again, the path of general intelligence, which nobody has right now. I think eventually it will go that way, as many technologies do. No one really knows how to manage it or handle it. There have been calls by some people to regulate AI in some way, but realistically, AI is a technology. It’s not an industry, and it’s not a product. You can regulate an industry, but it’s very hard to regulate a technology, especially when it’s outside of your own country’s borders. Other countries don’t need to regulate it, and so there’s a very good chance, if it’s going to be developed, it’s probably going to be developed by many countries, not just one, especially within a few years of each other. Even if everybody unanimously agreed that in 100 years’ time it was going to become a threat, I’m not too sure that it could be stopped even already, because there’s so many people working on it across different countries all over the world. There’s no regulation in any single country that could stop it. Even right now, regulation isn’t required. The technologies don’t even exist to do it, to begin with.
Let’s talk about you for a minute. Can you bring us up to date? How did Voysis come about? How did you decide to enter into this field? Why did you specialize in text-to-speech? Can you just talk a little bit about your journey?
Sure. I started working in text-to-speech in 2002, so 15 years ago. I think at the time, what really attracted me to it was that it was a very difficult problem. Many people had worked on it for decades, especially back then. Computer voices sounded incredibly robotic, and then even when I looked into it in more detail, what made it even more interesting is many machine learning problems tend to be kind of classification problems, where they input a large amount of data, and then output a small amount of data. For example, if you’re doing image classification today, the size of data you have in images is far greater than the final results you get out of the model, which may just tell you this is a picture of a car or something like this. We input huge amounts of data and output something that’s very small.
Text-to-speech is the extreme opposite of that, where the amount of input is just a few characters. From that, the system has to generate this human-sounding waveform. In the case of the human-sounding waveform, if even a small amount of that data is slightly off, the human ear will notice it very, very easily, because we’re completely used to listening to human voices, and we’re not used to listening to distorted signals generated by machines. I guess it’s the opposite of the traditional machine learning problem, where it’s kind of being creative, given a very small amount of data and it needs to create a whole lot more. That’s kind of where I started off originally, working on my PhD. After it, I became faculty at the university I was in, and remained faculty for several years.
Then eventually, I resigned as faculty to start Voysis, where I think at the time, I had always said I’d like to open a company at some point. I think at that time in particular, we saw the likes of Google, Apple, Microsoft and so on – all of them went on an acquisition spree, and they acquired many of the smaller companies that had this technology, regardless of what country they were from. I think the knock-on side effect of that was that there were pretty much no independent providers anymore. Even then, what those companies were going to use these platforms for was very consumer-facing applications like we have today, with Google Home and Amazon Echo. But for other businesses out there who want to have a voice interface in their products, where their users can speak directly to their product and interact with them, pretty much the companies who could have provided that were all acquired by these big platform companies.
That’s really what motivated me to start Voysis. Since then, we’ve built out Voysis as a complete voice AI platform, which normally when we say that, what we mean is that all of the technologies to power these systems – the speech recognition, the text-to-speech, the natural language understanding, the dialogue management and so on – all of the technologies were built in-house, here in Voysis. What we do is we partner with companies and select partners that we feel are both ready for voice, and have consumers within that space that will benefit greatly from having a voice interface. When we build out products, we tend to do a lot of user studies on how consumers want to interact with these devices, and build out the whole user experience to deliver really high-quality voice interactions, integrated directly in third-party business products.
Looking at your website, I noticed you have linguists, you have a wide range of specialists in your company, and then watching your demo stuff, it just seems to me that what you’re trying to do, or what the field breaks down into are four things. I think you just ran through them. One of them is emulating human speech. One of them is simply recognizing the word that I’m saying. The third one is understanding those words, and then the fourth one is managing the dialogue of what pronouns are standing for what thing and all of that. Did I miss any of it?
No, I’d say that’s it in a nutshell, although in practice, we don’t really draw a line between recognizing words and understanding. In the case of the Voysis platform, what we do is audio would go in, and after it’s passed through several models, the understanding components come out. We never transcribe it into text first, because it’s an approach that I think many companies are moving away from. If you transcribe it into text first, you tend to accumulate error from speech recognition. When you try to understand it, there’s errors in the transcription and you can never really recover from it.
Got you. But just as underlying technology, I would love to just look at each one of them in isolation. Let’s do that second one first, which is just understanding what I am saying. I call my airline of choice, and I say my frequent flier number, which unfortunately has an A, an H, and an 8 in it.
AAHH88 – you know, that’s not it, and it never gets it. I shouldn’t say that, but if everything’s really quiet, it eventually gets it. Why is it so bad?
There are probably multiple things at play there. If you’re talking to them over a phone line, phone signals are generally quite distorted and it makes it much more difficult for speech recognition to work well. But there’s also a very good chance that the speech recognition engine they’re using behind that was a general speech recognition engine built for any random use case, as opposed to one that was designed to work on telephone calls, maybe even with some knowledge of the use cases around where it was going to be used.
Because it only needs to recognize 36 things, right? 26 letters and ten numbers.
Sure, but that speech recognition engine may not have been built to recognize some things, which is probably why it struggles with it. Historically, most companies – not Voysis, but many others – tend to build a single speech recognition engine that they try to use in many different situations, and that’s generally where accuracy tends to really suffer. Because if you don’t build a system with any context on exactly how it’s going to be used, it’s a much more difficult task to do 100 things well than it is to do one. That’s essentially the Achilles’ heel of it.
I guess also, unlike dialogue, it doesn’t get any clues about what the next letter or number should be from anything prior to it, right?
There is that, but I think in that case, if you’re just listing letters and numbers, there’s not that many of them. That should work quite well, I think.
In the sentence, “The cat ran up the…,” there’s a finite number of things the cat can run up. What I don’t get, as an aside, is I call from the same number every time. You would think they would have mastered caller ID by now. Let’s talk a little bit about understanding. Any time I come across a Turing test, like a chat bot, I always ask the same question, which is, “What’s larger: a nickel or the sun?” I haven’t found any system that can answer that question. Why is that?
Generally, the modern technologies that are used for chat bots are, I think, still relatively immature in comparison to the technologies behind speech recognition and text-to-speech and so on. Chat bots really only work well when they’re custom-designed and custom-built for a particular use case. If you ask them general questions like that, it won’t align closely to what they were trained on or built on. As a result of that, you’ll get random answers, essentially, from it, or it’ll struggle to work. I think the chat bot-type technology is still very immature, because it requires a deeper understanding and a deeper intelligence. Whereas if you designed a chat bot, say, for e-commerce in particular, and if people ask it e-commerce-related queries, modern technologies can handle that extremely well. But once you go outside of the domain it was designed for, it will really struggle, because these technologies are not at that level yet, where they could handle switching like that.
When they do contests to try to evaluate things that might someday pass the Turing test, they’re always highly constrained. Like you’re saying, they always say you can’t ask about all of these different things. Do you believe that, to get a system that I could ask it any kind of question I want and it will answer it, does that require a general intelligence or not? Are we going to be able to kludge that up just with existing techniques on enough data?
I think the problem isn’t really about data. It’s that modern techniques aren’t good enough to handle any kind of completely random query a user might say to it. Data helps in certain ways, as do newer technologies that are emerging. That is kind of a general intelligence you’re talking about, where it can understand language, regardless of the use case or context.
You don’t think we are in the process of building that now, to hearken back to the earlier part of our conversation, and that we shouldn’t hold our breath for anything like Jarvis, anything like C3PO, anything like that anytime, maybe for decades?
I would say that I’ve seen nothing that would suggest to me that that’s going to happen any time in the next few years. Normally I do keep up on literature and academic journals and so on. I still review many of them, and there’s nothing on the horizon that I’ve seen that would suggest that. I do think modern technologies are still improving in a more iterative way, where you wouldn’t say something completely random to it. But they’re becoming less rigid. If you think of a way you may interact with a Google Home or Echo or Siri, currently it’s in a very prescribed way, where you need to know what words you can say to it in what order, to make it do what you want it to do. Technologies are getting better at being a bit more fuzzy about that, so people can talk to them in a more natural way. But still, they’re still being designed around certain use cases as opposed to being completely general and being able to handle any kind of request.
Talk to me about dialogue management, that whole thing. Where are we with that? Once you understand the words that the person has said, is that a relatively easy problem to solve, or is that also another one that’s particularly tricky?
I think dialogue is probably the most tricky problem there is right now. What makes dialogue really, really difficult is context. You can collect a very large data set of how people interact with the system, but in all cases, the context could go back several turns. Somebody could have said something ten commands ago or ten sentences ago that’s now become relevant again. I think that general context around dialogue is what makes it quite difficult, whereas for example, with speech recognition, people would generally just consider all sentences are independent. That way, it’s very easy, even if you’re collecting data, it’s very easy to collect the large data set where people are saying loads of sentences. Whereas in the case of dialogue, if you need to have full context prescribed in your data set, everything that happened before, everything that happened after, it just means the task of even collecting data is far more difficult.
Understanding the data is far more difficult. Technologies are developing on that front. I think reinforcement learning, which you’re probably familiar with, looks really promising there. It seems to be developing at a fairly quick pace in that use case. But I think the real key with dialogue and making dialogue systems work well will be people need to talk to them in a more natural way than they currently do, whereas many companies’ current approach to dialogue is about collecting data, train the system, deploy it. For dialogue to work well, I think you need to have dialogue systems that can learn on the fly. As people interact with them, the dialogue systems will learn how to be a better dialogue system, and then maybe after enough interactions, which may initially be bad interactions, but after enough of them, the system will learn and do a much better job.
I think modern technologies can do that. We tend not to see many such systems deployed publicly, so again, if you speak to your Amazon Echo or something, it’s really built around having independent instructions that are not really connected to something you said a few sentences ago. You couldn’t have a chat with it. You can just give it a command and tell it what to do. But it doesn’t really come back and interact with you in any meaningful way.
I’ve had a couple of guests on the show from China, who have both said variants of the same thing, which is in China, because you have a much bigger character set to deal with, they’ve had to do voice recognition earlier and put a lot more energy into it. Therefore, they’re ahead in it, compared to other languages. First of all, is that your experience? Are there languages that we do better at it than others? Second, how generalizable is the technology across multiple languages? Like, once you master it in Russian, can you apply that to another language easily?
There’s a few approaches to this. I think the barrier for languages generally tends to be about acquiring good data. Acquiring loads of data is very easy, but you need to have good data. If you’re building a speech recognition system, where you’re expecting people to speak to it via cellphones, you want to record a data set of people speaking in a very similar way, as they would in a deployed application, but speaking through cellphones.
Generally doing that, that’s generally a big manual process that many companies do, where they record maybe tens of thousands of people, maybe more, saying commands through various different cellphone models and they’ll collect all the data, then train off that. That’s generally the barrier. The technologies themselves that are used in the Chinese systems – in Voysis, we do some stuff with Chinese as well. We are quite familiar with it, and the core technologies are all the same.
For speech synthesis, Chinese is a little bit different because it’s a tonal language. The larger character set as well brings in some of its own challenges, as well as in Chinese, they don’t have space characters between words. When you get a string of Chinese text, the first thing you need to figure out is: what are the words here? Where do you insert the spaces? For speech recognition, the technology stack is essentially the same.
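[Editor's illustration] The "where do you insert the spaces?" step can be sketched with greedy forward maximum matching, a classic dictionary-based baseline for word segmentation. This is only an illustration and not a description of how Voysis does it; the function name `segment`, the `max_word_len` parameter, and the seven-entry toy vocabulary are all assumptions made up for the example.

```python
def segment(text, vocabulary, max_word_len=4):
    """Split unspaced text into words by greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there.
    A single character is always accepted so the loop cannot stall.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary; a real system would use a far larger one.
TOY_VOCABULARY = {"我", "喜欢", "红色", "的", "网球", "网球鞋", "鞋"}

# "I like red tennis shoes", written without spaces.
print(segment("我喜欢红色的网球鞋", TOY_VOCABULARY))
```

Because the longest match wins, the text ends in the single word "网球鞋" (tennis shoes) rather than "网球" (tennis) followed by "鞋" (shoe). Greedy matching can pick the wrong split on ambiguous text, which is one reason production systems generally rely on statistical or neural segmenters instead.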
We acquired language 100,000 years ago. Just talking about English, you know the whole path and how it got to where it is. What are things about English that make it uniquely difficult? Is it homophones? Is it…?
I think the biggest challenge with English is that the written form of English and how it’s pronounced aren’t really as well connected as many native speakers think they are. Whereas for many languages, if you see how a word is spelled, it’s very easy to predict how it’s going to be pronounced, whereas in English, that’s not really the case at all. There’s quite a lot of words that come from influences of different languages, be it from French or wherever else. I think as it is in English today, even calculating how to pronounce words remains still quite a big academic problem. People try to fight it with large data sets, where how every word is pronounced is still kind of specified manually, when the system’s being built. Whereas for many other languages, including Chinese, once you have the written form, you can generally quite easily calculate how would that be pronounced.
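[Editor's illustration] The manually specified pronunciations mentioned above are typically kept in a pronunciation lexicon, which is a table from spelling to phonemes. The sketch below is a minimal, assumed example: the three entries use ARPAbet-style phoneme symbols with stress marks omitted, and the names `TOY_LEXICON`, `lookup`, and `words_needing_entries` are hypothetical.

```python
# Three words that share the spelling "ough" but end in different sounds,
# which is why spelling alone is a poor predictor in English.
TOY_LEXICON = {
    "though": ["DH", "OW"],
    "through": ["TH", "R", "UW"],
    "tough": ["T", "AH", "F"],
}


def lookup(word, lexicon=TOY_LEXICON):
    """Return the phoneme list for a word, or None if it has no entry."""
    return lexicon.get(word.lower())


def words_needing_entries(text, lexicon=TOY_LEXICON):
    """List the words in a text that someone would still have to add by hand."""
    return [word for word in text.lower().split() if word not in lexicon]


for word in ("though", "through", "tough"):
    print(word, lookup(word))

print(words_needing_entries("tough dough though"))
```

Any word missing from the table, such as "dough" here, has to be added by hand or guessed by a separate model, which is the cost being described.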
Then, talk to me about the fourth leg on this table, which is voice emulation. You had said that there’s kind of an uncanny valley effect, that if it’s just a little bit off, it sounds wrong.
Yes, these systems generate audio and they do it where their intention is that the audio will contain a speech signal and nothing else. But in practice, they’re generating audio. Any errors in that generation may result in random noises in the audio, glitches or other things. It may be distortions, maybe it’ll mispronounce a word, where in certain cases, changing a single sound in a word can change the meaning of a sentence very dramatically.
Also, for them to do a good job, they really need to understand the meaning of the words they’re saying, whereas if you’re just pronouncing words on their own without any understanding of the meaning, it will result in a speech signal that could sound very humanlike. But at the same time, native speakers will notice that something sounds just a bit off about it. It’s not delivered in a very natural way.
How do you solve that problem long-term? What are the best practices?
Currently, the best way to approach it is if you’ve got a good understanding of where that system is going to be used – again, not a one size fits all system, but you know maybe in a certain case, you want to be able to generate computer voices that will say things similar to what a store assistant may say. Generally, in that case, it makes a lot of sense to record a data set of things a store assistant would say, maybe even record a store assistant while they’re working so you can see what kind of prose it is they use.
Then from that, you build your AI with the knowledge of this is how a human in this situation would speak to someone, whereas traditionally, even now, for many of the computer voices we hear today, many of them are kind of close to being pre-recorded where they would have tens of thousands of audio clips recorded in advance, and they’re kind of stitching the words together. But even when they record the audio, they’re recording it with the use case in mind. If it’s a voice on a GPS system, like a sat nav, the audio it speaks to you with, that was trained off audio recordings of people reading sat nav-type instructions. But in that case, it can sound quite natural and it can sound quite good.
With those four technologies, the ability to recognize words, to understand them, to manage the dialogue, and to emulate voice, let’s say we get really good at all of them. Let’s say we get really good at them. I can think of probably three cases off the top of my head, or three ways that can be terribly misused. I’m sure you can think of more. But if we can go through each of them, I would appreciate getting your thoughts on them. The first is of course privacy. When you think about all the cellphone traffic in the world, most of us are lucky because there’s so much data that nobody can listen to all the conversations. Now, somebody can listen to all the conversations, understand them, interpret them, and so forth. I assume you agree that that is a potential misuse. What are your thoughts on it?
Yes, absolutely. I mean, I think even going back 20, 30 years, government agencies did tend to fund a lot of the university research in speech recognition. I assume use cases like that may have been what they had in mind. I think it also touches on this point of many cases where AI adds real value is that it can just scale far more than people, where you could have an AI that can transcribe all the content of all calls that are happening right now. Again, I imagine in certain parts of the world, that type of system is probably in place. I guess I don’t think there’s much that we can really do about it. It’s kind of inevitable, I think. At some point, it’s going to just become normal, if it isn’t already.
Then the second one is, I came across the site where you could type in dialogue and you could pick – in this particular case, it could be said in Hillary Clinton’s voice, or Donald Trump’s voice. You knew it wasn’t them, clearly. But it was kind of interesting, because all you have to do is say there’s a Moore’s law and it’ll be twice as good, twice as good, twice as good, twice as good. Then all of a sudden, hearing isn’t believing anymore. The whole fake news aspect of it, what do you think about that?
Yes, to really do that well, current technologies can’t do that well. There’s only two companies in the world that have that capability, as far as I know. One of them is Google, and the other is Voysis. It uses a technology called WaveNet. I’m not sure if you’re familiar with it already, but if you search for it and you come across some great examples of it, it will sound very, very convincingly human, particularly if you’re just reading a sentence. If you need to read longer amounts of text, then you hit this odd moment I mentioned earlier, where it sounds like the system doesn’t really understand what it’s saying.
But it will sound very convincingly human and far better than the samples you were referring to, of the Hillary Clinton voices and so on. That technology does exist today, and naturally there’s security concerns with that type of technology. Obviously if it fell into the wrong hands, people could make phone calls with the identity of somebody else, which could obviously have a dramatic impact on various things, be it at corporate level or government level. Again, I think this is a side effect of AI in general, that we’re going to see machines being better or as good as people at doing various tasks.
You think that’s also inevitable?
I think it’s already there.
When my Dad calls me and asks me my PIN number or whatever, I’ll be like, “I don’t know, what did you get me for my ninth birthday?” Let me ask of you, if somebody gave you a piece of audio that they recorded and said, “Can you figure out if this is a human or a computer,” could you figure it out? Or, could you imagine a tool that, no matter how good it gets, could still tell that it was not real audio, not a human?
I was having a chat with some professors about this exact question about two weeks ago. Everyone at the table unanimously agreed that that’s not possible, in our opinions. I know there’s a very big voice biometric industry right now, but I do really believe that computers can generate signals that will successfully bypass those systems.
I’m just going to let that sink in for a minute.
Do a Google search for WaveNet, if you’re not familiar with it. You’ll see some audio samples from both Google and Voysis, and the Voysis audio samples do sound very convincingly human. They can be used to mimic people’s voices as well.
Well, the interesting thing is, if you ask it about an image, we can do a pretty good job of… you take a photograph and can you tell if this was generated entirely by a machine or if it’s actually photographed? There’s all kinds of nuance in it, and gradients. There’s so many clues internal to it. Are you saying that there isn’t an equivalent richness to speech, you just don’t have as many dimensions of light and color and shadow and all of that, or are you saying no, even with video and images?
I mean, image is a lot easier than video. I would think, if you got one of the stronger AI teams in the world today and asked them to build a system that would produce convincing images in that sense, certainly there’s several teams out there that could do it. Video tends to be a lot more difficult, just because of the complexity of it, where video is essentially hundreds or thousands of images. I’d say the challenge or the barrier there is probably more computer power than any technologies, the lack of technology, for example.
My third question, my third area of concern is a topic I bring up a lot on the show, which is Weizenbaum and ELIZA. Back in the 60s, Weizenbaum made this program called ELIZA that was a really simple chat bot. You would tell it you were having problems, and it would ask you very rudimentary questions. Weizenbaum saw these people get emotionally attached to it, and he pulled the plug on it. He said, “Yeah, that’s wrong. That’s just wrong.” He said, “When the computer says, ‘I understand,’ it’s just a lie because there’s no ‘I’ and there’s nothing that understands anything.” Do you think it’s a concern, that when you can understand perfectly, you can engage in complex dialogue the way you’re talking about, and it can sound exactly like a human, that Weizenbaum’s worst fears have kind of come about? We haven’t really ennobled the machines, because it’s just still a lie. Do you have any concerns about that or not?
The way I look at it is I think when the day comes where, when these systems can speak and understand and interact with people in many languages in a very human and natural way, it will improve the lives of billions of people on the globe. Some people, particularly people who don’t need the technologies, may say they’d rather not use it or may not like speaking to a piece of plastic, essentially, as if it’s a person. But right now, for many people in the world, access to information is still a huge problem, much more so if you look in many developing countries.
I think even in India, they have over 1,100 languages. Even if certain people go to a doctor, they may not speak the language that the doctor speaks. There’s many communication problems globally. These technologies will dramatically improve the lives of so many people. People who don’t want to speak to these devices, as if they’re human, don’t have to. I think there’s probably more benefits than cons, on that front.
Well, just taking a minute with that, obviously I’m not talking anything about, “Oh, we don’t want people in India to understand other…,” nothing like that. If you look to science fiction, you have three levels. You have C3PO, and he just talked like a person. It was just Anthony Daniels talking. Then you get Star Trek, with Commander Data. It’s Brent Spiner, but he deliberately acted in a way where Data didn’t use contractions. He didn’t have emotion in his voice, but it was still human. Then you think of something like innumerable examples, like Buck Rogers in the 25th Century, Twiki, and it was clearly a mechanical voice. All three of them would solve your use case of understanding. The question is twofold. Do you have a feeling on which one of those, long-term, people will want? Will people want to always know they’re speaking to a machine?
Yes, I think so. In my opinion, people want communication to be frictionless or effortless. It shouldn’t feel that you need to concentrate hard on what’s the machine trying to say to me. Did it understand me or not? These types of things. If you have a machine that speaks in a very natural and almost humanlike way, I think many people would like it to have some artifact there that makes them aware that it’s actually a machine.
Where does that leave you with the technology that you’re building, that you said is trying to get that last one percent to sound like a human? What’s the use case for that? What’s the commercial demand for it?
I think right now, you have many of the computer voices we hear from various products that are out there are incredibly robotic. Those take quite a lot of effort to listen to them, especially if you try to listen to something like an audiobook. They tend to be very monotonous and almost it’s tiring to listen to them. That’s really what this technology addresses. It’s not that it has to be deployed in a way where it sounds convincingly human. It just can be deployed that way. If people have a preference to listen to it in a way where it has something in the signal where – it shouldn’t be tiring to listen to. They can do that. There’s no technology barrier to doing that, even today.
I mentioned the uncanny valley earlier, which is you don’t want your drawings of people to look just one notch below perfect, otherwise they look grotesque, that you definitely want to dial them down several notches. Is there an equivalent in audio in your mind that, if it’s just a little bit too close – or do you think it ought to go as far as it can, if that’s what people want right now, or that it should get to 95% and stop, if that’s more what people want right now?
I think the way machines will speak will always be different. But it doesn’t mean they shouldn’t sound natural. For example, when you and I talk, there’s plenty of times with, say, fillers like, “mmm,” “uh,” these different noises that also make our speech natural, whereas for machines, there’s no need for them to do that. They can control even the speaking rate and various things that would make them not be speaking naturally in the human sense.
But they could still be speaking in a way that’s very easy for anyone to follow, very easy to understand, engaging. They don’t need to always sound – like today, many of these systems, especially the older generation ones, many of them do sound incredibly robotic. They’re tiring to listen to, or it takes quite a lot of effort. People listen to them when they have no other choice, really, whereas with the new wave of technology with WaveNet, it’s enabling these systems just to sound just much nicer to listen to.
If you take something like a soliloquy from Shakespeare, something like, “Friends, Romans, countrymen, lend me your ears. I have come to bury Caesar, not to praise him. The evil that men do lives after them; the good is oft interred with their bones.” When I say that as a human, I’m emphasizing words. I’m stretching words out. I’m making other words fast, I’m inserting pauses. Is that what you’re talking about? Do you think you’ll get to a point where you could feed it that passage and it would do an equivalent reading, and not even worrying about if the tonal quality’s perfect? But, could it do all of that other stuff I just did?
Forbes published an article with some audio samples from our WaveNet system that did exactly that, although it was reading Black Beauty, just reading maybe the first 20, 30 sentences of Black Beauty. These systems can sound quite natural, but the system that did that, which did sound very natural, it was trained off audio of somebody reading audiobooks. It wasn’t a general system that could be used for different cases. I think current systems still need the training data to be quite close to the application. Otherwise, the level of naturalism diminishes very, very quickly.
How do you do that? I mean, I know we can’t understand it, especially in the context of this. But how do you do it? Is it word pairs that you’re looking for, or are words classified by their definition, whether they’re angry words? How does it work?
So, in practice, I think we’ve found out, within a certain domain – if you take audiobooks, for example, the way a single person would read a book, there’s various patterns around how they express certain things. The system itself needs to consider more than the sentence. It can’t just be reading individual sentences, which again is what many of these modern systems do. It needs to really think in terms of paragraph or in terms of the overall context with which it’s working within. Certainly predicting pauses or breath-type sounds, these systems will do that quite naturally as-is.
I think in the case of books, it’s probably more about timing than anything else. Pitch is, I think, probably easier in books than it is in other domains. If you haven’t heard it, I’d highly recommend you have a listen to the sample we published, or Forbes published, of our WaveNet system. It is reading a book, so it is really the exact use case you’re talking about here. We got great feedback from people, saying how eerily human it sounded, I think was the term Forbes used.
I assume eerily in the sense that the technology is eerie, not that it sounded eerie.
There are audio clips of J. R. R. Tolkien reading from his writings. I think there’s Hemingway reading some part of what he wrote. I think it would be great to hear Hemingway read The Old Man and the Sea. How much Hemingway reading something he wrote would you need to make a convincing Hemingway reading The Old Man and the Sea? Is it a minute? Is it an hour?
Oh, it’s a lot more. To do it really well with current technologies, you need a lot more data. The one we published used 10 hours of one person reading, which I think was maybe a bit over two audiobooks.
So, if somebody had an unabridged recording of The Fellowship of the Ring, then The Two Towers, and they died, you could make a passable Return of the King?
Oh, absolutely yes. I think that data requirement will just go down over time, but currently 10 hours is the entry level, I think.
Legally speaking, who owns that? Right now, what would be the state of the art, either in Ireland where you are, or anywhere you know of?
It’s a good question. I think voice talents constantly encounter this, where in many ways, even if you pay a voice talent to record an audiobook, for example, the audio recordings do contain that person’s identity, to some extent. I think it’s very hard to classify who actually owns audio in that sense, when the audio is the other person speaking and it does contain their identity, just like if someone takes a photo of you or I. We can probably have some kind of entitlement of claiming ownership over it, if it is a photo of us, regardless of whatever payments were made. There’s some legal grey area, but that’s not got to do with AI technologies. That’s even for if you’re recording a radio commercial. It’s a legal grey area, too, as to how much of the audio recordings can you own, when it’s clearly someone’s identity? You can’t really own someone’s identity.
Right. I guess the question at law, which somebody will have to decide at some point, is if you pay somebody for a recording and then you own that recording, presumably you own all of the derivative things you – I mean, like you said, it’s a grey area. We don’t know, and I’m sure regulators and case law will eventually sort it out.
I wouldn’t be surprised if we ended up in a world where maybe celebrities could do endorsements of audio clips for radio or for various other things, where the audio is completely generated by a machine, where the celebrity didn’t need to go to a recording studio for a day to record that audio. I think that day is not that long away.
You know, it’s a world with lots of questions. I was just reading about this company that takes old syndicated TV shows and figures out ways to insert modern product placement in them. Then they can go sell that. Isn’t that something? All of a sudden – this isn’t a real example, but you could have Lucy drinking a Red Bull in I Love Lucy, or something like that, right? It’s all ones and zeroes, at some level. I gave three areas this technology could be misused that just came to me. There’s the fake news, there’s invasion of privacy, and there’s this dehumanizing Weizenbaum ELIZA aspect. What did I miss?
I think the general concern I’ve heard in academic circles always tends to be about privacy. You kind of covered that one. Nothing springs to mind.
Talk to us a minute. You have a platform that people can use. What I’ve noticed you emphasizing over and over is the platform needs to be trained to a purpose. If you’re a tennis shoe company, it needs to be taught with tennis shoe content, about tennis shoe-related issues. They’re all highly verticalized, or they have to be customized. I assume that’s the case. If so, what does that process look like and then, where are you on your product trajectory? What are you going to do next? How are you going to wow us in a year, when you come back on the show?
Yes, on the website we’re talking about this new product that we launched two weeks ago now, in New York, called Voysis Commerce. The way it really works is we’ve built out the whole commerce use case, through user studies and building up an understanding of what the consumers actually want to say to a retailer or a brand while they’re looking at their website or mobile app. We build out that use case in a way where today, any retailer or brand can just take their product catalog, which is the names of their products, whatever descriptions they have from the product pages on the products and they upload it to us.
Then, fully automatically, in a matter of hours, a voice AI is created, which knows what products they sell. It’s learned from the natural language descriptions on the product pages, about how their products are described. When a user comes along and says, “I want red tennis shoes with certain features on them,” the user can just say that using completely natural language and get relevant search results.
Then I think where it gets really interesting is when the user does get relevant searches on the screen in front of them, they can do a refinement query. They can just do maybe a follow-up query where they’re adding more details about what they’re looking for. Maybe they do their initial search for tennis shoes or whatever they’re looking for. When they see the search results on the screen, then they can say, “Actually, I only want to spend about $50. What have you got around that price?”
Again, the search results will be updated and they can continuously just provide more and more details, maybe change their mind on certain details. They could say, “What if I was to increase my budget by $50? What would the products be then?” They can just interact with it in this far more powerful way than what people are used to, with keyword-based search.
I think one of the side effects then, for the retailers, is that they get a much better understanding of what their customers are actually looking for, what their customers want. Currently, many retailers and e-commerce brands are doing a lot of data analytics. But really, what they’re analyzing is what keywords people have typed into a box or what buttons they have clicked on, whereas natural language obviously is not constrained. They can get a lot of value out of understanding their customers better and, in turn, provide a much better experience to the customers as well.
Fantastic. I’m going to assume I really am speaking to the real Peter Cahill, that it’s not somebody else at the company using the mimic thing, and this will be in the next Forbes article.
That’s a good idea. We should do that at some point.
Somebody can do all these for you. If people want to keep up with you personally and what your company is doing, can you just run down that?
Yes. Both me and the company are quite active on Twitter, so it’s @Voysis on Twitter, or @PeterCahill, on Twitter. Obviously if anyone ever wants to drop me a mail, please do. You can reach me at [email protected].
Voysis is V-O-Y-S-I-S?
All right Peter, I want to thank you so much for taking the time to chat with us about this very fascinating topic.
Yes, thank you. I enjoyed it.
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.

Voices in AI – Episode 43: A Conversation with Markus Noga

In this episode, Byron and Markus discuss machine learning and automation.
Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today, my guest is Markus Noga. He’s the VP of Machine Learning over at SAP. He holds a Ph.D. in computer science from Karlsruhe Institute of Technology, and prior to that spent seven years over at Booz Allen Hamilton helping businesses adopt IT and transform themselves through it. Welcome to the show, Markus.
Markus Noga: Thank you Byron and it’s a pleasure to be here today.
Let’s start off with a question I have yet to have two people answer the same way. What is artificial intelligence?
That’s a great one, and it’s surely something that few people can agree on. I think the textbook definition mostly defines it by analogy with human intelligence, and human intelligence is also notoriously tricky and hard to define. I define human intelligence as the ability to deal with the unknown, bring structure to the unstructured, and answer novel questions in a surprisingly resourceful and mindful way. Artificial intelligence in itself is the thing, put rather more playfully, that is always three to five years out of reach. We love to focus on what can be done today, what we call machine learning and deep learning, which can drive tremendous value for businesses and for individuals already today.
But, in what sense is it artificial? Is it artificial intelligence in the way artificial turf is artificial? It isn’t really turf, it just looks like it? Or is it just artificial in the sense that we made it? Or put another way, is artificial intelligence actually intelligent? Or does it just behave intelligently?
You’re going very deep here into things like Searle’s Chinese room paradox about the guy in the room with a handbook of definitions for how to transcribe Chinese symbols to have an intelligent conversation. The question being who or what is having the intelligent conversation. Is it the book? Certainly not. Is it the guy mindlessly transcribing these symbols? Certainly not. Is it maybe the system of the guy in the room, the book, and the room itself that generates these intelligent-seeming responses? I guess I’m coming down on the output-oriented side here. I try not to think too hard about the inner states or qualia, or the question of whether the neural networks we’re building have a sentient experience or experience these qualia. For me, what counts is whether we can solve real-world problems in a way that’s compatible with intelligence. If it displays intelligent behavior, everything else I would leave to the philosophers, Byron.
We’ll get to that part where we can talk about the effects of automation and what we can expect and all of that. But, don’t you think at some level, understanding that question, doesn’t it to some degree inform you as to what’s possible? What kinds of problems should we point this technology at? Or do you think it’s entirely academic that it has no real-world implications?
I think it’s extremely profound and it could unlock a whole new curve of value creation. It’s also something that, in dealing with real-world problems today, we may not have to answer, and this is maybe also something specific to our approach. You’ve seen all these studies that say that X percent of activities can be automated with today’s machine learning, and Y percent could be automated if there are better natural language and speech processing capabilities, and so on and so forth. There’s such tremendous value to be had by going after all these low-hanging fruits and doing applied engineering by bringing ML and deep learning into an application context. Then we can bide our time until there is a full answer to strong AI and some of the deeper philosophical questions. But what is available now is already delivering tremendous value, and will continue to do so over the next three to five years. That’s my business hat on, what I focus on together with the teams that I’m working with. The other question is one that I find tremendously interesting for my weekend and evening conversations.
Let me ask you a different one. You started off by saying artificial intelligence, and you dealt with that in terms of human intelligence. When you’re thinking of a problem that you’re going to try to use machine intelligence to solve, are you inspired in any way by how the brain works or is that just a completely different way of doing it? Or do we learn how intelligence, with the capital I, works by studying the brain?
I think that’s a multi-level answer because clearly the architectures that do really well in machine learning today are to a large degree neurally inspired. This idea of having multi-layered deep networks, having them with a local connection structure, having them with these things we call convolutions that people use so successfully in computer vision: it closely resembles some of the structures that you see in the visual cortex, with vertical columns for example. There’s a strong argument that both these structures and the self-referential recurrent networks that people use a lot for video processing and text processing these days are very, very deeply neurally inspired. On the other hand, we’re also seeing that a lot of the approaches that make ML very successful today are about as far from neurally inspired learning as you can get.
Example one: we struggled as a discipline with neurally inspired transfer functions that were all nice, and biological, and smooth, and we couldn’t really train deep networks with them because they would saturate. One of the key enablers for modern deep learning was to step away from the biological analogy of smooth signals and go to something like the rectified linear unit, the ReLU function, as an activation, and that has been a key part in being able to train very deep networks. Another example: when a human learns or an animal learns, we don’t tend to give them 15 million cleanly labeled training examples and expect them to go over these training examples 10 times in a row to arrive at something. We’re much closer to one-shot learning and being able to recognize the person with a cylinder hat on their head just on the basis of one description or one image that shows us something similar.
So clearly, the approaches that are most successful today both share some deep neural inspiration as a basis and also depart from it into computationally tractable and very, very different kinds of implementations than the networks that we see in our brains. I think that both of these themes are important in advancing the state of the art in ML, and there’s a lot going on. In areas like one-shot learning, for example, people are right now trying to mimic more of the way the human brain, with an active working memory and these rich associations, is able to process new information, and there’s almost no resemblance to what convolutional networks and recurrent networks do today.
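[Editor’s illustration, not part of the conversation: the saturation point about smooth transfer functions versus ReLU can be made concrete with a few lines of arithmetic. This is a minimal sketch under simplifying assumptions that are not from the interview: a chain of identical units, all weights fixed at 1, and the same pre-activation value at every layer. The function names are hypothetical. It only shows how the product of local derivatives behaves with depth; it is not a training procedure.]

```python
import math


def sigmoid(x):
    """Smooth, biologically flavored transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: exactly 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def gradient_through_layers(grad_fn, pre_activation, depth):
    """Multiply the local derivative across `depth` stacked units.

    Assumption for illustration: every layer sees the same pre-activation
    and every weight is 1, so only the activation's slope matters.
    """
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g


# A mildly saturated unit (pre-activation 2.0) stacked 20 layers deep.
sigmoid_signal = gradient_through_layers(sigmoid_grad, 2.0, 20)
relu_signal = gradient_through_layers(relu_grad, 2.0, 20)

print(f"sigmoid gradient after 20 layers: {sigmoid_signal:.3e}")
print(f"ReLU gradient after 20 layers:    {relu_signal:.3e}")
```

[With the sigmoid, the learning signal shrinks by a factor of at most 0.25 per layer and all but vanishes in a deep stack; with ReLU, an active unit passes it through unchanged.]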
Let’s go with that example. If you take a small statue of a falcon, and you put it in a hundred photos—and sometimes it’s upside down, and sometimes it’s laying on its side, sometimes it’s half in water, sometimes it’s obscured, sometimes it’s in shadows—a person just goes “boom boom boom boom boom” and picks them out, right and left with no effort, you know, one-shot learning. What do you think a human is doing? It is an instance of some kind of transfer learning, but what do you think is really going on in the human brain, and how do you map that to computers? How do you deal with that?
This is an invitation to speculate on the topic of falcons, so let me try. I think that, clearly, our brains have built a representation of the real world around us, because we’re able to create that representation even though the visual and other sensory stimuli that reach us are not in fact as continuous as they seem. Standing in the room here having the conversation with you, my mind creates the illusion of a continuous space around me, but in fact, I’m getting distinct feedback from the eyes as they saccade and jump around the room. The illusion of a continuous presence, the continuous sharp resolution of the room, is just that; it’s an illusion, because our mind has built very, very effective mental models of the world around us that condense this information and make it tractable on an abstract level.
Some of the things that are going on in research right now [are] trying to exploit these notions, and trying to use a lot of unsupervised training with some very simple assumptions behind it; basically, the mind doesn’t like to be surprised, and would, therefore, like to predict what’s next, [by] leveraging very, very powerful unsupervised training approaches where you can use any kind of data that’s available, and you don’t need to label it to come up with these unsupervised representation learning approaches. They seem to be very successful, and they’re beating a lot of the traditional approaches because you can have access to way larger corpora of unlabeled information, which means you can train better models.
Now, is that a direct analogy to what the human brain does? I don’t know. But certainly it’s an engineering strategy that results in world-leading performance on a number of very popular benchmarks right now, and it is, broadly speaking, neurally inspired. So, I guess bringing together what our brains do and what we can do in engineering is always a dance between the abstract inspiration that we can get from how biology works, and the very hard math and engineering in getting solutions to train on large-scale computers with hundreds of teraflops in compute capacity and large matrix multiplications in the middle. It’s advances on both sides of the house that make ML advance rapidly today.
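[Editor’s illustration, not part of the conversation: the idea of learning by predicting what comes next, using only unlabeled data, can be sketched with a toy next-word counter. This is a deliberately tiny stand-in, not the representation-learning methods used in research; the corpus and the function names are made up for the example. The point is only that the raw text supplies its own training signal, so no human labeling is needed.]

```python
from collections import Counter, defaultdict


def train_next_word_model(text):
    """Count, for each word, which words follow it in raw unlabeled text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    # Each adjacent pair is a free training example: (input, what came next).
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(model, word):
    """Return the most frequently observed follower, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical unlabeled corpus; any plain text would do.
corpus = "the cat sat on the mat and the cat slept on the mat"
model = train_next_word_model(corpus)

print(predict_next(model, "sat"))    # word seen after "sat"
print(predict_next(model, "on"))     # word seen after "on"
print(predict_next(model, "zebra"))  # never seen, so no prediction
```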
Then take a similar problem, or tell me if this is a similar problem: when you’re doing voice recognition, and there’s somebody outside with a jackhammer, you know, it’s annoying, but a human can separate those two things. They can hear what you’re saying just fine, but for a machine, that’s a really difficult challenge. Now my question to you is, is that the same problem? Is it one trick humans have that we apply in a number of ways? Or is that a completely different thing that’s going on in that example?
I think it’s similar, and you’re hitting onto something because in the listening example there are some active and some passive components going on. We’re all familiar with the phenomenon of selective hearing when we’re at a dinner party, and there are 200 conversations going on in parallel. If we focus our attention on a certain speaker or a certain part of the conversation, we can make them stand out over the din and the noise because our own minds have some prior assumptions as to what constitutes a conversation, and we can exploit these priors in our minds in order to selectively listen in to parts of the conversation. This has partly a physical characteristic, maybe hearing in stereo. Our ears have certain directional characteristics in the way they pick up certain frequencies, which we can use by turning our head the right way and inclining it the right way. We can do a lot already [with] stereo separation, whereas, if you have a single microphone, and that’s all the signal you get, all these avenues would be closed to you.
But, I think the main story is one about signals superimposed with noise, whether that’s camera distortions, or fog, or poor lighting in the case of the statue that we are trying to recognize, or whether it’s ambient noise or intermittent outages in the case of the audio signal that you’re looking into. The two most popular neurally inspired architectures on the market right now [are] the convolutional networks, for a lot of things in the image and also natural text space, and the recurrent networks, for a lot of things in the audio and time-series signal space, but also in the text space. Both share the characteristic that they are vastly more resilient to noise than any hard-coded or programmed approach. I guess the underlying problem is one that, five years ago, would have been considered probably unsolvable, where today, with these modern techniques, we’re able to train models that can adequately deal with the challenges if the information is still in the signal.
Well, what do you think is happening when the human hears a conversation at the party, to go with that example, and is kind of like, “Oh, I want to listen to that”? I heard what you said, that there’s one aspect where you make a physical modification to the situation, but what you’ve also done is introduce this idea of consciousness, that a person can selectively change their focus. That aspect of what the brain is doing, where it’s like, “Oh, wait a minute,” is that maybe something that’s hard to implement on a machine, or is that not the case at all?
If you take that idea, and I think in the ML research and engineering communities this is currently most popular under the label of attention, or attention-based mechanisms, then certainly this is all over leading approaches right now, whether it’s the computer vision papers from CVPR just last week or whether it’s the text processing architectures that return state-of-the-art results right now. They all start to include some kind of attention mechanism allowing you both to weigh outputs by the center of attention, and also to trace back results to centers of attention, which has two very nice properties. On the one hand, attention mechanisms, nascent as they are today, help improve the accuracy of what models can deliver. On the other hand, the ability to trace back the outcome of a machine learning model to centers and regions of attention in the input can do wonders for the explain-ability of ML and AI results, which is something that users and customers are increasingly looking for. Don’t just give me any result which is as good as my current process, or hopefully a couple of percentage points better, but also help me build confidence in it by explaining why things are being classed or categorized or translated or extracted the way they are. To gain human trust in operating systems of humans and machines working together, explain-ability is big.
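[Editor’s illustration, not part of the conversation: the two properties described above, weighing outputs by where attention falls and tracing a result back to the inputs that received attention, can be sketched with a small scaled dot-product attention function. This particular formulation is one common variant chosen for the example; the interview does not say which attention mechanisms the cited work uses. The tokens, vectors, and function names below are made up.]

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query.

    Returns the weighted output and the attention weights themselves,
    so a result can be traced back to the inputs that produced it.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical three-token input with made-up key and value vectors.
tokens = ["the", "red", "shoes"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 4.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 2.0]

output, weights = attend(query, keys, values)

# Explain-ability: report which input token received the most attention.
most_attended = tokens[max(range(len(weights)), key=weights.__getitem__)]
for token, weight in zip(tokens, weights):
    print(f"{token:>6}: {weight:.3f}")
print("most attended:", most_attended)
```

[The weights serve double duty: they shape the output, and they can be shown to a user as a rough indication of which parts of the input drove the result.]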
One of the peculiar things to me, with regard to strong AI, or general intelligence, is that when you ask folks, “When will we get a general intelligence?” the soonest you ever hear is five years. There are very famous people who believe we’re going to have something very soon. Then the other extreme is about 500 years, and that worrying about it is like worrying about overpopulation on Mars. My question to you is why do you think that there’s such a wide range in terms of our idea of when we may make such a breakthrough?
I think it’s because of one vexing property of humans and machines: the things that are easiest for us humans tend to be the things that are hardest for machines, and vice versa. If you look at that today, nobody would dream of having “computer” as a job description. That’s a machine. If you think back 60-70 years, “computer” was the job description of people actually doing manual calculations. “Printer” was a job description, and a lot of other things that we would never dream of doing manually today were being done manually. Think of spreadsheets, potentially the greatest simple invention in computing; think of databases; think of things like the enterprise resource planning systems that SAP does, and business networks connecting them, or any kind of cloud-based solutions. What they deliver is tremendous and it’s very easy for machines to do, but it tends to be the things that are very hard for humans. Now at the same time, things that are very easy for humans to do, like seeing a doggie and shouting “doggie,” or seeing a cat and saying “meow,” are something that toddlers can do, but until very, very recently, the best and most sophisticated algorithms haven’t been able to do that part.
I think part of the excitement around ML and deep learning right now is that a lot of these things have fallen, and we’re seeing superhuman performance on image classification tasks. We’re seeing superhuman performance on things like Switchboard voice-to-text transcription tasks, and many other elements are falling to machines that used to be very easy for humans but impossible for machines. This is something that generates a lot of excitement right now. I think where we have to be careful is [letting] this guide our expectations on the speed of progress in the following years. Human intuition about what is easy and what is hard is traditionally a very, very poor guide to the ease of implementation with computers and with ML.
Example: my son was asking me yesterday, “Dad, how come the car can know where it is at and tell us where to drive?” And I was like, “Son, that’s fairly straightforward. There are all these satellites flying around, and they’re shouting at us, ‘It’s currently 2 o’clock and 30 seconds,’ and we’re just measuring the time between their shouts to figure out where we are today, and then that gives us that position on the planet. It’s not a great invention; it’s the GPS system. It’s mathematically super hard to do for a human with a slide rule; it’s very easy to do for the machine.” And my son said, “Yeah, but that’s not what I wanted to know. How come the machine is talking to us with the human voice? This is what I find amazing, and I would like to understand how that is built.” And I think that our intuition about what’s easy and what’s hard is historically a very poor guide for figuring out what the next step and the future of ML and artificial intelligence look like. This is why you’re getting those very broad bands of predictions.
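[Editor’s illustration, not part of the conversation: the “satellites shouting the time” description can be turned into a short calculation. This is a simplified sketch, not how a real GPS receiver works: it is flat and two-dimensional, uses three made-up beacon positions, and assumes the receiver’s clock is perfectly synchronized with the beacons (real receivers use an additional satellite to solve for their own clock error). All names and numbers are hypothetical.]

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def distance_from_delay(sent_at, received_at):
    """Convert a signal's travel time into a distance."""
    return SPEED_OF_LIGHT * (received_at - sent_at)


def trilaterate(beacons, distances):
    """Find (x, y) from three beacon positions and the distances to them.

    Subtracting the first circle equation from the other two removes the
    squared unknowns and leaves two linear equations, solved here with
    Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2 * (x1 - x2), 2 * (y1 - y2)
    a21, a22 = 2 * (x1 - x3), 2 * (y1 - y3)
    b1 = d2**2 - d1**2 - x2**2 - y2**2 + x1**2 + y1**2
    b2 = d3**2 - d1**2 - x3**2 - y3**2 + x1**2 + y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons must not lie on a single line")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


# Hypothetical setup: three beacons and a receiver, coordinates in meters.
beacons = [(0.0, 0.0), (20_000_000.0, 0.0), (0.0, 20_000_000.0)]
true_position = (3_000_000.0, 4_000_000.0)

# Each beacon "shouts" at time 0; the receiver notes when each shout arrives.
sent_at = 0.0
arrival_times = [sent_at + math.dist(b, true_position) / SPEED_OF_LIGHT
                 for b in beacons]

distances = [distance_from_delay(sent_at, t) for t in arrival_times]
x, y = trilaterate(beacons, distances)
print(f"estimated position: ({x:.1f}, {y:.1f})")
```

[The arithmetic is tedious by hand, which is the point made in the answer, but it is a handful of multiplications for a machine.]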
Well do you think that the difference between the narrow or weak AI we have now and strong AI, is evolutionary? Are we on the path [where] when machines get somewhat faster, and we get more data, and we get better algorithms, that we’re going to gradually get a general intelligence? Or is a general intelligence something very different, like a whole different problem than the kinds of problems we’re working on today?
That’s a tough one. I think that, taking the brain analogy, we’re today doing the equivalent of very simple sensory circuits, which maybe can duplicate the first couple of dozen or maybe a hundred layers in the way the visual cortex works. We’re starting to make progress into some things like one-shot learning; it’s very nascent and in early-stage research right now. We’re starting to make much more progress in directions like reinforcement learning, but overall it’s very hard to say which, if any, additional mechanisms are there in the large. If you look at the biological system of the brain, there’s a molecular level that’s interesting. There’s a cellular level that’s interesting. There is a simple interconnection level that’s interesting. There is a micro-interconnection level that’s interesting. I think we’re still far from a complete understanding of how the brain works. I think right now we have tremendous momentum and a very exciting trajectory with what our artificial neural networks can do, and, at least for the next three to five years, there seems to be pretty much limitless potential to bring them out into real-world businesses, into real-world situations and contexts, and to create amazing new solutions. Do I think that really will deliver strong AI? I don’t know. I’m an agnostic, so I always fall back to the position that I don’t know enough.
Only one more question about strong AI and then let’s talk about the shorter-term future. The question is, human DNA converted to code is something like 700 MB, give or take. But the amount that’s uniquely human, compared to say a chimp or something like that is only about 1% difference—only 7 or 8 or 9 MB of code—is what gives us a general intelligence. Does that imply or at least tell us how to build something that then can become generally intelligent? Does that imply to you that general intelligence is actually simple, straightforward? That we can look at nature and say, it’s really a small amount of code, and therefore we really should be looking for simple, elegant solutions to general intelligence? Or do those two things just not map at all?
Certainly, what we’re seeing today is that deep learning approaches to problems like image classification, image object detection, image segmentation, video annotation, and audio transcription all tend to be orders of magnitude smaller than what we dealt with when we handcrafted things. The core of most deep learning solutions to these things, if you really look at the core model and the model structure, tends to be maybe 500 lines of code, maybe 1000. And that’s within the reach of an individual putting this together over a weekend, so the huge democratization that deep learning based on big data lends is that a lot of these models that do amazing things are actually very, very small code artifacts. The weight matrices and the binary models that they generate then tend to be as large as or larger than traditional programs compiled into executables, sometimes orders of magnitude larger again. The thing is, they are very hard to interpret, and we’re only at the beginning of explain-ability of what the different weights and the different excitations mean. I think there are some nice early visualizations on this. There are also some nice visualizations that explain what’s going on with attention mechanisms in the artificial networks.
As to explain-ability of the real network in the brain, I think that is very nascent. I’ve seen some great papers and results on things like spatial representations in the visual cortex where surprisingly you find triangle scripts or attempts to reconstruct the image hitting the retina based on reading, with fMRI scans, the excitations in lower levels of the visual cortex. They show that we’re getting closer to understanding the first few layers. I think that even with the 7 MB difference or so that you allude to between chimps and humans spelled out for us, there is a whole set of layers of abstraction between the DNA code and the RNA representation, the protein representation, the excitation of these with methylation and other mechanisms that control activation of genes, and the interplay of the proteins across a living, breathing human brain. All of this adds orders of magnitude of complexity on top of that seven-or-so megabyte difference in A’s, and C’s, and T’s, and G’s. We live in super exciting times. We live in times where a new record, a new development, and a new capability that was unthinkable a year ago, let alone a decade ago, is becoming commonplace, and it’s an invigorating and exciting time to be alive. I still struggle to make a prediction of the year we get to general AI based on a straight-line trend.
As exciting as AI is, though, there’s some fear wrapped up in it as well. The fear is the effect of automation on employment. I mean you know this, of course, it’s covered so much. There’s kind of three schools of thought: One says that we’re going to automate certain tasks and that there will be a group of individuals who do not have the training to add economic value. They will be pushed out of the labor market, and we’ll have perpetual unemployment, like a big depression that never goes away. Then there’s another group that says, “No, no, no, you don’t understand. Everybody is replaceable. Every single job we have, machines can do any of it.” And then there’s a third school of thought that says, “No, none of that’s going to happen. The history of 250 years of the Industrial Revolution is that people take these new technologies, even profound ones like electricity, and engines, and steam, and they just use them to increase their own productivity and to drive wages up. We’re not going to have any unemployment from this, any permanent unemployment.” Which of those three camps, or a fourth, do you fall into?
I think that there’s a lot of historical precedent for how technology gets adopted, and there are also numbers on the adoption of technologies in our own day and age that serve as reference points here. For example, one of the things that truly surprised me is that the amount of e-commerce, as a percentage of overall retail market share, is still in the mid to high single-digit percentage points according to surveys that I’ve seen. That totally does not match my personal experience of basically doing all my non-grocery shopping entirely online. But it shows that in the 20-25 years of the Internet Revolution, a tremendous value has been created, with the convenience of having all kinds of stuff at your doorstep with just a single click, and yet the transformation that we’ve seen has reached only a single-digit percentage of the overall retail market. This was one of the most rapid uptakes in history of a new technology that has groundbreaking value, by decoupling atoms and bits, and it’s been playing out over the past 20-25 years as all of us have been observing.
So, I think while there is tremendous potential for machine learning and AI to drive another Industrial Revolution, we’re also in the middle of all these curves from other revolutions that are ongoing. We’ve had a mobile revolution that unshackled computers and gave everybody what used to be a supercomputer in their pocket. We’ve had an internet revolution. Before that, we had a client-server revolution and the computing revolution on its own, all of these building on prior revolutions like electricity, or the internal combustion engine, or methods like the printing press. They certainly have a tendency to show accelerating technology cycles. But on the other hand, for something like e-commerce or even mobile, the actual adoption speed has been one that is none too frightening. So for all the tremendous potential that ML and AI bring, I would be hard-pressed to come up with a completely disruptive scenario here. I think we are seeing a technology with tremendous potential for rapid adoption. We’re seeing the potential both to create new value and do new things, and to automate existing activities, which continues past trends. Nobody has “computer” or “printer” as their job description today, and job descriptions like social-media influencer, or blogger, or web designer did not exist 25 years ago. This is an evolution of Schumpeterian creative destruction that is going on all over industry, in every industry, in every geography, based on every new technology curve that comes in here.
I would say fears in this space are greatly overblown today. But fear is real the moment you feel it; therefore institutions like The Partnership on Artificial Intelligence, with the leading technology companies as well as the leading NGOs, think tanks, and research institutes, are coming together to discuss the implications of AI, the ethics of AI, and safety and guiding principles. All of these things are tremendously important to make sure that we can adopt this technology with confidence. Just remember that when cars were new, Great Britain had a law that a person with a red flag had to walk in front of the car in order to warn all pedestrians of the danger that was approaching. That was certainly an instance of fear about technology that, on the one hand, was real at that point in time, but that also went away with a better understanding of how it works and of its tremendous value for the economy.
What do you think of these efforts to require that when an artificial intelligence makes a ruling or a decision about you that you have a right to know why it made that decision? Is that a manifestation of the red flag in front of the car as well, and is that something that would, if that became the norm, actually constrain the development of artificial intelligence?
I think you’re referring to the implicit right to explanation as part of the European Union privacy regulation for 2018. Let me start by saying that the privacy regulation we’re seeing is a tremendous step forward, because the simple act of harmonizing the rules and creating one digital playing field across the hundreds of millions of European citizens, and countries, and nationalities, is a tremendous step forward. We used to have a different data protection regime for each federal state in Germany, so anything that is harmonized is a huge step forward. I also think that the quest for an explanation is something that is very human. At the core of us is to continue to ask “why” and “how.” That is something that is innate to us. When we apply for a job with a company and we get rejected, we want to know why. And when we apply for a mortgage and we get offered a rate that seems high to us, we want to understand why. That’s a natural question, it’s a human question, and it’s an information need that needs to be served if we don’t want to end up in a Kafka-esque future where people don’t have a say about their destiny. Certainly, that is hugely important on the one hand.
On the other hand, we also need to be sure that we don’t hold ML and AI to a stricter standard than we hold humans to today, because that could become an inhibitor to innovation. So, if you ask a company, “Why didn’t I get accepted for that job?” they will probably say, “Dear Sir or Madam, thank you for your letter. Due to the unusually strong field of candidates for this particular posting, we regret to inform you that certain others were stronger, and we wish you all the best for your continued professional future.” This is what almost every rejection letter reads like today. Are we asking for the same kind of explain-ability from an AI system that is delivering a recommendation today that we apply to a system of humans and computers working together to create a letter like that? Or are we holding them to a much, much higher standard? If it is the first thing, absolutely essential. If it’s the second thing, we’ve got to watch whether we’re throwing out the baby with the bathwater on this one. This is something where we, I think, need to work together to find the appropriate levels and standards for things like explain-ability in AI, to fill very abstract phrases like “right to an explanation” with life in a way that can be implemented, that can be delivered, and that can provide satisfactory answers while at the same time not unduly inhibiting progress. This is something where, with a lot of players focused on explain-ability today, we will certainly see significant advances going forward.
If you’re a business owner, and you read all of this stuff about artificial intelligence, and neural nets, and machine learning, and you say, “I want to apply some of this great technology in my company,” how do people spot problems in a business that might be good candidates for an AI solution?
I can take that and turn it around by asking, “What’s keeping you awake at night? What are the three big things that make you worried? What are the things that make up the largest part of your uncertainty, or of your cost structure, or of the value that you’re trying to create?” Looking at end-to-end processes, it’s usually fairly straightforward to identify cases where AI and ML might be able to help and to deliver tremendous value. The use-case identification tends to be the fairly easy part of the game. Where it gets tricky is in selecting and prioritizing these cases, figuring out the right things to build, and finding the data that you need in order to make the solution real, because unlike traditional software engineering, this is about learning from data. Without data, you basically can’t start, or at least you have to build some very small simulators in order to create the data that you’re looking for.
You mentioned that that’s the beginning of the game, but what makes the news all the time is when AI beats a person at a game. In 1997 you had chess, then you had Ken Jennings in Jeopardy!, then you had AlphaGo and Lee Sedol, and you had AI beating poker. Is that a valid approach to say, “Look around your business and look for things that look like games?” Because games have constrained rules, and they have points, and winners, and losers. Is that a useful way to think about it? Or are the game things more like AI’s publicity, a PR campaign, and that’s not really a useful metaphor for business problems?
I think that these very publicized showcases are extremely important to raise awareness and to demonstrate stunning new capabilities. What we see in building business solutions is that I don’t necessarily have to be the human world champion in something in order to deliver value. Because a lot of business is about processes, is about people following flowcharts together with software systems trying to deliver a repeatable process for things like customer service, or IT incident handling, or incoming invoice screening and matching, or other repetitive recurring tasks in the enterprise. And already by addressing the easy-to-serve 60-80% of these, we can create tremendous value for enterprises by making processes run faster, by making people more productive, and by relieving them of the parts of activities that they regard as repetitive and mind-numbing, and not particularly enjoyable.
The good thing is that in a modern enterprise today, people tend to have IT systems in place where all these activities leave a digital exhaust stream of data, and tapping into that digital exhaust stream and learning from it is the key way to make ML solutions for the enterprise feasible today. This is one of the things where I’m really proud to be working for SAP because 76% of all business transactions, as measured by value, anywhere on the globe, are on an SAP system today. So if you want to learn models on digital information that touch the enterprise, chances are it’s either in an SAP system or in a surrounding system already today. Looking for these and sort of doing the intersection between what’s attractive—because I can serve core business processes with faster speed, greater agility, lower cost, more flexibility, or bigger value—and crossing that with the feasibility aspect of “do I have the digital information that I can learn from to build business-relevant functionality today?” is our overriding approach to identifying things that we build in order to make all our SAP enterprise applications intelligent.
Let’s talk about that for a minute. What sorts of things are you working on right now? What sorts of things have the organization’s attention in machine learning?
It’s really end-to-end digital intelligence on processes, and let me give you an example. If you look at the finance space, which SAP is well-known for, there are these huge end-to-end processes—like record to report, or things like invoice to record—which really deal end-to-end with what an enterprise needs to do in order to buy stuff and pay for it, and receive it, or to sell stuff, and get paid for it. These are huge machines with dozens and dozens of process steps, and many individuals in shared service environments that otherwise perform the delivering of these services. If you see a document like an invoice, for example, it’s just the tip of the iceberg of a complex orchestration needed to deal with it. We’re taking these end-to-end processes, and we’re making them intelligent every step of the way.
When an invoice hits the enterprise, the first question is what’s in it? And today most of the units in shared service environments extract the relevant information via SAP systems. The next question is, “Do I know this supplier?” If they have merged or changed names or opened a new branch, I might not have them in my database. That’s a fuzzy lookup. The next step might be, “Have I ordered something like this?” and that’s a significant question because in some industries up to one-third of spending actually doesn’t have a purchase order. Finding people who have ordered this stuff, or related stuff, from this supplier or similar suppliers in the past can be the key to figuring out whether we should approve it or not. Then, there’s the question of, “Did we receive the goods and services that this invoice is for?” That’s about going through lists and lists of stuff, and figuring out whether the bill of lading for the truck that arrived really contains all the things that were on the truck and all the things that were on the invoice, but no other things. That’s about list matching and list comprehension, and document matching, and recommending classification systems. It goes on and on like that until the point where we actually put through the payment, and the supplier gets paid for the first invoice that was there.
What you see is a digital process that is enabled by IT systems, very sophisticated IT systems, routine workflows between many human participants today. What we can do is take the digital exhaust of all the process participants to learn what they’ve been doing, and then put the common, the repetitive, the mind-numbing part of the process on autopilot—gaining speed, reducing cost, making people more satisfied with their work day, because they can focus on the challenging, and the interesting, and the stimulating cases, and increasing customer satisfaction, or in this case supplier satisfaction because they get paid faster. This end-to-end approach is how we look at business processes, and when my ML and AI group does that, we see an order recommender, an entity extractor or some kind of translation mechanism at every step of the process. We work hard to turn these capabilities into scalable APIs on our cloud platform that integrate seamlessly with these standard applications, and that’s really our approach to problem-solving. It ties to the underlying data repository about how business operates and how processes flow.
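[Editor's illustration: the "fuzzy lookup" of a supplier name mentioned above can be sketched in a few lines. This is only a minimal sketch using Python's standard-library `difflib`; the supplier names, the `fuzzy_lookup` helper, and the 0.6 threshold are hypothetical choices made for illustration and do not describe how any SAP product works.]

```python
from difflib import SequenceMatcher

# Hypothetical supplier master data; the names are made up for illustration.
KNOWN_SUPPLIERS = [
    "Acme Industrial GmbH",
    "Northwind Traders Ltd",
    "Globex Corporation",
]


def similarity(a, b):
    """Return a similarity score between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(name, known=KNOWN_SUPPLIERS, threshold=0.6):
    """Return (best_match, score), or (None, score) if nothing is close enough.

    The threshold is an assumed cut-off, not a recommended value.
    """
    best = max(known, key=lambda candidate: similarity(name, candidate))
    score = similarity(name, best)
    if score >= threshold:
        return best, score
    return None, score


# A name as it might appear on an incoming invoice, slightly different
# from the spelling stored in the master data.
match, score = fuzzy_lookup("ACME Industrial")
print(match, round(score, 2))
```

A production system would presumably combine more signals than the name alone (address, tax ID, bank details), but the idea of accepting a near match above a cut-off is the same.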
Did you find that your customers are clear with how this technology can be used, and they’re coming to you and saying, “We want this kind of functionality, and we want to apply it this way,” and they’re very clear about their goals and objectives? Or are you finding that people are still finding their sea legs and figuring out ways to apply artificial intelligence in the business, and you’re more having to lead them and say, “Here’s a great thing you could do that you maybe didn’t know was possible?”
I think it’s like everywhere, you’ve got early adopters, and innovation promoters, and dealers who actively come with these cases of their own. You have more conservative enterprises looking to see how things play out and what the results for early adopters are. You have others who have legitimate reasons to focus on burning parts of their house right now, for whom this is not yet a priority right now. What I can say is that the amount of interest in ML and AI that we’re seeing from customers and partners is tremendous and almost unprecedented, because they all see the potential to take business processes and the way business executes to a completely new level. The key challenge is working with customers early enough, and at the same time working with enough customers in a given setting to make sure that this is not a one-off that is highly specific, and to make sure that we’re really rethinking the process with digital intelligence instead of simply automating the status quo. I think this is maybe the biggest risk. We have tremendous opportunity to transform how business is done today if we truly see this through end-to-end and if we are looking to build out the robots. If we’re only trying to build isolated instances of faster horses, the value won’t be there. This is why we take such an active interest in the end-to-end and integration perspective.
Alright well, I guess just two final questions. The first is, overall it sounds like you’re optimistic about the transformative power of artificial intelligence and what it can do—
Absolutely Byron.
But I would put that question to you that you put to businesses. What keeps you awake at night? What are the three things that worry you? They don’t have to be big things, but what are the challenges right now that you’re facing or thinking about like, “Oh, I just wish I had better data or if we could just solve this one problem?”
I think the biggest thing keeping me awake right now is the luxury problem of being able to grow as fast as demand and the market wants us to. That has all the aspects of organizational scaling and scaling the product portfolio that we enable with intelligence. Fortunately, we’re not a small start-up with limited resources. We are the leading enterprise software company and scaling inside such an environment is substantially easier than it would be on the outside. Still, we’ve been doubling every year, and we look set to continue in that vein. That’s certainly the biggest strain and the biggest worry that I face. It’s very old-fashioned things; it’s things like leadership development that I tend to focus a lot of my time on. I wish I had more time to play with models, and to play with the technology and to actually build and ship a great product. What keeps me awake is these more old-fashioned things, like leadership development, that matter the most for where we are at right now.
You talked at the very beginning, you said that during the week you’re all about applying these technologies to businesses, and then on the weekend you think about some of these fun problems? I’m curious if you consume science fiction like books or movies, or TV, and if so, is there any view of the future, anything you’ve read or seen or experienced that you think, “Ah, I could see that happening.” Or, “Wow, that really made me think.” Or do you not consume science fiction?
Byron, you caught me out here. The last thing I consumed was actually Valerian and the City of a Thousand Planets just last night in the movie theater in Karlsruhe that I went to all the time when I was a student. While not per se occupied with artificial intelligence, it was certainly stunning, and I do consume a lot of the stuff from the ease of it. It provides a view of plausible futures. Most of the things I tend to read are more focused on things like space, oddly enough. So things like The Three-Body Problem, and the fantastic trilogy that that became, really aroused my interest, and really made me think. There are others that offer very credible trajectories. I was a big fan of the book called Accelerando, which paints a credible trajectory from today’s world of information technology to an upload culture of digital minds and humans colonizing the solar system and beyond. I think that these escapes are critical to clear the head from day-to-day business, and the pressures of delivering product under a given budget and deadlines. Sort of indulging in them allows me to return relaxed, and refreshed, and energized every Monday morning.
Alright, well that’s a great place to leave it, Markus. I want to thank you so much for your time. It sounds like you’re doing fantastically interesting work, and I wish you the best.
Did I mention that we’re hiring? There’s a lot of fantastically interesting work here, and we would love to have more people engaging in it. Thank you, Byron.
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.

Voices in AI – Episode 41: A Conversation with Rand Hindi

In this episode, Byron and Rand discuss intelligence, AGI, consciousness and more.
Byron Reese: This is “Voices in AI” brought to you by GigaOm, I’m Byron Reese. Today I’m excited our guest is Rand Hindi. He’s an entrepreneur and a data scientist. He’s also the founder and the CEO of Snips. They’re building an AI assistant that protects your privacy. He started coding when he was 10 years old, founded a social network at 14, founded a web agency at 15, and he showed interest in machine learning at 18, and began work on a Ph.D. in bioinformatics at age 21. He’s been elected by MIT Technology Review as one of their “35 Innovators Under 35,” and was a “30 Under 30” by Forbes in 2015, is a rising star by the Founders Forum, and he is a member of the French Digital Council. Welcome to the show, Rand.
Rand Hindi: Hi Byron. Thanks for having me.
That’s a lot of stuff in your bio. How did you get such an early start with all of this stuff?
Well, to be honest, I think, I don’t have any credit, right? My parents pushed me very young into technology. I used to hack around the house, dismantling everything from televisions, to radios, to try to figure out how these things were working. We had a computer at home when I was a kid and so, at some point, my mom came to me and gave me a coding book, and she’s like, “You should learn how to program the machines, instead of just figuring out how to break it, pretty much.” And from that day, just kept going. I mean you know it’s as if, I was telling you when you were 10, that here’s something that is amazing that you can use as a tool to do anything you ever had in mind.
And so, how old are you now? I would love to work backwards just a little bit.
I’m 32 today.
Okay, you mean you turned 32 today, or you happen to be 32 today?
I’m sorry, I am 32. My birthday is in January.
Okay. When did you first hear about artificial intelligence, and get interested in that?
So, after I started coding, you know I guess like everybody who starts coding as a teenager got interested in hacking security and these things. But when I went to university to study computer science, I was actually so bored because, obviously, I already knew quite a lot about programming that I wanted to take up a challenge, and so I started taking masters classes, and one of them was in artificial intelligence and machine learning. And the day I discovered that it was like, it was mind-blowing. It’s as if for the first time someone had shown me that I no longer had to program computers, I could just teach them what I want them to do. And this completely changed my perspective on computer science, and from that day I knew that my thing wasn’t going to be to code, it was to do AI.
So let’s start, let’s deconstruct artificial intelligence. What is intelligence?
Well, intelligence is the ability for a human to perform some task in a very autonomous way. Right, so the way that I…
But wait a second, to perform it in an autonomous way that would be akin to winding up a car and letting it just “Ka, ka, ka, ka, ka” across the floor. That’s autonomous. Is that intelligent?
Well, I mean of course you know, we’re not talking about things which are automated, but rather about the ability to make decisions by yourself, right? So, the ability to essentially adapt to the context you’re in, the ability to, you know, abstract what you’ve been learning and reuse it somewhere else—all of those different things are part of what makes us intelligent. And so, the way that I like to define artificial intelligence is really just as the ability to reproduce a human intelligent behavior in a machine.
So my cat food dish that when it runs out of cat food, and it can sense that there is no food in it, it opens a little door, and releases more food—that’s artificial intelligence?
Yep, I mean you can consider it one form of AI, and I think it’s important to really distinguish what we currently have with narrow AI and strong AI.
Sure, sure, we’ll get to that in due time. So where do you say we are when people say, “I hear a lot about artificial intelligence, what is the state of the art?” Are we kind of at the very beginning just doing the most rudimentary things? Or are we kind of like half-way along and we’re making stuff happen? How would you describe today’s state of the art?
What we’re really good at today is building and teaching machines to do one thing and to do it better than humans. But those machines are incapable of second-degree thinking, like we do as humans, for example. So, I think we really have to think about it this way: you’ve got a specific task for which you would traditionally have programmed a machine, right? And now you can essentially have a machine look at examples of that behavior, and reproduce it, and execute it better than a human would. This is really the state of the art. It’s not yet about intelligence in a human sense; it’s about a task-specific ability to execute something.
So I have posted an article recently on GigaOm where I have an Amazon Echo and a Google Assistant on my desk, and almost immediately I noticed that they would answer the same factual question differently. So, if I said, “How many minutes are in a year?” they gave me a different answer. If I said, “Who designed the American flag?” they gave me a different answer. And they did so because how many minutes in a year, one of them interpreted that as a solar year, and one of them interpreted that as a calendar year. And with regard to the flag, one of them gave the school answer of Betsy Ross, and one of them gave the answer to who designed the 50-state configuration of the stars. So, in both of those cases, would you say I asked a bad question that was inherently ambiguous? Or would you say the AI should have tried to disambiguate and figure it out, and that is an illustration of the limit you were just talking about?
Well I mean the question you’re really asking here is what would be ground truths that the AI should both have, and I don’t think there is. Because as you correctly said, the computer interpreted an ambiguous question in a different way, which is correct because there are two different answers depending on context. And I think this is also a key limitation of what we currently have with AI, is that you and I, we disambiguate what we’re saying because we have cultural references—we have contextual references to things that we share. And so, when I tell you something—I live in New York half the time—so if you ask me who created the flag, we’d both have the same answer because we live in the same country. But someone on a different side of the world might have a different answer, and it’s exactly the same thing with AI. Until we’re able to bake in contextual awareness, cultural awareness, or even things like, very simply, knowing what is the most common answer that people would give, we are going to have those kinds of weird side effects that you just observed here.
So isn’t it, though, the case that all language is inherently ambiguous? I mean once you get out of the realm of what is two plus two, everything like, “Are you happy? What’s the weather like? Is that pretty?” [are] all like, anything you construct with language has inherent ambiguity, just by the nature of words.
And so how do you get around that?
As humans, the way that we get around that is that we actually have a sort of probabilistic model in our heads of how we should interpret something. And sometimes it’s actually funny because you know, I might say something and you’re going to take it wrong, not because I meant it wrong, but because you understood it in a different contextual reference frame. But fortunately, what happens is that people who usually interact together usually share some sort of similar contextual reference points. And based on this it means we’re able to share in a very natural way without having to explain the logic behind everything we say. So, language in itself is very ambiguous. If I tell you something such as, “The football match yesterday was amazing,” this sentence grammatically and syntactically is very simple, but the meaning only makes sense if you and I were watching the same thing yesterday, right? And so, this is exactly why computers are still unable to understand human language the same way we do: they are unable to understand this notion of context unless you give it to them. And I think this is going to be one of the most active fields of research. Natural language processing is going to be, you know, basically, baking contextual awareness into natural language understanding.
So you just said a minute ago at the beginning of that, that humans have a probabilistic model that they’re running in their head—is that really true though? Because if I ask somebody, I just come up to a stranger how many minutes are in a year, they’re not going to say well there is an 82.7% chance he’s referring to a calendar year, but there’s a 17.3% chance he’s referring to a solar year. I mean they instantly only have one association with that question, most people, right?
Of course.
And so they don’t actually have a probabilistic—are you saying it’s a de-facto one—
Talk to that for just a second.
I mean, how it’s actually encoded in the brain? I don’t know. But the fact is that depending on the way I ask the question, depending on the information I’m giving you about how you should think about the question, you’re going to think about a different answer. So, if I tell you, you know how many stars are—let’s say, “How many minutes are in the year?” If I ask you the question like this, this is the most common way of asking the question, which means that you know I’m expecting you to give me the most common answer to the question. But if I give you more information, if I asked you, “How many minutes are in a solar year?” So now I’ve specified extra information, then that will change the answer you’re going to give me, because now the probability is no longer that I’m asking for the general question, but rather, I’m asking you for a very specific one. And so you have this sort of like, all these connections built into your brain, and depending on which of those elements are activated, you’re going to be giving me a different response. So, think about it as like, you have this kind of graph of knowledge in your head, and whenever I’m asking something, you’re going to give me a response by picking the most likely answer.
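[Editor's illustration: the idea of "picking the most likely answer" unless extra information shifts the odds can be sketched as a toy scoring scheme. Everything here is an assumption made for illustration: the prior weights simply reuse the host's rhetorical 82.7% / 17.3% figures, the cue words and the tenfold boost are arbitrary, and 365.2422 days is the approximate length of a solar (tropical) year. It is not a description of how any real assistant works.]

```python
MINUTES_PER_DAY = 24 * 60

# Hypothetical interpretations with assumed prior weights and cue words.
INTERPRETATIONS = {
    "calendar year": {
        "prior": 0.827,
        "cues": {"calendar"},
        "minutes": 365 * MINUTES_PER_DAY,
    },
    "solar year": {
        "prior": 0.173,
        "cues": {"solar", "tropical"},
        "minutes": round(365.2422 * MINUTES_PER_DAY),
    },
}

CUE_BOOST = 10.0  # Arbitrary factor applied when a cue word is present.


def interpret(question):
    """Return (most likely interpretation, its normalized probability)."""
    words = set(question.lower().replace("?", "").split())
    scores = {}
    for name, info in INTERPRETATIONS.items():
        boost = CUE_BOOST if words & info["cues"] else 1.0
        scores[name] = info["prior"] * boost
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


for q in ("How many minutes are in a year?",
          "How many minutes are in a solar year?"):
    meaning, prob = interpret(q)
    print(q, "->", meaning, INTERPRETATIONS[meaning]["minutes"], round(prob, 2))
```

Without a cue word the more common reading wins; adding "solar" to the question is enough to flip the choice, which is the behavior described above.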
So this is building up to—well, let me ask you one more question about language, and we’ll start to move past this a little bit, but I think this is fascinating. So, the question is often raised, “Are there other intelligent creatures on Earth?” You know the other sorts of animals and what not. And one school of thought says that language is an actual requirement for intelligence. That without language, you can’t actually conceive of abstract ideas in your head, you can’t do any of that, and therefore anything that doesn’t have language doesn’t have intelligence. Do you agree with that?
I guess if you’re talking about general intelligence, yes. Because language is really just a universal interface for, you know, representing things. This is the beauty of language. You and I speak English, and we don’t have to learn a specific language for every topic we want to talk about. What we can do instead is we can use the same fundamental interface, the language, to express all kinds of different ideas. And so, the flexibility of natural language means that you’re able to think about a lot more different things. And so this, inherently, I believe, means that it opens up the amount of things you can figure out—and hence, intelligence. I mean it makes a lot of sense. To be honest, I’ve never thought about it exactly like this, but when you think about it, if you have a very limited interface to express things, you’re never going to be able to think about that many things.
So Alan Turing famously made the Turing Test, which he said that if you are on a terminal, you’re in a conversation with something in another room and you can’t tell if it’s a person or a machine—interestingly he said 30% of the time a machine can fool you—then we have to say the machine is thinking. Do you interpret that as language “indicates that it is thinking,” or language is “it is actually thinking”?
I was talking about this recently actually. Just because a machine can generate an answer that looks human, doesn’t mean that the machine actually understands the answer given. I think you know the depth of understanding of the semantics, and the context goes beyond the ability to generate something that makes sense to a human. So, it really depends on what you’re asking the machine. If you’re asking something trivial, such as, you know, how many days are in a year, or whatever, then of course, I’m sure the machine can generate a very simple, well-structured answer that would be exactly like a human would. But if you start digging in further, if you start having a conversation, if you start essentially, you know, brainstorming with the machines, if you start asking for analysis of something, then this is where it’s going to start failing, because the answers it’s going to give you won’t have context, it won’t have abstraction, it won’t have all of these other things which makes us really human. And so I think, you know, it’s very, very hard to determine where you should draw the line. Is it about the ability to write letters in a way that is syntactically, grammatically correct? Or is it the ability to actually have an intelligent conversation, like a human would? I think the former, we can definitely do in the near future. The latter will require AGI, and I don’t think we’re there yet.
So you used the word “understanding,” and that of course immediately calls up the Chinese Room Problem, put forth by John Searle. For the benefit of the listener, it goes like this: There’s a man who’s in a room, and it’s full of these many thousands of these very special books. The man doesn’t speak any Chinese, that’s the important thing to know. People slide questions in Chinese underneath the door, he picks them out, and he has this kind of algorithm. He looks at the first symbol; he finds a matching symbol on the spine of one of the books. He looks up the second book, that takes him to a third book, a fourth book, a fifth book, all the way up. So he gets to a book that he knows to copy some certain symbols from and he doesn’t know what they mean, he slides it back under the door, and the punch line is, it’s a perfect answer, in Chinese. You know it’s profound, and witty, and well-written and all of that. So, the question that Searle posed and answered in the negative is, does the man understand Chinese? And of course, the analogy is that that’s all a computer can do, and therefore a computer just runs this deterministic program, and it can never, therefore, understand anything. It doesn’t understand anything. Do you think computers can understand things? Well let’s just take the Chinese Room, does the man understand Chinese?
No, he doesn’t. I think actually this is a very, very good example. I think it’s a very good way to put it actually. Because what the person has done in that case, to give a response in Chinese, he literally learns an algorithm on the fly to give him an answer. This is exactly how machine learning currently works. Machine learning isn’t about understanding what’s going on; it’s about replicating what other people have done, which is a fundamental difference. It’s subtle, but it’s fundamental because to be able to understand you need to be able to also replicate de-facto, right? Because if you can understand, you replicate. But being able to replicate, doesn’t mean that you’re able to understand. And the way that we build those machine learning models today are not meant to have a deep understanding of what’s going on. It’s meant to have a very appropriate, human, understandable response. I think this is exactly what happens in this thought experiment. It’s exactly the same thing pretty much.
Without going into general intelligence, I think what we really have to think about today, the way I’d like to see this is, machine learning is not about building human-like intelligence yet. It’s about replacing the need to program a computer to perform a task. Up until now, when you wanted to make a computer do something, what you had to do first is understand what the phenomenon is yourself. So, you had to become an expert in whatever you were trying to automate, and then you would write a computer code with those rules. And so the problem is that doing this would take you a while, because a human would have to understand what’s going on, which can take a while. And also your problem, of course, is not everything is understandable by humans, at least not easily. Machine learning completely replaces the need to become an expert. So instead of understanding what’s going on and then programming the machine, you’re just collecting examples of what’s going on, and feeding it to the machine, who will then figure out a way to reproduce that. So, you know the simple example is, show me a pattern of numbers with written five times five, and ask me what is a pattern, I’ll learn that it’s five, if that makes sense. So this is really about this—this is really about getting rid of the need to understand what you’re trying to make the machine do and just give it examples that it can just figure out by itself.
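[Editor's illustration: the contrast drawn above, between an expert writing the rule by hand and a machine working the rule out from examples, can be sketched with a deliberately tiny learner. The data, the labels, the 1000 cut-off in the hand-written rule, and the function names are all hypothetical; real machine learning systems use far richer models than a single learned threshold.]

```python
def hand_written_rule(amount):
    """Traditional programming: an expert decides the cut-off (assumed: 1000)."""
    return "review" if amount > 1000 else "approve"


def learn_threshold(examples):
    """Learn a cut-off from labeled (amount, label) examples.

    Tries each observed amount as a candidate threshold and keeps the first
    one that classifies the most examples correctly.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted(amount for amount, _ in examples):
        correct = sum(
            ("review" if amount > threshold else "approve") == label
            for amount, label in examples
        )
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, amount):
    """Apply a learned threshold to a new amount."""
    return "review" if amount > threshold else "approve"


# Hypothetical past decisions; nobody told the learner what the rule is.
EXAMPLES = [
    (120, "approve"),
    (800, "approve"),
    (950, "approve"),
    (1500, "review"),
    (4000, "review"),
]

learned = learn_threshold(EXAMPLES)
print(learned, predict(learned, 2000), predict(learned, 300))
```

The learner never sees the expert's reasoning, only past outcomes, which is the sense in which collecting examples replaces the need to become an expert first.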
So we began with my wind-up car, then the cat food dish, and we’re working up to understanding…eventually we have to get to consciousness because consciousness is this thing, people say we don’t know what it is. But we know exactly what it is, we just don’t know how it comes about. So, what it is, is that we experience the world. We can taste the pineapple or see the redness of the sunset in a way that’s different than just sensing the world…we experience. Two questions: do you have any personal theory on where consciousness comes from, and second, is consciousness key to understanding, and therefore key to an AGI?
I think so. I think there is no question that consciousness is linked to general intelligence because general intelligence means that you need to be able to create an abstraction of the world, which means that you need to be able to go beyond observing it, but also be able to understand it and to experience it. So, I think that is a very simple way to put it. What I’m actually wondering is whether consciousness was a consequence of biology and whether we need to replicate that in a machine, to make it intelligent like a human being is intelligent. So essentially, the way I’m thinking about this is, is there a way to build a human intelligence that would seem human? And do we want that to seem human? Because if it’s just about reproducing the way intelligence works in a machine, then we shouldn’t care if it feels human or not, we should just care about the ability for the machine to do something smart. So, I think the question of consciousness in a machine is really down to the question of whether or not we want to make it human. There are many technologies that we’ve built for which we have examples in nature, which perform the same task, but don’t work the same. Birds and planes, for example, I’m pretty sure a bird needs to have some sort of like, consciousness of itself of not getting into the wall, whereas we didn’t need to replicate all those tiny bits for the actual plane to fly. It’s just a very different way of doing things.
So do you have a theory as to how it is that we’re conscious?
Well, I think it probably comes from the fact that we had to evolve as a species with other individuals, right? How would you actually understand where to position yourself in society, and therefore, how to best build a very coherent, stable, strong community, if you don’t have consciousness of other people, of nature, of yourself? So, I think there is like, inherently, the fact that having a kind of ecosystem of human beings, and humans in nature, and humans and animals meant that you had to develop consciousness. I think it was probably part of a very positive evolutionary strategy. Whether or not that comes from your neurons or whether that comes more from a combination of different things, including your senses, I’m not sure. But I feel that the need for consciousness definitely came from the need for integrating yourself into a broader structure.
And so not to put words in your mouth, but it sounds like you think, you said “we’re not close to it,” but it is possible to build an AGI, and it sounds like you think it’s possible to build, hypothetically, a conscious computer and you’re asking the question of would we want to?
Yes. The question is whether or not it would make sense for whatever we have in mind for it. I think probably we should do it. We should try to do it just for the science, I’m just not sure this is going to be the most useful thing to do, or whether we’re going to figure out an even more general general-intelligence which doesn’t have only human traits but has something even more than this, that would be a lot more powerful.
Hmmm, what would that look like?
Well, that is a good question. I have clearly no idea because otherwise—it is very hard to think about a bigger intelligence than the intelligence that we are limited to, in a sense. But it’s very possible that we might end up concluding that well you know, human intelligence is great for being a human, but maybe a machine doesn’t have to have the same constraints. Maybe a machine can have like a different type of intelligence, which would make it a lot better suited for the type of things we’re expecting the machine to do. And I don’t think we’re expecting the machines to be human. I think we’re expecting the machines to augment us, to help us, to solve problems humans cannot solve. So why limit it to a human intelligence?
So, the people I talk to say, “When will we get an AGI?” The predictions vary by two orders of magnitude—you can read everything from 5 to 500 years. Where do you come down on that? You’ve made several comments that you don’t think we’re close to it. When do you think we’ll see an AGI? Will you live to see an AGI, for instance?
This is very, very hard to tell, you know I mean there is this funny artifact that everybody makes a prediction 20 years in the future, and it’s actually because most people when they make those predictions, have about 20 years left in their careers. So, you know, nobody is able to think beyond their own lifetime, in a sense. I don’t think it’s 20 years away, at least not in the sense of real human intelligence. Are we going to be able to replicate parts of AGI, such as, you know, the ability to transfer learning from one task to another? Yes, and I think this is short-term. Are we going to be able to build machines that can go one level of abstraction higher to do something? Yes, probably. But it doesn’t mean they’re going to be as versatile, as generalist, as horizontally thinking as we are as humans. I think for that, we really, really have to figure out once and for all whether a human intelligence requires a human experience of the world, which means the same senses, the same rules, the same constraints, the same energy, the same speed of thinking, or not. So, we might just bypass, as I said—human intelligence might go from like narrow AI, to a different type of intelligence, that is neither human nor narrow. It’s just different.
So you mentioned transfer learning. I could show you a small statue of a falcon, and then I could show you a hundred photographs, and some of them have the falcon under water, on its side, in different light, upside down, and all these other things. Humans have no problem saying, “there it is, there it is, there it is,” you know just kind of find Waldo [but] with the falcon. So, in other words, humans can train with a sample size of one, primarily because we have a lot of experience seeing other things in low light and all of that. So, if that’s transfer learning it sounds like you think that we’re going to be able to do that pretty quickly, and that’s kind of a big deal if we can really teach machines to generalize the way we do. Or is that kind of generalization that I just went through, that actually is part of our general intelligence at work?
I think transfer learning is necessary to build AGI, but it’s not enough, because at the end of the day, just because a machine can learn to play a game and then you know have a starting point to play another game, doesn’t mean that it will make the choice to learn this other game. It will still be you telling it, “Okay, here is a task I need you to do, use your existing learning to perform it.” It’s still pretty much task-driven, and this is a fundamental difference. It is extremely impressive and to be honest I think it’s absolutely necessary because right now when you look at what you do with machine learning, you need to collect a bunch of different examples, and you’re feeding that to the machine, and the machine is learning from those examples to reproduce that behavior, right? When you do transfer learning, you’re still teaching a lot of things to the machine, but you’re teaching it to reuse other things so that it doesn’t need as much data. So, I think inherently the biggest benefit of transfer learning will be that we won’t need to collect as much data to make the computers do something new. It solves, essentially, the biggest friction point we have today, which is how do you access enough data to make the machine learn the behavior? In some cases, the data does not exist. And so I think transfer learning is a very elegant and very good solution to that problem.
So last question I want to ask you about AGI and then we can turn the clock back and talk to issues closer at hand is as follows: It sounds like you’re saying an AGI is more than 20 years off, if I just inferred that from what you just said. And I am curious because the human genome is 2 billion base pairs, it’s something like 700 MB of information, most of which we share with plants, bananas, and what-not. And if you look at our intelligence versus a chimp, or something, we only have a fraction of 1% of the DNA that is different. What that seems to suggest to me at least is that if the genome is 700 MB, and the 1% difference gives us an AGI, then the code to create an AGI could be as small as 7 MB.
Pedro Domingos wrote a book called The Master Algorithm, where he says that there probably is an algorithm, that can solve a whole world of problems, and get us really close to AGI. Then other people on another end of the spectrum, like Marvin Minsky or somebody, don’t even know that we have an AGI, that we’re like just 200 different hacks—kind of 200 narrow intelligences that just kind of pull off this trick of seeming like a general intelligence. I’m wondering if you think that an AGI could be relatively simple—that it’s not a matter of more data or more processing, but just a better algorithm?
So just to be clear, I don’t consider a machine who can perform 200 different tasks to be an AGI. It’s just like an ensemble of, you know, narrow AIs.
Right, and that school of thought says that therefore we are not an AGI. We only have this really limited set of things we can do that we like to pass off as “ah, we can do anything,” but we really can’t. We’re 200 narrow AIs, and the minute you ask us to do things outside of that, they’re off our radar entirely.
For me, the simplest definition of how to differentiate between a narrow AI and an AGI is, an AGI is capable of kind of zooming out of what it knows—so to have basically like a second-degree view of the facts that it learned, and then reuse that to do something completely different. And I think this capacity we have as humans. We did not have to learn every possible permutation; we did not have to learn every single zooming out of every fact in the world, to be able to do new things. So, I think I definitely agree that as a human, we are AGI. I just don’t think that having a computer who can learn to do two hundred different things would do that. You would still need to figure out this ability to zoom out, this ability to create abstraction of what you’ve been learning and to reapply it somewhere else. I think this is really the definition of horizontal thinking, right? You can only think horizontally if you’re looking up, rather than staying in a silo. So, to your question, yeah. I mean, why not? Maybe the algorithm for AGI is simple. I mean think about it. Deep learning, machine learning in general, these are deceptively easy in terms of mathematics. We don’t really understand how it works yet, but the mathematics behind it is very, very, easy. So, we did not have to come up with this like crazy solution. We just came up with an algorithm that turned out to be simple, and that worked really well when given a ton of information. So, I’m pretty sure that AGI doesn’t have to be that much more complicated, right? It might be one of those E = mc² sort of plugins I think that we’re going to figure out.
That was certainly the hope, way back, because physics itself obeys such simple laws that were hidden from us, and then once elucidated seemed, any 11th grade high-school student could learn, maybe so. So, pulling back more toward the here and now—in ’97, Deep Blue beat Kasparov, then after that we had Ken Jennings lose in Jeopardy, then you had AlphaGo beat Lee Sedol, then you had some top-ranked poker players beaten, and then you just had another AlphaGo victory. So, AI does really well at games presumably because they have a very defined, narrow rule set, and a constrained environment. What do you think is going to be, kind of, the next thing like that? It hits the papers and everybody’s like, “Wow, that’s a big milestone! That’s really cool. Didn’t see that coming so soon!” What do you think will be the next sort of things we’ll see?
So, games are always a good example because everybody knows the game, so everybody is like, “Oh wow, this is crazy.” So, putting aside I guess the sort of PR and buzz factor, I think we’re going to solve things like medical diagnosis. We’re going to solve things like understanding voice very, very soon. Like, I think we’re going to get to a point very soon, for example, where somebody is going to be calling you on the phone and it’s going to be very hard for you to distinguish whether it’s a human or a computer talking. Like I think this is definitely short-term as in less than 10 years in the future, which poses a lot of very interesting questions, you know, around authentication, privacy, and so forth. But I think the whole realm of natural language is something that people always look at as a failure of AI—“Oh it’s a cute robot, it barely actually knows how to speak, it has a really funny sounding voice.” This is typically the kind of thing that nobody thinks, right now, a computer can do eloquently, but I’m pretty sure we’re going to get there fairly soon.
But to our point earlier, the computer understanding the words, “Who designed the American flag?” is different than the computer understanding the nuance of the question. It sounds like you’re saying we’re going to do the first, and not the second very quickly.
Yes, correct. I think like somewhere the computer will need to have a knowledge base of how to answer, and I’m sure that we’re going to figure out which answer is the most common. So, you’re going to have this sort of like graph of knowledge that is going to be baked into those assistants that people are going to be interacting with. I think from a human perspective, what is going to be very different, is that your experience of interacting with a machine will become a lot more seamless, just like a human. Nobody today believes that when someone calls them on the phone, it’s a computer. I think this is like a fundamental thing that nobody is seeing coming really but is going to shift very soon. I can feel there is something happening around voice which is making it very, very, very…which is going to make it very ubiquitous in the near future, and therefore indistinguishable from a human perspective.
I’m already getting those calls frankly. I get these calls, and I go “Hello,” and it’s like, “Hey, this is Susan, can you hear me okay?” and I’m supposed to say, “Yes, Susan.” Then Susan says, “Oh good, by the way, I just wanted to follow up on that letter I sent you,” and we have those now. But that’s not really a watershed event. That’s not, you wake up one day and the world’s changed the way it has when they say, there was this game that we thought computers wouldn’t be able to do for so long, and they just did it, and it definitively happened. It sounds like the way you’re phrasing it—that we’re going to master voice in that way—it sounds like you say we’re going to have a machine that passes the Turing Test.
I think we’re going to have a machine that will pass the Turing Test, for simple tasks. Not for having a conversation like we’re having right now. But a machine that passes the Turing Test in, let’s say, a limited domain? I’m pretty sure we’re going to get there fairly soon.
Well anybody who has listened to other episodes of this, knows my favorite question for those systems that, so far, I’ve never found one that could answer, and so my first question is always “What’s bigger a nickel or the sun?” and they can’t even right now do that. The sun could be s-u-n or s-o-n, a nickel is a metal as well as a unit of currency, and so forth. So, it feels like we’re a long way away, to me.
But this is exactly what we’ve been talking about earlier; this is because currently those assistants are lacking context. So, there’s two parts of it, right? There’s the part which is about understanding and speaking, so understanding a human talking and speaking in a way that a human wouldn’t realize it’s a computer speaking, this is more like the voice side. And then there is the understanding side. Now you add some words, and you want to be able to give a response that is appropriate. And right now that response is based on a syntactic and grammatical analysis of the sentence and is lacking context. But if you plug it into a database of knowledge, that it can tap into—just like a human does by the way—then the answers it can provide you will be more and more intelligent. It will still not be able to think, but it will be able to give you the correct answers because it will have the same contextual references you do.
It’s interesting because, at the beginning of the call, I noted about the Turing Test that Turing only put a 30% benchmark. He said if the machine gets picked 30% of the time, we have to say it’s thinking. And I think he said 30% because the question isn’t, “Can it think as well as a human,” but “Can it think?” The really interesting milestone in my mind is when it hits 51%, 52%, of the time, and that would imply that it’s better at being human than we are, or at least it’s better at seeming human than we are.
Yes, so again it really depends on how you’re designing the test. I think a computer would fail 100% of the time if you’re trying to brainstorm with it, but it might win 100% of the time if you’re asking it to give you an answer to a question.
So there’s a lot of fear wrapped up in artificial intelligence and it’s in two buckets. One is the Hollywood fear of “killer robots,” and all of that, but the much more here and now, the one that dominates the debate and discussion is the effect that artificial intelligence, and therefore automation, will have on jobs. And, you know, there are three broad schools of thought. One is that there is a certain group of people that are going to be unable to compete with these machines and will be permanently unemployed, lacking skills to add economic value. The second theory says that’s actually what’s going to happen to all of us, that there is nothing in theory a machine can’t do, that a human can do. And then a final school of thought that says we have 250 years of empirical data of people using transformative technologies, like electricity, just to augment their own productivity and increase their productivity, and therefore their standard of living. You’ve said a couple of times, you’ve alluded to machines working with humans—AIs working with humans—but I want to give you a blank slate to answer that question. Which of those three schools of thought are you most closely aligned to and why?
I’m 100% convinced that we have to be thinking human plus machines, and there are many reasons for this. So just for the record, it turns out I actually know quite a bit about that topic because I was asked by the French government, a few months ago, to work on their AI strategy for employment. The country, the government wanted to know, “What should we do? Is this going to be disruptive?” So, the answer, the short answer is, every country will be impacted in a different way because countries don’t have the same relationship to automation based on how people work, and what they are doing essentially. For France in particular, which is what I can talk about here, what we ended up realizing is that machines…the first thing which is important to keep in mind is we’re talking the next ten years. So, the government does not care about AGI. Like, we’ll never get to AGI if we can’t fix the short-term issues that, you know, narrow intelligence is already bringing to the table. The point is, if you destroy society because of narrow AI, you’re never going to get to AGI anyway, so why think about it? So, we really focused on thinking on the next 10 years and what we should do with narrow AI. The first thing we realized is that narrow intelligence, narrow AI, is much better than humans at performing whatever it has learned to do, but humans are much more resilient to edge cases and to things which are not very obvious because we are able to do horizontal thinking. So, the best combination you can have in any system will always be human plus machine. Human plus machine is strictly better in every single scenario, to human-alone or machine-alone. So if you wanted to really pick an order, I would say human plus machine is the best solution that you can get, then human and machine are just not going to be good at the same things. They’re going to be different things. Neither one is better than the other, it’s just different.
And so we designed a framework to figure out which jobs are going to be completely replaced by machines, which ones are going to be complementary between human and AI, and which ones will be pure human. And so those criteria that we have in the framework are very simple.
The first one is, do we actually have the technology or the data to build such an AI? Sometimes you might want to automate something, the data does not exist, the sensors to collect data do not exist, there are many examples of that. The second thing is, does that task that you want to automate require a very complicated manual intervention? It turns out that robotics is not following the same experimental trends as AI, and so if your job is mostly consisting of using your hands to do very complicated things, it’s very hard to build an intelligence that can replicate that. The third thing is, very simply, whether or not we require general intelligence to solve a specific task? Are you more of a system designer thinking about the global picture of something, or are you a very, very focused narrow-task worker? So, the more horizontal your job is, obviously, the safer it is. Because until we get AGI, computers will never be able to do this horizontal thinking.
The last two are quite interesting too. The first one is, do we actually want—is it socially acceptable to automate a task? Just because you can automate something, doesn’t mean that this is what we will want to do. You know, for instance, you could get a computer to diagnose that you have cancer, and just email you the news, but do we want that? Or don’t we prefer that at least a human gives us that news? The second good example about it, which is quite funny, is the soccer referee. Soccer in Europe is very big, not as much in the U.S., but in Europe it’s very big, and we already have technology today that could just look at the video screen and do real-time refereeing. It would apply the rules of the game, it would say “Here’s a foul, here’s whatever,” but the problem is that people don’t want that, because it turns out that a human referee makes a judgment on the fly based on other factors that he understands because he’s human such as, “Is it a good time to let people play? Because if I stop it here, it will just make the game boring.” So, it turns out that if we automated the referee of a soccer match, the game would be extremely boring, and nobody would watch it. So nobody wants that to be automated. And then finally, the final criterion is the importance of emotional intelligence in your job. If you’re a manager, your job is to connect emotionally with your team and make sure everything is going well. And so I think a very simple way to think about it is, if your job is mostly soft skills, a machine will not be able to do it in your place. If your job is mostly hard skills, there is a chance that we can automate that.
So, when you take those five criteria, right, and you look at the distribution of jobs in France, what you realize is that only about 10% of those jobs will be completely automated, another 30%, 40% won’t change, because they will still be mostly done by humans, and about 50% of those jobs will be transformed. The 10% of jobs the machines will take, you’ve got 40% of jobs that humans will take, and you’ve got 50% of jobs, which will change because it will become a combination of humans and machines doing the job. And so the conclusion is that, if you’re trying to anticipate the impact of AI on the French job market and economy, we shouldn’t be thinking about how to solve mass unemployment with half the population not working; rather, we should figure out how to help those 50% of people transition to this AI+human way of working. And so it’s all about continuous education. It’s all about breaking this idea that you like learn one thing for the rest of your life. It’s about getting into a much more fluid, flexible sort of work life where humans focus on what they are good at and working alongside the machines, who are doing things that machines are good at. So, the recommendation we gave to the government is, figure out the best way to make humans and machines collaborate, and educate people to work with machines.
There’s a couple of pieces of legislation that we’ve read about in Europe that I would love to get your thoughts on, or proposed legislation, to be clear. One of them is treating robots or certain agents of automation as legal persons so that they can be taxed at a similar rate as you would tax a worker. I guess the idea being that, why should humans be the only ones paying taxes? Why shouldn’t the automation, the robots, or the artificial intelligences, pay taxes as well? Practically, what do you think? Two, what do you think should be the case? What will happen and what should happen?
So, for taxing robots, I think that it’s a stupid idea for a very simple reason, is that how do you define what a machine is, right? It’s easy when you’re talking about an assembly line with a physical machine because you can touch it. But how many machines are in an image recognition app? How do you define that? And so what the conclusion is, if you’re trying to tax machines, like you would tax humans for labor, then you’re going to end up not being able to actually define what is a machine. Therefore, you’re not going to actually tax the machine, but you’re going to have to figure out more of a meta way of taxing the impact of machines—which basically means that you’re going to increase the corporate taxes, like the profit tax, that companies are making as a kind of catch-all for what you’re doing. So, if you’re doing this, you’re impeding your investment and innovation, and you’re actually removing the incentive to do that. So I think that it makes no sense whatsoever to try to tax robots because the net consequence is that you’re just going to increase the taxes that companies have to pay overall.
And then the second one is the idea that, more and more algorithms, more and more AIs help us make choices. Sometimes they make choices for us—what will I see, what will I read, what will I do? There seems to be a movement to legislatively require total transparency so that you can say “Why did it recommend this?” and a person would need to explain why the AI made this recommendation. One, is that a good idea, and two, is it even possible at some level?
Well this [was] actually voted [upon] last year and it comes into effect next year as part of a bigger privacy regulation called GDPR, that applies to any company that wants to do business with a European citizen. So, whether you’re American, Chinese, French, it doesn’t matter, you’re going to have to do that. And in effect, one of the things that this regulation poses, is that any automated treatment that results in a significant impact on your life—a medical diagnosis, an insurance pricing whatever, like an employment or like a promotion you get—you have to be able to explain how the algorithm made that choice. By the way, this law [has] existed in France already since 1978, so it’s new in Europe, but it has been existing in France for 40 years already. The reason why they put this is very simple, is because they want to avoid people being excluded because a machine learned a bias in the population, and that person essentially not being able to go to court and say, “There’s a bias, I was unfairly treated.”
So essentially the reason why they want transparency, is because they want to have accountability against potential biases that might be introduced, which I think makes a lot of sense, to be honest. And that poses a lot of questions, of course, of what do you consider an algorithm that has an impact on your life? Is your Facebook newsfeed impacting your life? You could argue it does, because the choice of news that you see will change your influence, and Facebook knows that. They’ve experimented with that. Does a search result in Google have an impact on your life? Yes it does, because it limits the scope of what you’re seeing. My feeling is that, when you keep pushing this, what you’re going to end up realizing is that a lot of the systems that exist today will not be able to rely on this black-box machine learning model, but rather would have to use other types of methods. And so one field of study, which is very exciting, is actually making deep learning understandable, for precisely that reason.
Which it sounds like you’re in favor of, but you also think that that will be an increasing trend, over time.
Yeah, I mean I believe that actually what’s happening in Europe is going to permeate to a lot of the other places in the world. The right to privacy, the right to be forgotten, the right to have transparent algorithms when they’re important, the right to transferability of your personal data, that’s another very important one. This same regulation means that all my data I have with a provider, I can tell that provider, to send it to another provider, in a way that the other provider can use it. Just like when you change carriers, you can switch phone number without worrying about how this works, this will now apply to every single piece of personal data companies have around you when you’re a European citizen.
So, this is huge, right? Because think about it, what this means is if you have a very key algorithm for making a decision, you now have to publish and make that algorithm transparent. What that means is that someone else could replicate this algorithm in the exact same way you’re doing it. This, plus the transferability of personal data means that you could have two exactly equivalent services which have the same data about you, that you could use. So that completely breaks any technological monopoly [on] important things for your life. And so I think this is very, very interesting because the impact that this will have on AI is huge. People are racing to get the best AI algorithm and the best data. But at the end of the day—if I can copy your algorithm because it’s an important thing for my life, and it has to be transparent, and if I can transfer my data from you to another provider—you don’t have as much of a competitive advantage anymore.
But doesn’t that mean, therefore, you don’t have any incentive to invest in it? If you’re basically legislating all sorts…[if] all code is open-sourced, then why would anybody spend any money investing in something that they get no benefit whatsoever from?
Innovation. User experience. Like monopoly is the worst thing that could happen for innovation and for people, right?
Is that necessarily true? I mean patents are a form of monopoly, right? We let drug companies have a monopoly on some drug for some period of time because they need some economic incentive to invest in it. All of law is built around monopoly, in one form or the other, based on the idea of patents. If you’re saying there’s an entire area that’s worth trillions of dollars, but we’re not going to let anybody profit off of it—because anything you do you have to share with everybody else—aren’t you just destroying innovation?
That transparency doesn’t prevent you from protecting your IP, right?
What’s the difference between the IP and the algorithm?
So, you can still patent the system you created, and by the way, when you patent a system, you make it transparent as well because anybody can read the patent. So, if anything I don’t think that changes the protection over time. I think what that fundamentally changes is that you’re no longer going to be limited to a black-box approach that you’re not going to be able to have visibility on. I think the Europeans want the market to become a lot more open, they want people to have choices, and they want people to be able to say no to a company when they don’t share the values of the company, and they don’t like the way they’re being treated.
So obviously privacy is something near and dear to your heart. Snips is an AI assistant designed to protect privacy. Can you tell us what you’re trying to do there, and how far along you are?
So when we started the company in 2013, we did it as a research lab in AI, and one of the first things we focused on was this intersection between AI and privacy. How do you guarantee privacy in the way that you’re building those AIs? And so that eventually led us to what we’re currently doing now, which is we’re selling a voice platform for connected devices. So, if you’re building a car and you want people to talk to it, you can use our technology to do that, but we’re doing it in a way that all the data of the user, its voice, its personal data never leaves the device that the user has interacted with. So, you know whereas Alexa and Siri and Google Assistant are running in the cloud, we’re actually running completely on the device itself. There is not a single piece of your personal data that goes to a server. And this is important because voice is biometric, voice is something that identifies you uniquely that you cannot change, it’s not like a cookie in a browser, it’s more like a fingerprint. When you send biometric data to the cloud, you’re exposing yourself to having your voice copied, potentially, down the line, and you’re increasing your risk that someone might break into one of those servers and essentially pretend to be a million people on the phone, with their banks, their kids, whatever. So, I think for us, like, privacy is extremely important as a part of the game, and by the way, doing things on device means that we can guarantee privacy by design, which also means that we are currently the only technology on the planet that is 100% compliant with those new European regulations. Everybody else is in a gray area right now.
And so where are you in the lifecycle of your product?
We’ve been actually building this for quite some time; we had quite a bunch of clients use it. We officially launched it a few weeks ago, and the launch was really amazing. We even have a web version that people can use to build prototypes for Raspberry Pi. So, our technology, by the way, can run completely on a Raspberry Pi. So we do everything from speech recognition to natural language understanding on that actual Raspberry Pi, and we’ve had over a thousand people start building assistants on it. I mean it was really, really crazy. So, it’s a very, very mature technology, we benchmarked it against Alexa, against Google Assistant, against every other technology provider out there for voice, and we’ve actually gotten better performances than they did. So we have a technology that can run on a Raspberry Pi, or any other small device, that guarantees privacy by design, that is compliant with the new European regulation, and that performs better than everything that’s out there. This is important, because, you know there is this false dichotomy that you have to trade off AI and privacy, but this is wrong, this is actually not true at all. You can really have the two together.
Final question, do you watch or read, or consume any science fiction, and if so, do you know any views of the future that you think are kind of in alignment with yours or anything you look at and say “Yes, that’s what could happen!”
I think there are bits and pieces in many science fiction books, and actually this is the reason why I’m thinking about writing one myself now.
All right, well Rand this has been fantastic. If people want to keep up with you, and follow all of the things you’re doing and will do, can you throw out some URLs, some Twitter handles, whatever it is people can use to keep an eye on you?
Well, the best way to follow me I guess would be on Twitter, so my handle is RandHindi, and on Medium, my handle is RandHindi. So, I blog quite a bit about AI and privacy, and I’m going to be announcing quite a few things and giving quite a few ideas in the next few months.
All right, well this has been a far-reaching and fantastic hour. I want to thank you so much for taking the time, Rand.
Thank you very much. It was a pleasure.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster. Pre-order a copy here.

Voices in AI – Episode 38: A Conversation with Carolina Galleguillos

In this episode Byron and Carolina discuss computer vision, machine learning, biology and more.
Byron Reese: This is Voices in AI brought to you by Gigaom, I’m Byron Reese. Today our guest is Carolina Galleguillos. She’s an expert in machine learning and computer vision. She did her undergrad work in Chile and has a master’s and PhD in Computer Science from UC San Diego. She’s presently a machine learning engineer at Thumbtack. Welcome to the show.
Carolina Galleguillos: Thank you. Thank you for having me.
So, let’s start at the very beginning with definitions. What exactly is “artificial” about artificial intelligence?
Well, I read somewhere that artificial intelligence is basically trying to make machines think, which is very “sci-fi,” I think, but what I’m trying to say here is we’re trying to automate a lot of different tasks that humans do. We have done that before in the Industrial Revolution, but now we’re trying to do it with computers and with interfaces that look more human-like. We also have robots that also have computers inside. I think that’s more of the artificial part. The intelligence, we’ll see how intelligent these machines will become in time.
Alan Turing asked the question, “Can a machine think?” Do you think a machine can think, or will a machine be able to think?
I think we’re really far from that. The brain is a really, really complex thing. I think that we can approximate the thinking of a machine to be able to follow certain rules, or learn patterns that seem more like common sense, but at the end of the day, it won’t think autonomously, I think. We’re really far from that.
I want to get into computer vision here in just a minute.
But I’m really fascinated by this, because that’s a pretty low bar. If you say it’s using machines to do things people do, then a calculator is an artificial intelligence in that view. Would you agree with that?
Well, not really, because a calculator is just executing commands.
But isn’t that what a computer program does?
Yeah, it does. But I would say that in machine learning, you don’t need to program those rules. The program will infer the rules by seeing data. So you’re not explicitly writing down the rules in that program and that’s what makes it different from a calculator.
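As an editorial illustration of that distinction, here is a minimal sketch in which the rule (a cut-off value) is never written by the programmer and is instead picked from labeled examples. The function name `learn_threshold` and the toy data are assumptions made for this example; they are not from the conversation.

```python
def learn_threshold(examples):
    """Pick the cut-off that misclassifies the fewest labeled examples."""
    values = sorted(value for value, _ in examples)
    # Candidate cut-offs: midpoints between neighboring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # Count examples where "value > threshold" disagrees with the label.
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


# Hypothetical labeled data: small values are False, large values are True.
examples = [(1.0, False), (2.0, False), (3.0, False),
            (7.0, True), (8.0, True), (9.0, True)]
threshold = learn_threshold(examples)
print(threshold)
```

A calculator would have the cut-off typed in by hand; here the same code yields a different rule if the examples change.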
Humans do something really kind of cool. You show a human an object, like a little statue of some figure, and then you show them a hundred pictures, and they can tell what that figure is—even if it’s upside down, if it’s underwater, if the lighting changes, if they only see half of it. We’re really far away from being able to do that with a machine, correct?
Well, it depends; I think it always depends. We can do very well now in certain conditions, but we are far from (I’m not saying super far) doing it when you don’t have all the information, I would say.
How do humans do that? Is it that we’re really good at transfer learning or what do you think we’re doing?
Well, yes, transfer learning, but also a lot about the context. I think that the brain is able to store so many different connections—millions and millions of connections, it has so much experience—and that information goes into recognizing objects. It’s very implicit. A person cannot recognize something they’ve never seen before, but if that person has the context about what it should be, it would be able to find it. So I think that’s the main point.
If I took you into a museum gallery, and there was a giant wall with two hundred paintings on it—they’re all well-known paintings, they’re all realistic and all of that—and I hang one of them upside down, a human notices that pretty quickly. But a computer doesn’t. A computer uses the same kind of laborious algorithm to try to figure out which painting is upside down, but a human just spots it right away. What do you think is going on there?
I think that what’s going on is probably the fact that we have context about what we usually face. We usually see paintings that are straight, that point up, so we are really quick to identify when things are not the way we expect them to be. A computer doesn’t have that knowledge, so it starts from a clean slate.
What would giving them that kind of context look like? I mean, if I just said, “Here’s 100,000 paintings and they’re all right side up. Now, quick, glance at that wall and tell me which one’s upside down,” the computer wouldn’t necessarily be able to do it right away, would it? What kind of context do we have that they don’t—what paintings look like right side up, or what reality looks like right side up? What do you think?
Well, if there are objects on that painting, the computer will probably also be able to say that it’s upside down. Now, if it’s a very modern piece, I don’t think a human could also figure out if it’s upside down or not. I think that’s the key of the problem. If it’s basically a bunch of colors, I wouldn’t be able to say that’s upside down. But if it is the painting of a lady, the face of a woman, I would be very quick to spot that that painting is upside down. And I think a computer also could do that, because you can train a computer to identify faces. When that face is upside down, it would be able to say that, too.
It’s interesting because if you were an artist and you drew fantastic landscapes in science fiction worlds, and you showed people different ones; somebody could point at something and say that’s not very realistic, but that one is. But in reality, of course, they’re alien planets. But it’s because we have a really deep knowledge about things, like, the shapes of biological forms, and the effects of gravity—just this really intuitive level of what “looks right” and what doesn’t. What are the steps you go through to get a computer to have just that kind of natural understanding of reality?
That’s a good question. I think, as part of recognizing objects (let’s say that’s our main task), we try to also give more information about how these objects are presented in reality. So, you can have algorithms that can encode the spatial information of objects: usually you’re going to find the sky above, the grass is down below, and usually you won’t find a car on top of a building, and all that. So you can actually train an algorithm that can surface those patterns, and then, when you show it something that is different, it’s going to make those assumptions, and one of the outcomes is that it might not recognize objects correctly because those objects are not in the context that the algorithm was trained on.
And do you think that’s what humans are doing, that we just have this knowledge base? Because I could totally imagine somebody looking at these alien landscape paintings and saying, “That one doesn’t look right,” and then they say, “Well, why doesn’t it look right?” and it’s like, “I don’t know. It just doesn’t look right.” Is it that there’s some much deeper level that humans are able to understand things, that wouldn’t necessarily be able to be explicitly programmed, or is that not the case?
I think that there’s a belief in machine learning, and now especially with deep learning, that if you have enough data, say millions and millions of examples, those patterns will surface. You don’t have to explicitly put them there. And then those images—let’s say we’re doing computer vision—will encode those rules, like, you’re always going to see the size of a car and the size of a person are mostly around the same, even though you see them at different distances.
I know that as humans, because we have so much experience and information, we can make those claims when we see something that seems odd. At the same time, we can have algorithms that—if you have enough data to get those patterns surfaced—could also be able to spot that. I think that it’s happening more and more in areas like medicine, when you want to find cancer. So they’re trying to leverage those algorithms to be able to detect those anomalies.
How much do you think about biology and how humans recognize things when you’re considering how to build a system that recognizes things? Are they close analogs or is it just trivia that we both happen to recognize things but we’re going to do it in such radically different ways? There’s not really much we can learn from the brain?
This is a very hot topic, I’d say, in the community. There’s definitely a lot of machine learning that is inspired by the brain, or by biology. And so, they’re trying to build architectures that simulate the way that the brain works, or how the eyes would process information. I think that they do that in order to understand how the brain works, in order to do the other way around, which is create algorithms that emulate the brain, because I think that would be extremely hard to do.
When I build machine learning systems, either computer vision or just generic machine learning systems, I usually am not inspired by biology, because I’m usually trying to focus on very specific tasks. And if I were to be inspired by the brain, I would have to take into account a lot of different things into my algorithm, which sometimes just wants to do something still very smart, but very focused, and the brain actually tries to take into account a lot of different inputs. So that’s how I usually approach the work I do.
Humans have a lot of cognitive biases. So we have ways that our brain doesn’t work. It appears to have these bugs in it. For instance, we over-see patterns, right? I guess we over fit. You can look up at a cloud and you see a dog. And the thesis goes that a long time ago it was far better to mistake a rock for a bear and run away than to mistake the bear for the rock and get eaten. 
Do you think that when we build computer systems that can recognize objects, are they going to have our cognitive biases because we’re coding them? Or, are they going to have their own ones that we can’t really predict? Or, will they be kind of free of bias because they’re just trained off the data?
I think it depends. Basically, I think it depends on how you are going to build that system. Like, if you do it by being inspired by the brain, you might actually put your own bias into it, because you might say, well, this is a rock and this is a bear, and bears and rocks show up together on certain occasions. Now, if you let the data sort of speak for itself, by showing examples through algorithms, then the machine, or the computer, will just make its own judgment about that, without any bias. You can always bias the data as well, that’s a different problem, but let’s say we take all the images in the world where all the objects appear; then we usually will pick up very general patterns, and if, usually, rocks look like bears, then it might make those mistakes pretty easily.
I guess the challenge is that every photograph is an editorial decision of what to photograph and so every photograph reflects a human’s bias. So even if you had a hundred million photos, you’re still instantiating some kind of bias. 
So, people have this ability…we change focus. You look at something, and then a bear walks in the room, and you’re like, “Oh, my gosh! A bear walked in the room!” and then somebody yells, “Fire! Fire!” and you turn over to see where the fire is. So we’re always switching from thing to thing, and that seems to be a feature associated with our consciousness, that it allows us to switch. Does the fact that the computer is not embodied, it doesn’t have a form, and it doesn’t have consciousness, is that an inherent limitation to what it’s going to be able to see and recognize?
Yes. I think so. I mean, once again, if the computer doesn’t have any extra sensors, it wouldn’t even realize what’s going on, apart from the task that it’s actually executing. But let’s say that computer has a camera, it also has a tactile device, and many other things, then you’re starting to enable a little bit more context to that computer, or that program. I mean, if those events occur once in a while, then it would be able to react, or say something about it.
If you think about it, photographs that we use generally are the visible spectrum of light that humans are able to see, but that’s just a tiny fraction. Are there pattern recognition or image recognition efforts underway that are using a full spectrum of light and color? So they show infrared, ultraviolet…
Yes. Definitely. Yes.
Can you give an example of that? I find that fascinating.
Well, a very good example is self-driving cars. They have infrared cameras. They could potentially give you an idea of, “There is a body there, there is something there that is not animated,” so you don’t hit it when you’re driving. So, definitely, there are not just photographs, but for MRIs, all medical imaging, basically you use all that information that you can get.
Our audience is familiar, broadly, with machine learning, but can you talk a little more specifically about how, conceptually, “Here’s a million photos of a cat, learn what a cat looks like, and try to find the cat in these photos,” but peel that onion back one layer. How does that actually work, especially in a neural net situation?
Yeah, you basically tell the computer “there’s a cat in here.” At every single image, you’ll say “there’s a cat in here.” Sometimes you even label the contours of a cat, or even maybe just a rectangle around it, to differentiate it between the actual foreground and background. What the computer is going to do at the first level, it’s going to do very low-level operations, which means it’s going to start finding edges, connected components, all at the very low, granular level. So, it starts finding patterns at that level basically; that’s the first stage. And depending on how deep—let’s say if it’s a neural network—the neural network is, the higher the granularity of these patterns; they start getting more and more. So the representation of a cat starts from very low-level, until you start getting things like paws, and ears, and eyes, until you actually get to what it is a full cat at the end of the layers of this neural network. That’s where the layers of the neural network start encoding. So when you have a new picture where there’s not a cat—maybe there is a person—it’s going to try to find those patterns. And it’s going to be able to say, “Well, in this area of this image, there’s no cat because I don’t see all those patterns coming up the way I see it when a cat is present.”
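To make the "very low-level operations" step concrete, here is a minimal sketch of a hand-written edge detector on a tiny grayscale image. This is an editorial illustration under stated assumptions: the function `horizontal_edges` and the toy image are hypothetical, and in a real neural network such filters are learned from the labeled images rather than coded by hand.

```python
def horizontal_edges(image):
    """Absolute difference between each pixel and its right-hand neighbor."""
    return [
        [abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


# Hypothetical 3x4 image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = horizontal_edges(image)
for row in edges:
    print(row)
```

The large values mark where brightness changes, which is the kind of pattern the first layers pick up before later layers combine them into parts such as ears or paws.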
And what are the inherent limitations of that approach, or is that the be-all and end-all of image recognition? Give it enough images, it will figure out how to recognize them almost flawlessly?
There are always limitations. There are objects that are easier to recognize than others. Now we have made amazing progress in even recognizing different types of dogs and different types of cats, which is amazing, but there are always constraints: lighting conditions, the quality of the image, things like that. You know, some dogs can look like cats, so we can’t do anything about that. We always have constraints. I think that algorithms are not perfect, but depending on what we’re trying to use them for, they can get very accurate.
The same techniques are used, not just for training for images, but making credit decisions or hiring decisions or identifying illnesses—it’s all the basic same approach, correct?
What do you think of the efforts being considered in Europe, that dictate that you have a right to know why the algorithm suggested what it is? How do you reconcile that? For instance, you have denied a person’s mortgage application, and that person says, “Why?” and then you say, “The neural net said so.” And, of course, that person wants to know, “Well, why did it say so?” And it’s like, “Well, that’s a pretty hard question to answer.” How do you solve that, or how do you balance that? Because as we get better with neural nets, they’re only going to get more obfuscated and convoluted and nuanced, right?
I think the harder the problem, like say in the case of computer vision, it’s really hard to say what are the things that trigger a certain outcome. But luckily, you can still come up with algorithms that are simpler to train, but also simpler to figure out what are the main features that are triggering certain outcomes. And then you’ll be able to say to that person, if you pay your credit cards, then your score will improve and we’ll be able to give you a mortgage.
I think that’s the trade off, right? I think it’s always task-dependent. There is a lot of hype with deep learning and neural networks. Sometimes you just need a little bit more simple algorithms. They are still very accurate, but they can actually give you insights about your prediction, and also the data that you are looking at, and then you can actually build a better product. If your aim is to be extremely complete, or to try to solve a task that is very difficult, then you’re going to have to deal with the fact that there are a lot of things you won’t know about the data and also why the outcome of that algorithm came about.
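The kind of simpler, explainable model described here can be sketched as a linear score whose per-feature contributions can be read off directly. Everything specific below is an assumption for illustration only: the feature names, the weights, and the bias are invented, and this is not how any real lender scores applications.

```python
# Hypothetical weights; a real model would learn these from data.
WEIGHTS = {
    "on_time_payment_rate": 4.0,
    "debt_to_income": -3.0,
    "years_of_history": 0.2,
}
BIAS = -2.0


def explain(applicant):
    """Return the score and each feature's contribution, most negative first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BIAS + sum(contributions.values())
    return score, sorted(contributions.items(), key=lambda item: item[1])


applicant = {"on_time_payment_rate": 0.5, "debt_to_income": 0.6, "years_of_history": 3}
score, reasons = explain(applicant)
print("approved" if score > 0 else "declined")
for name, contribution in reasons:
    print(name, round(contribution, 2))
```

Because each contribution is visible, the answer to "why?" can be as direct as pointing to the feature that pulled the score down the most, which a deep network does not offer as readily.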
Pedro Domingos wrote a book called The Master Algorithm where he said there are five different tribes, where he kind of divides all of that up. You have your symbolists, and you have your Bayesians, and so forth; and he posits that there must exist a master algorithm, a single general-purpose algorithm that would be able to solve a wide range of problems, in theory, all problems that are solvable that way. Do you think such a thing exists?
I don’t think it exists now. Given the fact that deep learning has been extremely useful across different types of tasks, going from computer vision to even, like, music or signal processing, and things like that, there might be an algorithm that can help with a lot of different tasks, like a master algorithm, if you want to call it that. But it will always be in some way modified to fit the actual problem that you want. Because of the fact that these algorithms are very complex, sometimes you actually need to know why the outcome is the outcome that you’re getting. So, yes, I think that algorithm might exist at some point. I don’t think it exists now. Deep learning, maybe, is one of the frameworks (because it’s not an algorithm but it’s more like a framework, or an architecture) that is helping to be able to accurately make predictions in different areas. But at the end of the day, we want to know why, because it will affect a lot of different people.
One argument for the existence of such a thing—and that it may not be very much code—is human DNA, which is, of course, the instructions set to build a general intelligence. And the part of the DNA that makes you different than creatures that aren’t intelligent is very tiny. And so the argument goes that somehow a very little bit of code gives you the instructions to build us and we’re intelligent, so therefore there may be an analog in the computer world. That it’s just a small amount of code that can build something as versatile as a human. What do you think of that analogy?
Yeah, that’s mind blowing. That would be really cool if that happened, but at the same time, very scary. I never really thought about that before.
Do you think we’re going to build an artificial general intelligence? Will we build a computer as smart and versatile as a human?
This is a very personal answer. Humans are social beings, and the only way that this could happen is if we’re alone and we need something like a human to be with us. Hopefully, we’re very far from that future, but in the actual present, I don’t think that’s something that we aim to do.
I think it’s also more about, like, figuring out humanity by itself, like understanding why we come to be the way we are, why people are violent, why people are peaceful, why people are happy, or why people are sad. And that’s the best way of understanding that, like, basically, reconstructing a human brain and maybe extending that brain to have arms and become a robot. But I don’t think it would be the actual goal. It’s more like the way to understand humanity.
I also don’t think it would be a way of executing tasks. We always see in sci-fi movies that robots do things that humans don’t want to do, but they wouldn’t be humanoids. I was looking to buy a Roomba yesterday and that could possibly be a robot that is cleaning, it’s trying to do something that I don’t want to do, but I don’t consider it as being artificial intelligence or a smart being. So I think that it is in some way possible but I don’t think as an end to build something like a human.
Certainly not for a lot of things, but in some parts of the world, that is a real widespread goal. The idea being that places where you have aging populations, and you have lonely people, and you want robot companions that have faces that you recognize, and that display emotions and can listen to your stories, chuckle at your jokes, recognize your jokes, all of that. What are your thoughts on that? So, in those cases, they are trying to build artificial humans, aren’t some people?
But they won’t be complete humans, right? They will be machines that are very good at solving certain tasks, which is recognizing your voice, recognizing that you’re saying a joke, or being able to say things that make you feel better. But I don’t think that they are artificial humans, because that’s a very complex thing. That robot that is helping a senior person, for that person not to be alone, it won’t be able to do other much more complex tasks that a human can do.
I think it’s all about being very specific to solve very specific tasks. And I think robots in Japan are doing that. I mean, we have smart assistants, right? And they are very good at understanding what you’re trying to say so they can execute a command, but I don’t think about them as another “human” that is trying to understand me, or actually know about who I am.
I don’t know if you saw that Tom Hanks movie years ago called Cast Away, but his only companion is this soccer ball that he named Wilson, and then there’s a point where Wilson is floating off and he’s like, “Wilson!” And he risks his life to save Wilson. And then you look at how attached people get to their pets and their animals. And so, you can imagine, if you just kind of straight line graph that, how people might feel towards robots that really do look and act human. It’s undoubtable that people will develop strong emotions for them.
Yes, I agree with that.
So, it’s interesting, you’re talking about these digital systems, and some vendors choose to name them. Apple has Siri, Amazon has Alexa, Microsoft has Cortana, but Google, interestingly, doesn’t personify theirs. It’s the Google Assistant. Why do you think, and not necessarily those specific four cases, but why do you think sometimes they’re personified and sometimes they aren’t? What does that say about us, or them, or how we want to interact with them, or something like that?
That is very interesting, because sometimes when I have my son with me I’ll ask Alexa to play some music. Having a name makes it feel like it’s part of your family, and probably my son will wonder who this Alexa person is that I’m always asking to play 80s pop music. But it definitely makes you feel that it’s not awkward, that your interaction with the machine is smooth and it’s just not an execution, right? It’s part of your environment.
I think that’s what these companies are going for when they put a real name—it’s not a very common person’s name, but still a name that you could say. Alexa, probably, has a female voice because that’s, sort of, the gender that they’re aiming to represent. With respect to Google, I think maybe they want to see it in a more task-driven way. I don’t know. It could be many things.
I think I read that Alexa may have come from—in addition to having the hard “x,” which makes it sound distinctive—an oblique reference to The Library of Alexandria, way back in ancient times. 
Whenever my alarm goes off, I’m like, “Alexa, set me a five-minute timer”—which, luckily, it didn’t hear me—but when the timer goes off, I go, “Alexa, stop,” and it feels rude to me. Like, I don’t talk to people that way, and, therefore, it’s jarring to me. So, in a way I prefer not having it personified, because then I don’t have that conflict. But what you just said about your child, that they may not grow up having any of those sorts of mixed minds about these things. What do you think?
Yeah, I think that it’s true. Sometimes, I feel the same way like you do when you say, “stop,” and it feels like a very commanding way. With the next generation, you have iPads and computers are an old thing, almost; it’s all about new interfaces. It’s definitely going to shape the way that people communicate with machines and products. It’s very hard for me to know how that’s going to be, but it’s going to be very natural—the way that interactions will come with websites, products, gadgets, things like that.
I think that the fact that Google is still the “Google Assistant” also has to do with the fact that when you’re in a conversation, people don’t say Google a lot, right? So then you won’t trigger those devices to be listening all the time, which is another problem. But yeah, it’s very interesting. I always think about how the next generation is going to behave or how the experience is going to be for them, growing up with these devices.
The funny thing is, of course, because Google has become a verb, you could imagine a future in a hundred years or two hundred years when the company no longer exists, but we still use the word, and people are like, “I wonder why we say ‘google’? Where did that come from?” 
This is, in a sense, a personal question, but do you think a computer could ever feel anything? So, for example, you could put a sensor on a computer that detects temperature, and you can program the computer to play a wav file of a person screaming if it ever hits five hundred degrees, but that’s a different thing than actually feeling pain. Do you think it’s possible for a machine to feel, or is that something that’s purely biological, or purely related to life, or some other aspect of us?
I think that for a human to be able to feel something, that aspect of humanity, is such a complex thing. We know from biology that it’s mostly our nerves perceiving pain. They’re perceiving things and then sending that signal to the brain, and the brain is trying to interpret that information into something.
You could, if you want to be very analytical about it, then you could possibly have a computer that feels pain, like you said, something that can give input to the computer and then goes through the processor and the processor will infer a rule and it will say “this is pain.” I don’t think they can do it in the way that we, as humans, perceive it. It’s such a complex thing.
But in the end, we have a self. We have a self that experiences the world.
Can a computer have a self and can a computer, therefore experience the world, as opposed to just coldly sense it?
I think it’s really hard. Unless you can build a computer with cells and things that are more common to a human, which will be a really interesting thing. Personally, I don’t think that is possible, because even pain, like we’re talking about, is very different for everyone, because it’s mostly given by the experiences, right? And a computer can store a lot of information, but there’s much more than that signal, just the way that interpreting that data is what makes humans so interesting.
Humans, we have brains, and our brains do all the things, but we also have a theory of something called a “mind,” which, you know, “are you out of your mind?” And I guess we think of the mind as all the stuff that we don’t really understand how just a bunch of neurons can do, like creativity, emotions, and all of that. In that movie, I, Robot, when Spooner, the Will Smith character, is talking to Sonny, the robot, he says, “Can you paint a painting? Can you write a symphony?” And of course, Sonny says, “Well, can you?” But, the point being, that all of those things are things we associate with the “mind.” Do you think computers will ever be able to write beautiful symphonies and bestselling novels, and blockbuster movies, and all of that? And if so, is machine learning a path to that? Like, how would you ever get enough movie scripts or even books or even stories to train it?
That’s interesting. I actually read that there is a movie that was written by a machine learning algorithm, the script actually, and they made a movie out of it. Now, is it good? I don’t know. So, it’s definitely possible. I think that, per se, computers cannot be creative. In my experience, they’re basically looking at patterns of things that people find funny, or exciting, or makes them feel things.
You can say, “This song is very pleasing because it’s very slow and romantic and relaxing,” and then a computer could just take all those songs that are tagged that way and come up with a new song that has those specific patterns, that make that song relaxing, or pleasing, right? And you could say, “Yes, they are being creative,” because they created something new from something old, from patterns and previous examples. So, in that case, it’s happening already, or a lot of people are trying to make it happen.
Now, you could also argue that artists are the same way. They have their idols, and they somehow are going to try to take those things they like from their heroes, and incorporate them in their own work, and then they become creative, and they have their own creations, or their own art. A computer can actually do the same process.
I think humans are able to capture even more than a computer could ever capture, because a human is doing something for other humans, so they can actually understand the things that move people or make people feel sad or happy. Computers could also just catch the patterns that for certain people, for certain data that they have, produces those emotions but they will never feel those emotions like humans do.
There’s a lot of fear wrapped up in artificial intelligence and machine learning with regards to automation. There are three broad beliefs. One is that we’re going to soon enter a period where there are people with not enough education or training to do certain jobs, and you’re going to have kind of the permanent Great Depression of twenty to twenty-five percent unemployment. Another group of people believes that eventually machines can do everything a person can do, like we’re all out of work. And then there’s a group of people who say, look, every time we get new technology, even things as fundamental as electricity and steam, and replace animals with machines, unemployment never goes up. People just use these new technologies to increase their productivity, and therefore their standard of living. Which of those three camps, or a fourth one, do you find yourself sympathetic to?
I think definitely the third one. I agree. A really good example: my dad studied technical drafting for architecture, and then there were computer programs that did that, and he didn’t have a job. He did it by hand, and then computers could do it easily. But then, he decided that he was really good at sales and that’s where his career started to develop. You know, you need to be personable, you need to be able to talk to people, engage them, sell them things, right?
I think that, in general, we are going to make people develop new skills they never thought they had. We are going to make them more efficient. For example, at Thumbtack, we’re empowering professionals to do the things that they’re really good at, and, you know, me, personally; it’s helping them through machine learning to optimize processes so they can be just focused on the things they love doing.
I don’t really like the fact that people say that AI, or machine learning, will take people’s jobs. I think we have to see it as a new wave of optimized processes that will actually give us more time to spend with our families or develop skills that we always thought would be interesting, or actually things that we love to do and we can make a job out of it. We can support our families by doing the things that we love, instead of being stuck in an office doing things that are super automatic, that you don’t put your heart, or even your mind to it. Let’s leave that to machines to automate it, and let’s just do something that makes our life better.
You mentioned Thumbtack, where you head up machine learning. Can you tell us a little bit about that? What excited you about that? What the mission of the company is, for people who aren’t familiar with it, and where you’re at in your life cycle?
So, Thumbtack is a marketplace where people like you and I can go and find a pro that’s going to do the right project for you. What’s really exciting is the fact that you don’t have to go to a listing, and call different places, and ask them “Are you interested?” When you go to Thumbtack, you put a request and only the pros, which are super qualified, that are interested will contact you back with a quote, and with information to tell you, “I’m ready to help you to get your project done.” And that’s it.
It’s amazing that we’re in 2017, and finding a plumber to fix your toilet, or even a DJ because you’re getting married, all those things are so hard. And what I really like about working at Thumbtack is that we are making that super easy for our customers. And we are empowering pros to be good at what they do, to just not have to be worried about putting out flyers or putting up a website, and spending all that time in marketing, and all these things, instead of helping people with their projects, and, for them, building their business.
It’s such a complex problem, but at the same time, it has such a good outcome for everyone, which is one of the things that attracted me. And also the fact that we’re a startup, and startups are always a hard road, because we’re trying to disrupt a market that’s been untouched forever. And I think that’s a super challenging problem as well and being part of that is actually super exciting.
It’s true. It wasn’t that long ago when you needed a plumber, and you opened up the Yellow Pages, and you just saw how they were able to put a number of As in front of their names, AAA Plumbing, AAAA Plumbing, but that was how we figured things out. 
So, tell me a kind of machine learning challenge, a real day-to-day one that you have to deal with; what data do you use to solve what problem in that as you outlined it?
There are many different things. For some things, like automating some tasks that can make our team more productive, machine learning helps you to do that. For example, making sure that they can curate content. We get a lot of photos and reviews and things like that from our customers, and also content from our professionals, and we want to make sure that we’re showing all the things that are good for our customers, or surface information that is very relevant for them, when they’re looking to hire a professional.
There are also things, like, using information on our marketplace to enhance the experience of our users when they come to Thumbtack, and be able to recommend them another category, like, say they put a request for a DJ, and maybe if they are having a party they might also want a cleaning person the next day, right? Things like that. So, machine learning has always helped there to be able to use a lot of the data that we get from our marketplace, and make our product better.
All right. We’re nearing the end. I do have two questions for you. Do you enjoy any science fiction—like books or movies or any of that—and, if so, is there anything you’ve seen that you look at and think, yes, I could see the future unfolding that way, yes, that could really happen?
Yes, I definitely like science fiction. Foundation is one of the books that I really like.
Of course. That’s been one that’s resisted being able to be made into a movie, although I hear there’s one in the works for it, but that’s such a big project.
Yeah, I enjoy any type of science fiction, in general. I think it’s so interesting how humans see the future, right? It’s so creative. At the same time, I don’t particularly agree with any of those movies, and things like that. There are a lot of movies in Hollywood, too, where computers or robots become bad and they kill people.
I don’t think that’s the future we’ll see with machine learning. I think that we’ll be able to disrupt a lot of areas, and the one I’m most excited about is medicine, because that can really change the game in humanity by being able to accurately diagnose people with very few resources. In so many places in the world where there are no doctors, to be able to take a picture, or send a sample of something and having algorithms that can help doctors to get to that diagnosis quickly; that’s going to change the way that the world is today.
Gene Roddenberry, the creator of Star Trek, said, “In the future, there would be no hunger, and there would be no greed, and all the children would know how to read.” What do you think of that? Or, a broader question, because you are in the vanguard of this technology, you’re building these technologies that everybody reads about. Are you optimistic about the future? How do you think it’s all going to turn out?
Actually, it feels like a renaissance in some ways. Always, after some renaissance, some big shift in culture, there are always these new creative things happening. In the past, there were painters that revolutionized art by coming up with new ways of being creative, of painting. So, my view of the future is that, yes, a lot of the basic needs of humans might be satisfied, which is great. Mortality probably is going to be very low. But also there is the opportunity for us to have enough time to be creative again, and think about new ways of living. Because we have that foundation, then people will be able to think long-term, be more wild about new ideas. I think that’s mostly how I see it.
That’s a great place to end it. I want to thank you so much for taking the time. It was a fascinating hour. Have a good day.
Sure. Thank you. Thank you for having me.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.