How Machines Learn: The Top Four Approaches to ML in Business

Machine learning sits at the forefront of innovation across a growing number of industries in today’s business world. Still, it’s a mistake to think of machine learning as one monolithic business solution — there are many forms of machine learning and each is capable of solving different sets of problems. The most popular forms of ML used in business today are supervised, unsupervised, semi-supervised, and reinforcement learning. At Vidora, we’ve used these techniques to help Fortune 500 partners solve some of their most pressing problems in innovative ways. This article draws from our experiences to demystify these four common approaches to ML, introducing practical applications of each technique so that anyone in your organization can recognize how machine learning can enhance your business.
Machine Learning at a Glance
Machine learning is an approach to Artificial Intelligence which borrows principles from computer science and statistics to model relationships in data. Unlike other AI systems which distill human knowledge into explicit rules (e.g. Expert Systems), ML instructs an algorithm to learn for itself by analyzing data. The more data it processes, the smarter the algorithm gets.
Machine learning is not a new concept. Its theoretical foundation was laid in the 1950s when Alan Turing conceptualized a “learning machine”. That same decade, Frank Rosenblatt invented the “perceptron” to roughly simulate the learning process of the brain. More algorithms followed, but machine learning remained largely confined to academia until only recently. With explosions in data availability and computational power, it is finally possible for businesses to deploy machine learning at scale. Organizations have had success with each type of learning, but making the right choice for your business problem requires an understanding of which conditions are best suited for each approach.
Supervised Learning
If you know which metric you’d like to predict and have examples labeled with that metric, supervised learning is the best approach. A supervised algorithm is shown the “right answer” for a set of sample data and finds a function which approximates the relationship between the inputs and outputs. This functional mapping takes the general form y = f(x) — specify your target output y, provide your inputs x, and the ML algorithm will learn the optimal f() by finding patterns in the data.

y = f(x)
       Description           Training Phase   Live Model
y      Output                Supplied         Predicted
x      Input                 Supplied         Supplied
f()    Functional mapping    Learned          Used to generate predictions

Supervised learning outputs typically have one of two forms. Regression outputs are real-valued numbers that exist in a continuous space. For instance, many of Vidora’s eCommerce customers want to forecast how much money each customer is likely to spend, so that high-value customers may be targeted with personalized promotional offers. A simple linear regression structures this problem through the familiar formula y = mx + b, where y is predicted expenditure and x is some attribute of each customer — say, number of site visits. During training, we supply labeled input-output pairs — i.e. customers for whom transaction history is already known — and the algorithm finds the optimal parameters m and b to make this relationship as accurate as possible. In reality, Vidora’s regression model is likely to input hundreds of customer attributes, each with its own parameter, but the algorithm’s mechanism of action remains the same.
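To make this concrete, here is a minimal sketch of fitting y = mx + b by ordinary least squares. The visit counts and spend figures are invented for illustration, and a real model would use many attributes and a mature library rather than this one-feature closed form.

```python
# Toy supervised regression: predict customer spend (y) from site visits (x).
# Data and the single-feature setup are illustrative, not a production model.

def fit_linear(xs, ys):
    """Ordinary least squares for y = m*x + b with one feature."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# Labeled training pairs: (site visits, known spend in dollars)
visits = [1, 2, 3, 4, 5]
spend = [12.0, 14.1, 15.9, 18.2, 20.0]

m, b = fit_linear(visits, spend)
predicted = m * 6 + b  # forecast spend for an unseen customer with 6 visits
print(round(m, 2), round(b, 2), round(predicted, 1))
```

Training recovers a slope near 2 and an intercept near 10, so the model forecasts roughly $22 for a customer with six visits; with hundreds of attributes the same fitting idea applies, just with one learned parameter per attribute.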
Classification outputs, on the other hand, fall into discrete categories. For example, Vidora’s subscription customers often wish to identify the best communication channel to reach and retain each user: email or push notification. A linear classification algorithm distinguishes between the two by plotting attributes of each user and finding a line which separates the data into two groups based on their labels. Users known to be responsive to email fall on one side of the line, and those responsive to push fall on the other.
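A tiny perceptron illustrates how a linear classifier can find such a separating line. The user attributes, labels, and training loop below are illustrative assumptions, not Vidora’s actual system; real deployments would more likely use logistic regression or similar.

```python
# Toy linear classification: label each user as email-responsive (+1) or
# push-responsive (-1) from two made-up attributes. A perceptron nudges the
# line's weights whenever a training example lands on the wrong side.

def train_perceptron(points, labels, epochs=20, lr=0.1):
    """Learn weights (w1, w2) and bias b so sign(w1*x1 + w2*x2 + b) matches labels."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            if y * (w1 * x1 + w2 * x2 + b) <= 0:  # misclassified: adjust the line
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
    return w1, w2, b

# Attributes: (emails opened per week, push notifications opened per week)
users = [(5, 1), (6, 0), (4, 2), (1, 5), (0, 6), (2, 4)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = reach by email, -1 = reach by push

w1, w2, b = train_perceptron(users, labels)
new_user = (5, 2)
channel = "email" if w1 * new_user[0] + w2 * new_user[1] + b > 0 else "push"
print(channel)
```

Users known to respond to email fall on the positive side of the learned line, so the new user, who opens far more emails than push notifications, is routed to email.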
Popular supervised learning algorithms:

  For regression:
  • Linear regression
  • Random forest
  • Multi-layer perceptron
  • Convolutional deep neural networks

  For classification:
  • Logistic regression
  • Support vector machines
  • Convolutional deep neural networks
  • Naive Bayes

Unsupervised Learning
Unsupervised learning is used when training data has no specific label for the algorithm to predict. Without “right answers” to train on, the job of an unsupervised algorithm becomes clustering the data in order to uncover new rules and patterns. Finding inherent structures in the data can yield important and practical insights, from detecting data anomalies that mark credit card fraud, to revealing what your best customers have in common.
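As a concrete sketch of clustering, here is a minimal k-means (Lloyd’s algorithm) on made-up two-dimensional customer data. Note that no labels are supplied anywhere; the algorithm uncovers the two natural groups on its own.

```python
# Minimal k-means sketch on invented 2-D data (e.g., monthly visits vs.
# monthly spend). No labels are given; structure is inferred from the data.

def kmeans(points, centers, iters=10):
    """Lloyd's algorithm: assign each point to its nearest center, then
    move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[dists.index(min(dists))].append(p)
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

data = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
final = kmeans(data, centers=[(0, 0), (10, 10)])
print(final)  # one center settles near each natural group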
Popular unsupervised learning algorithms:

  • K-means clustering
  • Principal component analysis
  • Non-negative matrix factorization
  • Hidden Markov model
  • Hebbian learning
  • Autoencoders

Semi-supervised Learning
At Vidora, we’ve seen that collecting labeled data at scale is a challenge for many businesses, but unlabeled data is relatively abundant. Semi-supervised learning makes use of this plentiful unlabeled data to gain a better understanding of the population’s structure and distribution. For instance, a bank which offers home loans may wish to identify which of its customers own a house, but may have limited access to this information. Under the semi-supervised approach, an algorithm would first use information obtained from the labeled data to predict homeownership for the unlabeled data. Next, both the labeled and predicted data are passed through a supervised framework to learn a homeowner identification model. Although the estimated labels are never verified, they may improve the performance of the supervised model by providing a larger set of potential homeowners from which the algorithm can learn.
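The homeownership example can be sketched as a simple self-training loop. The account-balance feature, the data, and the one-feature threshold “model” are all illustrative assumptions; the point is the three-step pattern of fit, pseudo-label, and refit.

```python
# Self-training sketch for the homeownership example: fit a tiny threshold
# model on labeled customers, pseudo-label the unlabeled pool with it, then
# refit on both. Feature and data values are invented for illustration.

def fit_threshold(xs, ys):
    """Pick the cutoff on a single feature (here, account balance in
    thousands) that best separates homeowners (1) from non-owners (0)."""
    candidates = sorted(set(xs))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

labeled_x = [10, 20, 80, 90]          # balances of customers with known labels
labeled_y = [0, 0, 1, 1]              # known homeownership labels
unlabeled_x = [15, 25, 70, 85, 95]    # customers with no label

t = fit_threshold(labeled_x, labeled_y)          # step 1: supervised fit
pseudo_y = [int(x >= t) for x in unlabeled_x]    # step 2: pseudo-label
t2 = fit_threshold(labeled_x + unlabeled_x,      # step 3: refit on both
                   labeled_y + pseudo_y)
print(t, pseudo_y, t2)
```

Here the pseudo-labels happen to reinforce the original cutoff; in practice the extra examples can shift the refit model toward a better decision boundary than the small labeled set alone would support.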
Popular semi-supervised learning algorithms:

  • PU classification
  • Transductive SVM
  • Co-training

Reinforcement Learning
Reinforcement learning is used in situations where the computer is an agent interacting with its environment in pursuit of a goal. Here, feedback is the key ingredient. Rather than being shown a “right answer”, the algorithm is provided a reward signal against which it evaluates and adjusts its methods. With experience, the algorithm learns which sequence of actions gives it the best chance of maximizing its reward and achieving its goal.
Reinforcement learning typically requires huge amounts of data, but doesn’t force your business to be highly specific about its goals. Some autonomous vehicles learn to drive through reinforcement. These cars are instructed to get from point A to point B under only two broad conditions: obey the rules of the road, and don’t crash. The rest is learned through trial and error. Google’s famed AlphaGo program also learned to play the ancient Chinese board game Go using reinforcement. Armed with only the game’s rules and a goal of winning, AlphaGo learned which moves tended to maximize its chance of success. Merely two years after making its first move, AlphaGo famously dethroned the Go world champion in 2016.
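The reward-driven trial and error described above can be sketched with tabular Q-learning on a toy problem. The five-state corridor, reward scheme, and hyperparameters are invented for illustration; the agent is told only where the reward is, never which action is “right”.

```python
# Tabular Q-learning sketch: an agent starts at state 0 of a 5-state
# corridor and is rewarded only upon reaching state 4. It learns which
# action sequence maximizes reward purely from the reward signal.

import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2
random.seed(0)

for _ in range(500):  # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: mostly exploit the current Q-table, sometimes explore
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0  # a reward signal, not a "right answer"
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy should move right (+1) from every state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

With enough episodes, the Q-values propagate backward from the goal and the greedy policy heads right from every state, even though the agent was never shown a correct move.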
Popular reinforcement learning algorithms:

  • Q-learning
  • Temporal difference
  • Monte Carlo tree search
  • Sarsa

ML and Your Business
Each of supervised, unsupervised, semi-supervised, and reinforcement learning has shown meaningful success in the business world. As the practical scope of machine learning broadens, fluency in its key concepts becomes an increasingly important business skill, even for those with no data science experience. Understanding which sorts of problems each ML approach is best equipped to solve empowers business experts to recognize where the technology may make its greatest contributions to key business outcomes.
Michael Firn is a Product Manager at Vidora, where he works closely with both Vidora’s engineering team and Vidora’s Fortune 500 partners such as News Corp, Walmart and Time to help develop and implement machine learning solutions to their business problems. 

Voices in AI – Episode 29: A Conversation with Hugo Larochelle

In this episode, Byron and Hugo discuss consciousness, machine learning and more.
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today I’m excited; our guest is Hugo Larochelle. He is a research scientist over at Google Brain. That would be enough to say about him to start with, but there’s a whole lot more we can go into. He’s an Associate Professor, on leave presently. He’s an expert on machine learning, and he specializes in deep neural networks in the areas of computer vision and natural language processing. Welcome to the show, Hugo.
Hugo Larochelle: Hi. Thanks for having me.
I’m going to ask you only one, kind of, lead-in question, and then let’s dive in. Would you give people a quick overview, a hierarchical explanation of the various terms that I just used in there? In terms of, what is “machine learning,” and then what are “neural nets” specifically as a subset of that? And what is “deep learning” in relation to that? Can you put all of that into perspective for the listener?
Sure, let me try that. Machine learning is the field in computer science, and in AI, where we are interested in designing algorithms or procedures that allow machines to learn. And this is motivated by the fact that we would like machines to be able to accumulate knowledge in an automatic way, as opposed to another approach which is to just hand-code knowledge into a machine. That’s machine learning, and there are a variety of different approaches for allowing for a machine to learn about the world, to learn about achieving certain tasks.
Within machine learning, there is one approach that is based on artificial neural networks. That approach is more closely inspired from our brains, from real neural networks and real neurons. It is still somewhat vaguely inspired by—in the sense that many of these algorithms probably aren’t close to what real biological neurons are doing—but some of the inspiration for it, I guess, is a lot of people in machine learning, and specifically in deep learning, have this perspective that the brain is really a biological machine. That it is executing some algorithm, and would like to discover what this algorithm is. And so, we try to take inspiration from the way the brain functions in designing our own artificial neural networks, but also take into account how machines work and how they’re different from biological neurons.
There’s the fundamental unit of computation in artificial neural networks, which is this artificial neuron. You can think of it, for instance, that we have neurons that are connected to our retina. And so, on a machine, we’d have a neuron that would be connected to, and take as input, the pixel values of some image on a computer. And in artificial neural networks, for the longest time, we would have such neural networks with mostly a single layer of these neurons—so multiple neurons trying to detect different patterns in, say, images—and that was the most sophisticated type of artificial neural network that we could really train with success, say ten years ago or more, with some exceptions. But in the past ten years or so, there’s been development in designing learning algorithms that leverage so-called deep neural networks that have many more of these layers of neurons. Much like in our brain, where we have a variety of brain regions that are connected with one another: light, say, is processed as information flowing from the retina through various regions of the visual cortex. In the past ten years there’s been a lot of success in designing more and more successful learning algorithms that are based on these artificial neural networks with many layers of artificial neurons. And that’s been something I’ve been doing research on for the past ten years now.
You just touched on something interesting, which is this parallel between biology and human intelligence. The human genome is like 725MB, but so much of it we share with plants and other life on this planet. If you look at the part that’s uniquely human, it’s probably 10MB or something. Does that imply to you that you can actually create an AGI, an artificial general intelligence, with as little as 10MB of code if we just knew what that 10MB would look like? Or more precisely, with 10MB of code could you create something that could in turn learn to become an AGI?
Perhaps we can make that parallel. I’m not so much an expert on biology to be able to make a specific statement like that. But I guess in the way I approach research—beyond just looking at the fact that we are intelligent beings and our intelligence is essentially from our brain, and beyond just taking some inspiration from the brain—I mostly drive my research on designing learning algorithms more from math or statistics. Trying to think about what might be a reasonable approach for this or that problem, and how could I potentially implement it with something that looks like an artificial neural network. I’m sure some people have a better-informed opinion as to what extent we can draw a direct inspiration from biology, but beyond just the very high-level inspiration that I just described, what motivates my work and my approach to research is a bit more taking inspiration from math and statistics.
Do you begin with a definition of what you think intelligence is? And if so, how do you define intelligence?
That’s a very good question. There are two schools of thought, at least in terms of thinking of what we want to achieve. There’s one which is we want to somehow reach the closest thing to perfect rationality. And there’s another one which is to just achieve an intelligence that’s comparable to that of human beings, in the sense that, as humans perhaps we wouldn’t really draw a difference between a computer or another person, say, in talking with that machine or in looking at its ability to achieve a specific task.
A lot of machine learning really is based on imitating humans. In the sense that, we collect data, and this data, if it’s labeled, it’s usually produced by another person or committee of persons, like crowd workers. I think those two definitions aren’t incompatible, and it seems the common denominator is essentially a form of computation that isn’t otherwise easily encoded just by writing code yourself.
At the same time, what’s kind of interesting—and perhaps evidence that this notion of intelligence is elusive—is there’s this well-known phenomenon that we call the AI effect, which is that it seems very often whenever we reach a new level of AI achievement, of AI performance for a given task, it doesn’t take a whole lot of time before we start saying that this actually wasn’t AI, but this other new problem that we are now interested in is AI. Chess is a little bit like that. For a long time, people would associate chess playing as a form of intelligence. But once we figured out that we can be pretty good by treating it as, essentially, a tree search procedure, then some people would start saying, “Well that’s not really AI.” There’s now this new separation where chess-playing is not AI anymore, somehow. So, it’s a very tough thing to pin down. Currently, I would say, whenever I’m thinking of AI tasks, a lot of it is essentially matching human performance on some particular task.
Such as the Turing Test. It’s much derided, of course, but do you think there’s any value in it as a benchmark of any kind? Or is it just a glorified party trick when we finally do it? And to your point, that’s not really intelligence either.
No, I think there’s value to that, in the sense that, at the very least, if we define a specific Turing Test for which we currently have no solution, I think it is valuable to try to then succeed in that Turing Test. I think it does have some value.
There are certainly situations where humans can also do other things. So, arguably, you could say that if someone plays against AlphaGo, but wasn’t initially told if it was AlphaGo or not—though, interestingly, some people have argued it’s using strategies that the best Go players aren’t necessarily considering naturally—you could argue that right now if you played against AlphaGo you would have a hard time determining that this isn’t just some Go expert, at least many people wouldn’t be able to say that. But, of course, AlphaGo doesn’t really classify natural images, or it doesn’t dialog with a person. But still, I would certainly argue that trying to tackle that particular milestone is useful in our scientific endeavor towards more and more intelligent machines.
Isn’t it fascinating that Turing said that, assuming the listeners are familiar with it, it’s basically, “Can you tell if this is a machine or a person you’re talking to over a computer?” And Turing said that if it can fool you thirty percent of the time, we have to say it’s smart. And the first thing you say, well why isn’t it fifty percent? Why isn’t it, kind of, indistinguishable? An answer to that would probably be something like, “Well, we’re not saying that it’s as smart as a human, but it’s intelligent. You have to say it’s intelligent if it can fool people regularly.” But the interesting thing is that if it can ever fool people more than fifty percent, the only conclusion you can draw is that it’s better at being human than we are…or seeming human.
Well definitely that’s a good point. I definitely think that intelligence isn’t a black or white phenomenon, in terms of something is intelligent or isn’t, it’s definitely a spectrum. What it means for someone to fool a human more than actual humans into thinking that they’re human is an interesting thing to think about. I guess I’m not sure we’re really quite there yet, and if we were there then this might just be more like a bug in the evaluation itself. In the sense that, presumably, much like we have now adversarial networks or adversarial examples, so we have methods that can fool a particular test. I guess it just might be more a reflection of that. But yeah, intelligence I think is a spectrum, and I wouldn’t be comfortable trying to pin it down to a specific frontier or barrier that we have to reach before we can say we have achieved actual AI.
To say we’re not quite there yet, that is an exercise in understatement, right? Because I can’t find a single one of these systems that are trying to pass the test that can answer the following question, “What’s bigger, a nickel or the sun?” So, I need four seconds to instantly know. Even the best contests restrict the questions enormously. They try to tilt everything in favor of the machine. The machine can’t even put in a showing. What do you infer from that, that we are so far away?
I think that’s a very good point. And it’s interesting, I think, to talk about how quickly are we progressing towards something that would be indistinguishable from human intelligence—or any other—in the very complete Turing Test type of meaning. I think that what you’re getting at is that we’re getting pretty good at a surprising number of individual tasks, but for something to solve all of them at once, and be very flexible and capable in a more general way, essentially your example shows that we’re quite far from that. So, I do find myself thinking, “Okay, how far are we, do we think?” And often, if you talk to someone who isn’t in machine learning or in AI, that’s often the question they ask, “How far away are we from AIs doing pretty much anything we’re able to do?” And it’s a very difficult thing to predict. So usually what I say is that I don’t know because you would need to predict the future for that.
One bit of information that I feel we don’t often go back to is, if you look at some of the quotes of AI researchers when people were, like now, very excited about the prospect of AI, a lot of these quotes are actually similar to some of the things we hear today. So, knowing this, and noticing that it’s not hard to think of a particular reasoning task where we don’t really have anything that would solve it as easily as we might have thought—I think it just suggests that we still have a fairly long way in terms of a real general AI.
Well let’s talk about that for just a second. Just now you talked about the pitfalls of predicting the future, but if I said, “How long will it be before we get to Mars?” that’s a future question, but it’s answerable. You could say, “Well, rocket technology and…blah, blah, blah…2020 to 2040,” or something like that. But if you ask people who are in this field—at least tangentially in the field—you get answers between five and five hundred years. And so that implies to me that not only do we not know when we’re going to do it, we really don’t know how to build an AGI.  
So, I guess my question is twofold. One, why do you think there is that range? And two, do you think that, whether or not you can predict the time, do you think we have all of the tools in our arsenal that we need to build an AGI? Do you believe that with sufficient advances in algorithms, sufficient advances in processors, with data collection, etcetera, do you think we are on a linear path to achieve an AGI? Or is an AGI going to require some hitherto unimaginable breakthrough? And that’s why you get five to five hundred years because that’s the thing that’s kind of the black swan in the room?
That is my suspicion, that there are at least one and probably many technological breakthroughs—that aren’t just computers getting faster or collecting more data—that are required. One example, which I feel is not so much an issue with compute power, but is much more an issue of, “Okay, we don’t have the right procedure, we don’t have the right algorithms,” is being able to match how as humans we’re able to learn certain concepts with very little, quote unquote, data or human experience. An example that’s often given is if you show me a few pictures of an object, I will probably recognize that same object in many more pictures, just from a few—perhaps just one—photographs of that object. If you show me a picture of a family member and you show me other pictures of your family, I will probably identify that person without you having to tell me more than once. And there are many other things that we’re able to learn from very little feedback.
I don’t think that’s just a matter of throwing existing technology, more computers and more data, at it; I suspect that there are algorithmic components that are missing. One of them might be—and it’s something I’m very interested in right now—learning to learn, or meta-learning. So, essentially, producing learning algorithms from examples of tasks, and, more generally, just having a higher-level perspective of what learning is. Acknowledging that it works on various scales, and that there are a lot of different learning procedures happening in parallel and in intricate ways. And so, determining how these learning processes should act at various scales, I think, is probably a question we’ll need to tackle more and actually find a solution for.
There are people who think that we’re not going to build an AGI until we understand consciousness. That consciousness is this unique ability we have to change focus, and to observe the world a certain way and to experience the world a certain way that gives us these insights. So, I would throw that to you. Do you, A), believe that consciousness is somehow key to human intelligence; and, B), do you think we’ll make a conscious computer?
That’s a very interesting question. I haven’t really wrapped my head around what is consciousness relative to the concept of building an artificial intelligence. It’s a very interesting conversation to have, but I really have no clue, no handle on how to think about that.
I would say, however, that clearly notions of attention, for instance, being able to focus attention on various things or adding an ability to seek information, those are clearly components for which there’s, currently—I guess for attention we have some fairly mature solutions which work, though in somewhat restrictive ways and not in the more general way; information seeking, I think, is still very much related to the notion of exploration and reinforcement learning—still a very big technical challenge that we need to address.
So, some of these aspects of our consciousness, I think, are kind of procedural, and we will need to figure out some algorithm to implement these, or learn to extract these behaviors from experience and from data.
You talked a little bit earlier about learning from just a little bit of data, that we’re really good at that. Is that, do you think, an example of humans being good at unsupervised learning? Because obviously as kids you learn, “This is a dog, and this is a cat,” and that’s supervised learning. But what you were talking about, was, “Now I can recognize it in low light, I can recognize it from behind, I can recognize it at a distance.” Is that humans doing a kind of unsupervised learning? Maybe start off by just explaining the concept and the hope about unsupervised learning, that it takes us, maybe, out of the process. And then, do you think humans are good at that?
I guess, unsupervised learning is, by definition, something that’s not supervised learning. It’s kind of an extreme of not using supervised learning. An example of that would be—and this is something I investigated quite a bit when I did my PhD ten years ago—to have a procedure, a learning algorithm, that can, for instance, look at images of hundreds of characters and be able to understand that each of these pixels in these images of characters are related. That they are higher-level concepts that explain why this is a digit. For instance, there is the concept of pen strokes; a character is really a combination of pen strokes. So, unsupervised learning would try to—just from looking at images, from the fact that there are correlations between these pixels, that they tend to look like something different than just a random image, and that pixels arrange themselves in a very specific way compared to any random combination of pixels—be able to extract these higher-level concepts like pen stroke and handwritten characters. In a more complex, natural scene this would be identifying the different objects without someone having to label each object. Because really what explains what I’m seeing is that there’s a few different objects with a particular light interacting with the scene and so on.
That’s something that I’ve looked at quite a bit, and I do think that humans are doing some form of that. But also, we’re, probably as infants, we’re interacting with our world and we’re exploring it and we’re being curious. And that starts being something a bit further away from just pure unsupervised learning and a bit closer to things like our reinforcement learning. So, this notion that I can actually manipulate my environment, and from this I can learn what are its properties, what are the facts and the variations that characterize this environment?
And there’s an even more supervised type of learning that we see in ourselves as infants that is not really captured by purely supervised learning, which is being able to exchange or to learn from feedback from another person. So, we might imitate someone, and that would be closer to supervised learning, but we might instead get feedback that’s worded. So, if a parent says do this or don’t do that, this isn’t exactly an imitation; this is more like a communication of how you should adjust your behavior. And this is a form of weakly supervised learning. So, if I tell my kid to do his or her homework, or if I give instructions on how to solve a particular problem set, this isn’t a demonstration, so this isn’t supervised learning. This is more like a weak form of supervised learning. Which even then I think we don’t use as much in the systems that work well currently, like object recognition systems or machine translation systems and so on. And so, I believe that these various forms of learning that are much less supervised than common supervised learning are a direction in research where we still have a lot of progress to make.
So earlier you were talking about meta learning, which is learning how to learn, and I think there’s been a wide range of views about how artificial intelligence and an AGI might work. And on one side was an early hope that, like the physical universe which is governed just by very few laws, and magnetism very few laws, electricity very few laws, we hoped that intelligence was governed by just a very few laws that we could learn. And then on the other extreme you have people like the late Marvin Minsky who really saw the brain as a hack of a couple of hundred narrow AIs, that all come together and give us, if not a general intelligence at least a really good substitute for one. I guess a belief in meta learning is a belief in the former case, or something like it, that there is a way to learn how to learn. There’s a way to build all those hacks. Would you agree? Do you think that?
We can take one example there. I think under a somewhat general definition of what learning to learn or meta learning is, it’s something that we could all agree exists, which is, as humans, we’re the result of years of evolution. And evolution is a form of adaptation, I guess. But then within our lifespan, each individual will also adapt to its specific human experience. So, you can think of evolution as being kind of like the meta learning to the learning that we do as humans in our individual lives every day. But then even in our own lives, I think there are clearly ways in which my brain is adapting as I’m growing older from a baby to an adult, that are not conscious. There are ways in which I’m adapting in a rational way, in conscious ways, which rely on the fact that my brain has adapted to be able to perceive my environment—my visual cortex just maturing. So again, there are multiple layers of learning that rely on each other. And so, I think this is, at a fairly high level, but I think in a meaningful way, a form of meta learning. For that reason, I think that investigating how to build learning-to-learn systems is a process that’s valuable in informing how to create more intelligent agents and AIs.
There’s a lot of fear wrapped up in the media coverage of artificial intelligence. And not even getting into killer robots, just the effects that it’s going to have on jobs and employment. Do you share that? And what is your prognosis for the future? Is AI in the end going to increase human productivity like all other technologies have done, or is AI something profoundly different that’s going to harm humans?
That’s a good question. What I can say is that I am motivated by—and what makes me excited about AI—is that I see it as an opportunity of automating parts of my day-to-day life which I would rather be automated so I can spend my life doing more creative things, or the things that I’m more passionate about or more interested in. I think largely because of that, I see AI as a wonderful piece of technology for humanity. I see benefits in terms of better machine translation which will better connect the different parts of the world and allow us to travel and learn about other cultures. Or how I can automate the work of certain health workers so that they can spend more time on the harder cases that probably don’t receive as much attention as they should.
For that reason—and because I’m personally motivated automating these aspects of life which we would want to see automated—I am fairly optimistic about the prospects for our society to have more AI. And, potentially, when it comes to jobs we can even imagine automating our ability to progress professionally. Definitely there’s a lot of opportunities in automating part of the process of learning in a course. We now have many courses online. Even myself when I was teaching, I was putting a lot of material on YouTube to allow for people to learn.
Essentially, I identified that the day-to-day teaching that I was doing in my job was very repetitive. It was something that I could record once and for all, and instead focus my attention on spending time with the students, making sure that each individual student resolves their own misunderstandings about the topic. Because my mental model of students in general is that it’s often unpredictable how they will misunderstand a particular aspect of the course. And so, you actually want to spend some time interacting with each student, and you want to do that with as many students as possible. I think that’s an example where we can automate particular aspects of education so as to support our ability to have everyone be educated and able to have a meaningful professional life. So, I’m overall optimistic, largely because of the way I see myself using AI and developing AI in the future.
Anybody who’s listened to many episodes of the show will know I’m very sympathetic to that position. I think it’s easy to point to history and say that in the last two hundred and fifty years, other than the Depression, which obviously wasn’t caused by technology, unemployment has been between five and nine percent without fail. And yet, we’ve had incredibly disruptive technologies, like the mechanization of industry, the replacement of animal power with machine power, electrification, and so forth. And in every case, humans have used those technologies to increase their own productivity and therefore their incomes. And that is the entire story of the rising standard of living for everybody, at least in the western world.
But I would be remiss not to make the other case, which is that there might be a point, an escape velocity, where a machine can learn a new job faster than a human. And at that point, at that magic moment, every new job, everything we create, a machine would learn it faster than a human. Such that, literally, everything from Michael Crichton down to…everybody—everybody finds themselves replaced. Is that possible? And if that really happened, would that be a bad thing?
That’s a very good question, I think, for society in general. Maybe because my day-to-day is about identifying the current challenges in making progress in AI, I see, and I guess we touched on that a little bit earlier, that there are still many scientific challenges. It doesn’t seem like it’s just a matter of making computers faster and collecting more data. Because I see these many challenges, and because I’ve seen that the scientific community has, in previous years, been wrong and overly optimistic, I tend to err on the side of being less gloomy and a bit more conservative about how quickly we’ll get there, if we ever get there.
In terms of what it means for society, if it was ever to happen that we can automate essentially most things, I unfortunately feel ill-equipped as a non-economist to have a really meaningful opinion about it. But I do think it’s good that we have a dialog about it, as long as it’s grounded in facts. That’s why it’s a difficult question to discuss: we’re talking about a hypothetical future that might not arrive for a very long time. But as long as we otherwise have a rational discussion about what might happen, I don’t see a reason not to have that discussion.
It’s funny. Probably the truest thing that I’ve learned from doing all of these chats is that there is a direct correlation between how much you code and how far away you think an AGI is.
That’s quite possible.
I could even go further to say that the longer you have coded, the further away you think it is. People who are new at it are like, “Yeah. We’ll knock this out.” And the other people who think it’s going to happen really quickly are more observers. So, I want to throw a thought experiment to you.
It’s a thought experiment that I haven’t presented to anybody on the show yet. It’s by a man named Frank Jackson, and it’s the problem of Mary, and the problem goes like this. There’s this hypothetical person, Mary, and Mary knows everything in the world about color. Everything is an understatement. She has a god-like understanding of color, everything down to the basic, most minute detail of light and neurons and everything. And the rub is that she lives in a room that she’s never left, and everything she’s seen is black and white. And one day she goes outside and she sees red for the first time. And the question is, does she learn anything new when that happens that she didn’t know before? Do you have an initial reaction to that?
My initial reaction is that, being colorblind I might be ill-equipped to answer that question. But seriously, so she has a perfect understanding of color but—just restating the situation—she has only seen in black and white?
Correct. And then one day she sees color. Did she learn anything new about color?
By definition of what understanding means, I would think that she wouldn’t learn anything about color. About red specifically.
Right. That is probably the consistent answer, but it’s one that is intuitively unsatisfying to many people. The question it’s trying to get at is, is experiencing something different than knowing something? And if in fact it is different, then we have to build a machine that can experience things for it to truly be intelligent, as opposed to just knowing something. And to experience things means you return to this thorny issue of consciousness. We are not only the most intelligent creature on the planet, but we’re arguably the most conscious. And that those two things somehow are tied together. And I just keep returning to that because it implies, maybe, you can write all the code in the world, and until the machine can experience something… But the way you just answered the question was, no, if you know everything, experiencing adds nothing.
I guess, unless that experience would somehow contradict what you know about the world, I would think that it wouldn’t affect it. And this is partly, I think, one challenge about developing AI as we move forward. A lot of the AIs that we’ve successfully developed that have to do with performing a series of actions, like playing Go for instance, have really been developed in a simulated environment. In this case, for a board game, it’s pretty easy to simulate it on a computer because you can literally write all the rules of the game so you can put them in the computer and simulate it.
But, for an experience such as being in the real world and manipulating objects, as long as that simulated experience isn’t exactly what the experience is in the real world, touching real objects, I think we will face a challenge in transferring any kind of intelligence that we grow in simulations, and transfer it to the real world. And this partly relates to our inability to have algorithms that learn rapidly. Instead, they require millions of repetitions or examples to really be close to what humans can do. Imagine having a robot go through millions of labeled examples from someone manipulating that robot, and showing it exactly how to do everything. That robot might essentially learn too slowly to really learn any meaningful behavior in a reasonable amount of time.
You used the word transfer three or four times there. Do you think that transfer learning, this idea that humans are really good at taking what we know in one domain space and applying it in another—you know, you walk around one big city and go to a different big city and you kind of map things. Is that a useful thing to work on in artificial intelligence?
Absolutely. In fact, we’re seeing that with all the success that has been enabled by the ImageNet data set and the competition. Training object recognition systems on that large ImageNet data set is really what was responsible for the revolution of deep neural nets and convolutional neural nets in the field of computer vision. It turns out that models trained on that source of data transfer really well to a surprising number of tasks, and that has very much enabled a revolution in computer vision. But it’s a fairly simple type of transfer, and I think there are more subtle ways of transferring, where you need to take what you knew before but slightly adjust it. How do you do that without forgetting what you learned before? So, understanding how these different mechanisms need to work together to perform a form of lifelong learning, being able to accumulate one task after another and learning each new task with less and less experience, is something I think we’re currently not doing as well as we need to.
What keeps you up at night? You meet a genie and you rub the bottle and the genie comes out and says, “I will give you perfect understanding of something.” What do you wrestle with that maybe you can phrase in a way that would be useful to the listeners?
Let’s see. That’s a very good question. Definitely, in my daily research, how we are able to accumulate knowledge, and how a machine could accumulate knowledge over a very long period, learning a sequence of tasks and abilities cumulatively, is something that I think about a whole lot. And this has led me to think about learning to learn, because I suspect that once you have to learn one ability after another after another, the fact that we get better at that process is, perhaps, because we are also learning how to learn each task. There’s this other scale of learning going on. Exactly how to do this I don’t quite know, and figuring it out would be a pretty big step in our field.
I have three final questions, if I could. You’re in Canada, correct?
As it turns out, I’m currently still in the US because I have four kids, two of them are in school so I wanted them to finish their school year before we move. But the plan is for me to go to Montreal, yes.
I noticed something. There’s a lot of AI activity in Canada, a lot of leading research. How did that come about? Was that a deliberate decision or just a kind of a coincidence that different universities and businesses decided to go into that?
If I speak for Montreal specifically, very clearly at the source of it is Yoshua Bengio deciding to stay in Montreal, staying in academia, continuing to train many students, gathering other researchers into his group, and training more PhDs in a field that doesn’t have as much talent as it needs. I think this is essentially the source of it.
And then my second to the last question is, what about science fiction? Do you enjoy it in any form, like movies or TV or books or anything like that? And if so, is there any that you look at it and think, “Ah, the future could happen that way”?
I definitely used to be more into science fiction. Now, maybe due to having kids, I watch many more Disney movies than science fiction. It’s actually a good question. I’m realizing I haven’t watched a sci-fi movie for a bit, but it would be interesting, now that I’ve actually been in this field for a while, to confront my vision of it with how artists see AI. Maybe not to take it too seriously, but a lot of art is essentially philosophy around what could happen, or at least projecting a potential future and seeing how we feel about it. And for that purpose, I’m now tempted to revisit some classics or see what the recent sci-fi movies are.
I said only one more question, so I’ve got to combine two into one to stick with that. What are you working on, and if a listener is going into college or is presently in college and wants to get into artificial intelligence in a way that is really relevant, what would be a leading edge that you would say somebody entering the field now would do well to invest time in? So first, you, and then what would you recommend for the next generation of AI researchers?
As I’ve mentioned, perhaps not so surprisingly, I am very much interested in learning to learn and meta learning. I’ve started publishing on the subject, and I’m still very much thinking about various new ideas for meta learning approaches. I’m also interested in learning from weaker signals than in the supervised learning setting, such as worded feedback from a person, which is something I haven’t quite started working on specifically but am thinking a whole lot about these days. Those are directions that I would definitely encourage other young researchers to think about, study, and research.
And in terms of advice, well, I’m obviously biased, but being in Montreal studying deep learning and AI is currently a very, very rich and great experience. There are a lot of people to talk to and interact with, not just in academia but now much more in industry, such as ourselves at Google and other places. And also, be very active online. On Twitter, there’s now a very, very rich community of people sharing the work of others and discussing the latest results. The field is moving very fast, and in large part it’s because the deep learning community has been very open about sharing its latest results and keeping the discussion of what’s going on open. So be connected, whether on Twitter or other social networks, read papers, look at what comes up on arXiv, and engage in the global conversation.
Alright. Well that’s a great place to end. I want to thank you so much. This has been a fascinating hour, and I would love to have you come back and talk about your other work in the future if you’d be up for it.
Of course, yeah. Thank you for having me.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.

Voices in AI – Episode 1: A Conversation with Yoshua Bengio

In this episode Byron and Yoshua talk about knowledge, unsupervised learning, how the brain learns, creativity, and machine translation.


Yoshua Bengio received a PhD in Computer Science from McGill University in Canada in 1991. After two post-doctoral years, one at MIT and one at AT&T Bell Labs, he became a professor at the Department of Computer Science and Operations Research at the University of Montreal. He is the author of two books and more than 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been Associate Editor of the top journals in machine learning and neural networks.


Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today our guest is Yoshua Bengio. Yoshua Bengio received a PhD in Computer Science from McGill University in Canada in 1991. After two post-doctoral years, one at MIT and one at AT&T Bell Labs, he became a professor at the Department of Computer Science and Operations Research at the University of Montreal. He is the author of two books and more than two hundred publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been Associate Editor of the top journals in machine learning and neural networks. Welcome to the show, Yoshua.
Yoshua Bengio: Thank you.
So, let’s begin. When people ask you, “What is artificial intelligence,” how do you answer that?
Artificial intelligence is about building machines that are intelligent, that can do the things that humans can do. To do that, a machine needs to have knowledge about the world, and then be able to use that knowledge to do useful things.
And it’s kind of kicking the can down the street just a little bit, because there’s unfortunately no consensus definition of what intelligence is either, but it sounds like the way you describe it, it’s just kind of like doing complicated things. So, it doesn’t have an aspect of, you know, it has to respond to its environment or anything like that?
Not necessarily. You could imagine having a very, very intelligent search engine that understands all kinds of things but doesn’t really have a body, doesn’t really live in an environment other than the interactions with people through the queries. So, the kinds of intelligence, of course, that we know and we think about when we think about animals are involving movement and actions and so on. But, yeah. Intelligence could be of different forms and it could be about different aspects. So, a mouse is very intelligent in its environment. If you or I went into the head of the mouse and tried to control the muscles of the mouse and survive, we probably wouldn’t last very long. And if you understand my definition, which is about knowledge, you could know a lot of things in some areas so you could be very intelligent in some area and know very little about another area and so not be very intelligent in other areas.
And how would you describe the state of the art? Where are we with artificial intelligence?
Well, we’ve made huge progress in the ability of computers to perceive better, to understand images, sounds and, to some extent, language. But we’re still very far from machines that can discover autonomously how the world around us works. We’re still very far from machines that can understand the sort of high-level concepts that we typically manipulate with language. So, there’s a lot to do.
And, yeah, it’s true. Like, if you go to any of the bots that have been running for years, the ones that people built to maybe try to pass the Turing test or something, if you just start off by asking the question, “What’s bigger, a nickel or the sun”, I have yet to ever find one that can answer that question. Why do you think that is? What’s going on there? Why is that a hard question?
Because it’s what people call common sense. And really it refers to a general understanding of how the world around us works, at least from the point of view of humans. All of us have this kind of common sense knowledge. It’s not something you find in books, typically; at least not explicitly. You might find some of it implicitly, but you don’t get that knowledge from, say, Wikipedia. That’s knowledge we pick up as children, and it’s the kind of knowledge that is often intuitive. Meaning that we know how to use it and we can recognize things, like we can recognize a chair, but we can’t really describe formally, in other words with a few equations, what a chair is. We think we can, but when we’re actually pressed to do it, we’re not able to do a good job at that. And the same thing is true, for example, of the AlphaGo system that beat the world champion at the game of Go: it can use a form of intuition to look at the game state and decide what would be a good move next. And humans can do the same thing without necessarily being able to decompose that into a very clear and crisp explanation, the way it was done for chess. In the case of chess, we have a program that actually tries many moves. Like, “If I do this, then what’s the worst thing for me that could happen? The other guy does that, and then I could do this, and then the other guy does that, and then I could do this.” So, that is a very crisp, logical explanation of why the computer does what it does.
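The “try many moves, assume the opponent picks the worst for you” search described here is essentially the minimax algorithm. Below is a minimal sketch over a made-up toy game tree; real chess engines add alpha-beta pruning, move ordering, and learned evaluation heuristics on top of this idea.

```python
# Minimax over a toy game tree. A node is either a number (a terminal
# score from the maximizing player's point of view) or a list of child
# nodes (the possible moves from that position).

def minimax(node, maximizing):
    """Return the best score achievable from `node` with optimal play."""
    if isinstance(node, (int, float)):  # leaf: a final position's score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    # On our turn we maximize our score; on the opponent's turn they
    # pick the move that is worst for us (minimize).
    return max(scores) if maximizing else min(scores)

# A tiny two-ply game: we choose one of three moves, then the opponent
# replies. The opponent will answer 3, 2, and 0 respectively, so our
# best guaranteed outcome is 3.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, maximizing=True))  # -> 3
```

The recursion alternates the `maximizing` flag at each ply, which is exactly the back-and-forth reasoning quoted in the answer above.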
In the case of neural nets, well, we have a huge network of these artificial neurons with millions or hundreds of millions of parameters, of numbers that are combined. And, of course, you can write an equation for this, but the equation will have so many numbers in it that it’s not humanly possible to really have a total explanation of why it’s doing this. And the same thing is true of us. If you ask a human, “Why did you take that decision,” they might come up with a story, but for the most part, there are many aspects of it that they can’t really explain. And that is why the whole classical AI program based on expert systems, where humans would download their knowledge into the computer by writing down what they know, failed. It failed because a lot of the things we know are intuitive. So, the only solution we found is that computers are going to learn that intuitive knowledge by observing how humans do it, or by observing the world, and that’s what machine learning is about.
So, when you take, you know, a problem like “a nickel or the sun, which is bigger?” or those kind of common sense problems, do we know how to solve them all? Do we know how to make a computer with common sense and we just haven’t done it? We don’t have all the algorithms done, we don’t have the processing power and all of that? Or do we kind of not know how to get that bit of juice, or magic into it?
No, we don’t know. We know how to solve simpler problems that are related, and different researchers may have different plans for getting there, but it’s still an open problem and it’s still research about how do we put things like common sense into computers. One of the important ingredients that many researchers in my area believe is that we need better unsupervised learning. So, unsupervised learning is when the computer learns without being told what it should be doing. So, when the computer learns by observation or by interacting with the world, but it’s not like supervised learning, where we tell the computer, “For this case you should do this.” You know, “The human player in this position played that move. And this other position, the human player played that move.” And you just learn to imitate. Or, you have a human driving a car, and the computer just learns to do the same kinds of moves as the driver would do in those same circumstances. This is called supervised learning. Another example to discriminate between supervised and unsupervised is, let’s say you’re learning in school and your professor gives you exercises and at the end of each exercise, your professor tells you what the right answer was. So now, you can, you know, train yourself through many, many exercises and this is supervised learning. And, it’s hard, but we’re pretty good at it right now. Unsupervised learning would be you go and you read books and you try things for yourself in the world and from that you figure out how to answer questions. That’s unsupervised learning and humans are very good at it. An example of this is what’s called intuitive physics. So, if you look at a two-year-old child, she knows physics. Of course, she doesn’t know Newtonian physics. She doesn’t know the equations of physics. But, she knows that when she drops a ball, it falls down. 
She knows, you know, the notions of solid objects and liquid objects and all this, pressure, and all this and she’s never been told about it. Her parents don’t keep telling her, “Oh, you know, you should use this differential equation and blah blah blah.” No, they tell her about other things. She learns all of this completely autonomously.
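The supervised/unsupervised distinction drawn here can be made concrete with a toy sketch. Everything below (the 1-D data, the threshold learner, the two-cluster routine) is invented for illustration and is not a real ML pipeline: the first half learns from labeled examples, like the professor handing back graded exercises; the second half must discover structure in unlabeled observations on its own.

```python
# Supervised: each example comes with the "right answer" (a label).
labeled = [(1.0, "small"), (2.0, "small"), (8.0, "big"), (9.0, "big")]

def fit_threshold(examples):
    """Learn a decision boundary from labeled 1-D examples."""
    small = [x for x, y in examples if y == "small"]
    big = [x for x, y in examples if y == "big"]
    return (max(small) + min(big)) / 2  # midpoint between the classes

def predict(threshold, x):
    return "small" if x < threshold else "big"

t = fit_threshold(labeled)
print(predict(t, 1.5))  # -> small

# Unsupervised: only the observations, no labels. The learner must
# find the two groups itself (a bare-bones 1-D k-means with k = 2).
unlabeled = [1.0, 2.0, 8.0, 9.0]

def two_means(xs, iters=10):
    c1, c2 = min(xs), max(xs)  # crude initialization at the extremes
    for _ in range(iters):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])

print(two_means(unlabeled))  # -> [1.5, 8.5]
```

Both halves see the same numbers; the difference is only whether the “right answer” is supplied during learning.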
Do you use that example as an analogy or do you think there are things we can learn from how children acquire knowledge that are going to be really beneficial to building systems that can do unsupervised learning?
Both. So, it is clearly an analogy and we can take it as such. We can generalize through other scenarios. But, it’s also true that, at least some of us in the scientific community for AI, are looking at how humans are doing things, are looking at how children are learning, are looking at how our brain works. In other words, getting inspiration from the form of intelligence that we know that exists. We don’t completely understand it, but we know it’s there and we can observe it and we can study it. And scientists in biology and psychology have been studying it for decades, of course.
And similarly, you know, just thinking out loud, we also have neural nets which, again, appeal to the brain.
Have we learned everything we think we’re going to learn from the brain that’s going to help us build intelligent computers? Or do they really have almost nothing in common, being very different systems?
Well, the whole area of deep learning, which is so successful these days, is just the modern form of neural nets, which have been around for decades, since the ’50s. And they are, of course, inspired by things we knew about the brain. And now we know more, and actually, some elements of brain computation have been imported into neural nets fairly recently. In 2011, we introduced rectifier units in deep neural nets and showed that they help to train deeper networks better. And actually, the inspiration for this was the form of the nonlinearities that are present in actual brain neurons. So, we continue to look at the brain as a potential source of inspiration. Now, that being said, we don’t understand the brain. Biologists and neuroscientists know a lot of things and have made tons of observations, but they still don’t have anywhere near the big picture of how the brain works, and most importantly for me, how the brain learns. Because this is the part that we really need to import into our machine learning systems.
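The rectifier units mentioned here compute a very simple nonlinearity, f(x) = max(0, x): positive inputs pass through unchanged, negative inputs are zeroed out. A minimal sketch of one hidden unit using it follows; the specific weights and inputs are made-up values for illustration.

```python
def relu(x):
    """Rectified linear unit: pass positives through, zero out negatives."""
    return max(0.0, x)

def hidden_unit(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then the ReLU."""
    pre_activation = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(pre_activation)

print(relu(-3.0))  # -> 0.0
print(relu(2.5))   # -> 2.5
# 1.0*0.5 + (-2.0)*0.25 + 0.1 = 0.1, which is positive, so it passes through.
print(hidden_unit([1.0, -2.0], [0.5, 0.25], 0.1))  # -> 0.1
```

A deep network stacks layers of such units; the nonlinearity is what lets the stack represent more than a single linear map.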
And so, looking out a ways, we talked about common sense. Do you think we’re eventually going to build an AGI, an artificial general intelligence, that is as versatile or more so than a human?
You know, when you talk to people and you say, “When will we have that?” the range that I’ve heard is five to five hundred years. First of all, that’s two orders of magnitude difference. Why do you think there’s such disagreement on when we’ll have it, and if you were then to throw your name in the hat and put a prediction out there, when would you think we would get an AGI?
Well, I don’t have a crystal ball and really you shouldn’t be asking those questions.
Well, there’s so much uncertainty.
So, there is a nice image here to illustrate why it’s impossible to answer this question. It’s like if you are climbing up a mountain range and right now you’re on this mountain and it looks like it’s the biggest mountain around and if you really want to get to the top, you have to reach the top of that mountain. But, really, you don’t see behind that mountain what is looming, and it’s very likely that after you reach the top of that mountain, we might see another one that’s even higher. And so we…you know, we have more work to do. So, right now it looks like we’ve made a lot of progress on this particular mountain which allows us to do so well in terms of perception, at least at some level. But, higher level, we’re still making baby steps, really. And we don’t know if the tools that we currently have, the concepts that we currently have with some incremental work will get us there. Or…and that might then happen in maybe ten years, right? With enough computing power. Or, if we’re going to face some other obstacle that we can’t foresee right now which we could be stuck with for a hundred years. So, that’s why giving numbers I think is not informing us very much.
Fair enough. And if you’ll indulge me with one more highly speculative question, I’ll return to the here and now and practical uses. So, my other highly speculative question is, “Do you think we’re going to build conscious machines ever?” Like, can we do that? And in your mind, is consciousness—the ability to perceive and self-awareness and all that comes with consciousness—is that required to build a general intelligence? Or is that a completely unrelated thing?
Well, it depends on the kind of intelligence that we want to build. So, I think you could easily have some form of intelligence without consciousness. As I mentioned, imagine a really, really smart encyclopedia that can answer any of your questions but doesn’t have any sense of self. But if we want to build things like robots, we’ll probably need to give them some sense of self. Like, a robot needs to know where it is, and how it stands compared to other objects or agents. It needs to know things like if it gets damaged, so it needs to know how to avoid being damaged. It’s going to have some kind of primitive emotions in the sense that, you know, it’s going to have some way of knowing that it’s reaching its objectives or not, and, you know, you can think of this as being happy or whatever. So, the ingredients of consciousness are already there in systems that we are building. But, they’re just very, very primitive forms. So, consciousness is not like a black and white thing. And it’s not even something that people agree on, even less than what intelligence is I think, what consciousness is. But my understanding of it is that we’ll build machines that have more and more of a form of consciousness as needed for the application that we’re building them for.
Our audiences are largely business people and, you know, they constantly read about new breakthroughs every day in the artificial intelligence space. How do you advise people to discover, to spot problems that artificial intelligence would be really good at chewing up and, you know, really getting your hands around? Like, if I were to walk around my company and I go from department to department to department—I go to HR, then I go to marketing, then I go to product, then I go to all of them, everyone—how do you spot things where AI might be a really good thing to deploy to solve?
Okay. So, it depends on your time horizon. So, if you want to use today’s science and technology and apply it to current tasks, it’s different from saying, “Oh, I imagine this particular service or product which could come out in five years from now.”
The former. What can you do today?
Yes. Okay. So, today, things are pretty clear. You need a very well-defined task for which you have a lot of examples of what the right behavior of the computer should be. And a lot could be millions; it depends on the complexity of the task. If you’re trying to learn something very simple, then maybe you need less, and if you need to learn something more complicated, you need more. For example, when we do machine translation, you would easily have hundreds of millions of examples. You need a well-defined task in the sense that, for the situation where the computer is going to be used, we know what information it would have (this is the input) and we know what kind of decision it’s supposed to make (this is the output). Then we’re doing supervised learning, which is the thing we’re doing really well now. In other words, we can give the computer examples of, “Well, for these inputs, this is the output it should have produced.” So, that’s one element. Another element is that not all tasks like this are easy to learn with current AI and deep learning. Some things are easier to learn. In particular, things that humans are good at are more likely to be easier to learn, and things that are fairly limited in scope will also tend to be easier, because in a sense the limited scope makes them simpler. Things where, if you were able to solve the problem, you must have really good common sense and be basically intelligent, well, you’re probably not going to be able to do those well, because we haven’t solved AI yet. So, these are some things you can look for.
You’ve mentioned games earlier and I guess there’s a long history of using artificial intelligence to play games. I mean, it goes back to Claude Shannon writing about chess. IBM, in the ’50s, had a computer to play checkers. Everybody knows the litany, right? You had Deep Blue and you had chess and then you had Jeopardy and then you had AlphaGo and then you had poker recently. And I guess games work well because there’s a very defined rule set and it’s a very self-contained universe. What would be… You know, everybody talks about Go as this one that would be very hard to do. What is the next game that you think that you’re going to see computers have a breakthrough in and people are going to be scratching their heads and marveling at that one?
So, there are some more complex video games that researchers are looking at, which involve virtual worlds that are much more complex than the kind of simple grid world you play in in the case of Go. And also, there’s something about Go and chess which is not realistic for many situations. In Go and chess, you can see the whole state of the world, right? It’s the positions of all the pieces. In a more realistic game, or in the real world, of course, the agent doesn’t know everything that there is to know about the state of the world. It only sees parts of it. There is also the question of the kind of data we can get. So, one problem with games, but also with other applications like dialogue, is that it really isn’t the case that you can give me a set of data, extracted once and for all, maybe from asking a lot of humans to perform particular tasks, and then just learn by imitation. The reason this doesn’t work is that when the learning machine is playing using its own strategy, it may do things differently than how humans have been doing it. So, if we talk about dialogue, maybe our dialogue machine is going to make mistakes that no human would ever make, and so the dialogue will move into a sort of configuration that has never been seen when you look at people talking to each other. And now the computer doesn’t know what to do, because it’s not part of what it’s been trained with. So, what it means is that the computer needs to be interacting with the environment. That’s why games that are simulated are so interesting for researchers. Because we have a simulator in which the computer can practice, seeing what the effect of its actions would be and, depending on what it does, what it is going to observe, and so on.
So, there’s a fair amount of worry and consternation around artificial intelligence. Specifically, with regard to jobs and automation.
Since that’s closer to the near future, what do you think is going to happen?
It’s very probable that there’s going to be a difficult transition in the job market. According to some studies, something like half of the jobs will be impacted seriously. That means a lot of jobs will go. Everybody doing that job may not necessarily go, but their job description might change, because a lot of what they were doing, which was sort of routine, will be done by computers, and then we’ll need fewer people for doing the same work. At the same time, I think, eventually there will be new jobs created, and there should not really be unemployment, because we still want to have humans doing some things that we don’t want computers to do. I don’t want my babies to be taken care of by a machine. I want a human to interact with my babies. I want a human to interact with my old parents. So, all the caring jobs, all the teaching jobs: even though computers will have an important role to some extent, I think we would be happy to have, instead of classes of thirty students, classes of five students, or classes of two students. I mean, there’s no limit to how much humans can help each other. And right now, we can’t, because it would be too costly. But once the jobs that can be automated are automated, well, those human-to-human jobs, I think, will become the norm. And for all the jobs that are more creative and require less routine, like of course artists or even scientists, hopefully we’ll want to have more of these people.
Now, that being said, there’s going to be a transition, I think, where a lot of people are going to lose their jobs, and they’re not going to have the right training for the other jobs that are going to be opening up. And so, we have to set up the right social security to take care of these people, maybe with a guaranteed minimum income or something else, but somehow, we have to think about that transition, because it could have a big political impact. If you think about the transition that happened with the industrial revolution, from agriculture to industry, and all the human misery that happened between, say, the middle of the nineteenth century and the middle of the twentieth century, well, a lot of that could have been avoided if we had put in place the kind of social measures that we did finally put in place around the Second World War. So, similarly, we need to think a little bit about what would be the right ways to handle the transition to minimize human suffering. And there’s going to be enough wealth to do it, because AI is going to create a lot of wealth: a lot of new products and services, and things done more efficiently, so, in a sense, globally we’re all going to get richer. But the question is, where is that money going to go? We have to make sure that some of it goes to help that transition from the human point of view.
People tend to fall into three camps on this question. One says, “There will be a point where a computer can learn a new task faster than a human. And when that happens, that’s a kind of tipping point where they will do everything. They’ll do every single thing a human can do.” So, that’s a school of thought that says you’re going to lose basically all of the jobs; all of them could be done by a machine. Then you get people who say, “Well, there’s going to be some amount of displacement,” and they often appeal to things like the Great Depression, to say, “There are certain people that are going to lose their jobs, and then they’re not going to have training to find new ones in this new economy.” And then finally you come to people who say, “Look. This is an old, tired song. Unemployment has been between four and nine percent in the West for two hundred and fifty, three hundred years. You can mechanize industry, you can eliminate farming, you can bring electricity in, you can go to coal power, you can create steam; you can do these amazingly disruptive things and you never even see a twitch in the unemployment numbers. None. Nothing. Four to nine percent.” So, that is certainly the historical fact, and that view says you’re not going to have any more unemployment than you do now; new jobs will be made as quickly as the old ones are eliminated. So, for anybody who holds one of the other two positions, it’s incumbent on them to say why they think this time is different. And they always have a reason they think this time is different. And it sounds like you think we’re going to have a lot of job turnover, disruptive enough that we may need a basic income, and there’s going to be this big mismatch between people’s skills and what the economy needs. So, it sounds like a pretty tumultuous time you’re seeing.
And so, what do you think… If they say, “Well, what’s different this time? Why is it going to be different than say, bringing electricity to industry, or bringing mechanization, replacing animal power with machine power?” I mean, that has to be of the same kind of order as what we’re talking about. Or does it?
So, we’re talking about a different kind of automation, and we’re talking about a different speed at which it’s going to happen. The traditional automation replaced human physical power, and potentially skill, but in very routine ways. The new automation that’s starting is able to deal with many more kinds of tasks. And when we went through the earlier transitions due to automation, say in the auto industry or the agricultural industry, from those rather labor-intensive physical tasks to the current situation where many of them are automated, people could migrate to white-collar jobs and the service industry. Now, it’s less clear where the migration will be. I think there will be a migration, as I said, to jobs that involve more human interaction and more creativity than what machines will be able to do for a while. But the other factor is the speed. I think it’s going to happen much faster than it has happened in the past. And that means people won’t have time to reach retirement before their job disappears. They’re going to lose their jobs in their 30s or 40s or 50s, and, of course, that could create a lot of havoc.
The number one question that I am asked when I speak on this topic, far and away the number one question, is, “What should my children study today so that they will be employable in fifty years?” It sounds like your answer to that is, “Things that require some kind of an emotional attachment and things that require some amount of creativity.” Are there other categories of jobs you would throw into that bucket or not?
Well, obviously, those computers have to be built by some people, so we need scientists, programmers and so on. That’s going to continue for a while. But that’s a small portion of the population. I think, for those who can, scientific jobs and engineering and computer-related jobs, we’re going to continue to need more and more of these. That’s not going to stop anytime soon. And, as you said, I think the human-to-human jobs, we’re going to need more. We’re going to want more. So, basically, what’s going to happen is, we’re going to have all this extra, I mean, some people are going to have extra wealth coming from this. Maybe, you know, you work for a company. You work for Google and you have this big salary, and now you can use this money to send your kids to a school that has classes of size five instead of thirty.
You know, coupled with artificial intelligence, what always gets grouped in with it is the discussion of robots, so that you have both the mind and the body which technology is replacing. Robots seem to advance at a much slower rate. I mean, if they had a Moore’s Law, their doubling period would be, you know, more than two years. Do you have any thoughts on the marriage of robots with artificial intelligence? Do AIs need to be embodied to learn better, and things like that? Or are those just apples and oranges that have nothing to do with each other?
Oh, they have things to do with each other. So, I do believe that you could have intelligence without a body. But, I also believe that having a body, or some equivalent of a body, as I’ll explain later, might be an important ingredient to reach that intelligence. So, I think a lot of things that we learn by interacting with the world. You don’t see me right now, but I’m picking up a glass and I’m looking at it from different angles, and if I had never seen this glass, this manipulation could teach me a lot about it. So, I think the idea of robots interacting with the environment is important. Now, robots themselves with legs and arms, I expect that the progress is going to be slower than with virtual intelligence. Because, you know, robots…the research cycle is slower; you build them, you program them, you try them for real. More importantly, it takes time for the robot to interact, and one robot can only learn so much. But if you have a bot, in other words, an intelligence that goes on the web and maybe interacts with people, well, it can interact with millions or even billions of people. Because it can have many copies of itself running. And so, it can have interactions, but they’re not physical interactions. They’re virtual interactions and it can learn from a lot of data, because there’s a lot of data out there. And, you know, everything on the web. So, there is an opportunity, I would bet that we’re going to see progress in AI go faster with those virtual robots than with the real, physical robots. But eventually, I think we’ll get those as well, and it’s just going to be at a different pace.
One of the areas that you mentioned you’re deeply interested in is natural language processing. And, you know, to this day, whenever I call my airline of choice and I have to say my membership number… It’s got an 8 in it, and if I’m on my headset, the system never gets whether it’s an 8, an H, or an A. So, I always have to unplug my headset and say it into the phone, and all of that. And yet, I interface with other systems, like Amazon Alexa or Google Assistant, that seem to understand entire paragraphs and sentences and can capitalize the right things and so forth. What am I experiencing there, with those two very different experiences? Is it because in the first case there’s no context, and so it really doesn’t know how to guess between an 8, an H, and an A?
So, right now, the systems that are in place are not very smart, and some systems are not smart at all. Machine learning methods are only starting to be used in those deployed systems, and they are still only used for parts of the system. That’s also true, by the way, of self-driving cars right now. The system is designed more or less by hand, but some parts, like, say, recognizing pedestrians, or, in the case of language, maybe parsing or identifying who you’re talking about just by the name: these jobs are done by separately trained modules that are trained by supervised learning. So that’s the typical scenario right now. The current state of the art with deep learning in terms of language understanding allows those systems to get a pretty good sense of what you’re talking about, in terms of the topics and even what you’re referring to. But they’re still not very good at making what we would consider rational inferences and reasoning on top of those things. So, something like machine translation has actually made a huge amount of progress, in part due to the things we’ve done in my lab, where you can get the computer to understand pretty well what the sentence is about and then use the specifics of the words that are being used to produce a good translation. But it can still fail in cases where there are complicated semantic ambiguities. Those don’t come up very often when you do translation. However, they would come up in tasks like the kinds of exams that students take, where they read a text and then have to answer questions about it. So, there are still things that we’re not very good at, which involve high-level understanding and analogies.
You mentioned that you were bullish on jobs that required human creativity. And I’ve always been kind of surprised by the number of researchers in artificial intelligence who kind of shrug creativity off. They don’t think there’s anything particularly special or interesting about it and think that computers will be creative sooner than they’ll be able to do other things that seem more mundane. What are your thoughts on human creativity?
So actually, I’ve been working on creativity, and we call it by a different name. In my field, we call it generative models. So, we have neural nets that can generate images; that’s the thing we’re doing the most. But now we are also doing generation of sounds, of speech, and potentially we could synthesize any kind of object if we see enough examples of that type of object. So, the computer can look at examples of natural images and then create new images of some category that look fairly realistic right now. Still, obviously, you can recognize that they’re not the real thing, but you can clearly see what the object is. So, we’ve made a lot of progress in the ability of the computer to dream up synthetic images or sounds or sentences. There’s a sense in which the computer is creative, and it can invent new poems, if you want, or new music. The only thing is, what it invents isn’t that great from a human point of view, in the sense that it’s not very original, it’s not very surprising, and it still doesn’t fit together as well as what a human would be able to do. So, although computers can be creative, and we have a lot of research in allowing computers to generate all kinds of new things that look reasonable, we are very far from the level of competence that humans have in doing this. Why we are not there is linked to the central question that I mentioned in the beginning, which is that computers right now don’t have common sense. They don’t have a sufficiently broad understanding of how the world works. That common sense, that causal understanding of the relationships between high-level explanations, causes, and effects, is still missing. And until we get there, the creativity of humans is going to be way, way beyond that of machines.
I reread Moby Dick a couple of months ago and I remember stopping on this one passage. And I’m going to misquote it, so I apologize to all of the literary people out there that I’m going to mess this up. But it went something like, “And he piled forth on the whale’s white hump the sum of all his rage and fury. If his chest had been a cannon, he would’ve fired his heart upon it.” And I read that, and I put the book down and I thought, “How would a computer do that?” There’s so much going on in there. There are these rich and beautiful metaphors. “If his chest had been a cannon he would’ve fired his heart upon it.” And why that one? And it does raise this question: Is creativity merely computational? Is it something that really is reducible to, “Show me enough examples and I’ll start analogizing and coming up with other examples”? Do you think we’ll have our Herman Melville AI that will just write stuff like that before breakfast?
I do really believe that creativity is computational. It is something we can already do on a small scale, as I said earlier. It is something we understand the principles behind. So, it’s “only” a matter, right, of having neural nets or models that are smarter, that understand the world better. I don’t think that creativity, or any of the human faculties, is something inherently inaccessible to computers. I would say that some aspects of humanity are less accessible, and creativity of the kind that we appreciate is probably one that is going to take more time to reach. But maybe even more difficult for computers, and also quite important, will be to understand not just human emotions, but also something a little bit more abstract, which is our sense of what’s right and what’s wrong. And this is actually an important question, because when we put these computers in the world, in products, and they make decisions, well, for some very simple things we know how to define the task, but sometimes the computer is going to have to make a compromise between doing the task that it wants to do and maybe doing bad things in the world. And so, it needs to know what is bad. What is morally wrong? What is socially acceptable? I think we’ll manage to train computers to understand that, but it’s going to take a while as well.
You’ve mentioned machine translation a couple of times. And anybody who follows the literature is aware that, as you said earlier, our ability to do machine translation has had some real breakthroughs. And you even said that you and your team had a hand in some of that. Can you describe, in more layman’s terms, what exactly changed? What was the “aha” moment, or what was the dataset, or what was different that gave us this big boost?
So, I would mention two things. One actually dates back to work I did around 2000, so this is a long time ago: something we call “word embeddings” or “word representations,” where we trained the computer to associate a pattern of activation with each word. Think of it like the pattern of activation of neurons in your brain. So, it’s a bunch of numbers. And the thing that’s interesting about this is that you can think of it as a kind of semantic space. Whereas “cat” and “dog” are just two words, and any two words are, you know, just symbols, and there’s nothing in, say, the spelling of the words “cat” and “dog” that tells us that cats and dogs have something in common. But if you look at how your neurons fire when you see the picture of a cat or the picture of a dog, or if you look at our neural nets and how the artificial neurons in these networks fire in response to a picture of a cat or a picture of a dog, or a text which talks about cats or about dogs, well, actually those patterns are very similar, because cats and dogs have many things in common. They’re pets, we have a particular relationship with them, and so on. And so, we can use that to help the computer generalize correctly to new cases, even to new words that it has never seen, because maybe we’ve never seen the translation of that word, but we’ve seen that it’s associated with other words in the same language.
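The word-embedding idea described here can be sketched in a few lines: if each word is a vector of numbers, related words end up pointing in similar directions, which we can measure with cosine similarity. The embedding values below are invented purely for illustration (real embeddings are learned from large corpora and have hundreds of dimensions).

```python
import math

# Toy 4-dimensional "embeddings" with made-up values, for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.0],
    "car": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: related concepts
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low: unrelated concepts
```

In a trained model this geometry is what lets the system generalize: a rarely seen word inherits behavior from the familiar words whose vectors sit near it.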
We can make associations that allow the computer to map symbols from sentences into these semantic spaces, in which sentences that mean more or less the same thing will be represented more or less the same way. And so now you can go from, say, a sentence in French to that kind of semantic space, and then from that semantic space you can decode into a sentence in English. So, that’s one aspect of the success of machine translation with deep learning. The other aspect is that we had a really big breakthrough when we understood that we could use something called an attention mechanism to help machine translation. The idea is actually fairly simple to explain. Imagine you want to translate a whole book from French to English. Well, before we introduced this attention mechanism, the strategy would have been: the computer reads the book in French, it builds this semantic representation, like, you know, all these activations of neurons, and then it uses this to write up the book in English. But this is very hard. Imagine having to hold the whole book in your head; it’s very hard. A much better way is to translate roughly one sentence at a time, or even to keep track, in each book, in the French book and in the English book that I’m producing, of where I’m currently standing, right? So, I know that I’ve translated up to here in French, and I’m looking at the words in the neighborhood to find out what the next word should be in English. So, we use an attention mechanism, which allows the computer to pay attention more specifically to parts of the input: here, you would say, parts of the book that you want to translate, or, for images, the part of the image that you want to say something about. This is, of course, inspired by things we know about humans, who use attention mechanisms. Not just as an external device, where I look at something in front of me and pay attention to a particular part of it, but also internally.
Like, we can use our own attention to look back on different aspects of what we’ve seen or heard and of course that’s very, very useful for us for all kinds of cognitive tasks.
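The attention mechanism described above can be sketched as: score each encoder state against the decoder’s current state (the query), turn the scores into weights with a softmax, and take the weighted average. This is a minimal dot-product-attention sketch with made-up toy vectors, not a real trained translation system.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

# Hypothetical 2-dimensional encoder states for a 3-word source sentence.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys  # in the simplest setup, keys and values are the same states
query = [1.0, 0.1]  # decoder state while choosing the next target word

context, weights = attend(query, keys, values)
print(weights)  # highest weight on the word most relevant to the query
```

The `weights` are what let the model "look at the words in the neighborhood": they concentrate on the source positions most relevant to the word being produced next.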
And in your mind, is it like, “You ain’t seen nothing yet. Just wait”? Or is it like, “This is eighty percent of what we know how to do today, so don’t expect any big breakthroughs anytime soon”?
Well, this is like your number of years question.
I see, okay.
But I would say that it’s very likely that the pace of progress in AI is going to accelerate. And there’s a very simple mathematical reason. The number of researchers doing it is increasing exponentially right now. And, you know, science works by small steps, in spite of what sometimes may be said. The effect of scientific advances could be very drastic, because we pass a threshold that suddenly we can solve this task and we can build this product. But science itself is just an accumulation of ideas and concepts that’s very gradual. And the more people do it, the faster we can make progress. The more money we put in, the better facilities, and computing equipment, the faster we can make progress. So, because there’s a huge investment in AI right now, both in academia and in industry, the rate at which advances are coming and papers are being published is just increasing incredibly fast. So, that doesn’t mean that we might not face some brick wall and get stuck for a number of years trying to solve a problem that’s really hard. We don’t know. But my guess is that it’s going to continue to accelerate for a while.
Two more questions. First, you’re based in Canada and have done much of your work there. I see more and more AI writing and publications coming out of Canada, and I saw that the Canadian government is funding some AI initiatives. Is it a fact that there’s a disproportionate amount of AI investment in Canada?
It is a fact, and, actually, Canada has been a leader in deep learning since the beginning of deep learning. Two of the main labs working on this are the one in my group here in Montreal and the one in Toronto, with Geoff Hinton. Another important place was in New York, with Yann LeCun, and eventually Stanford and other groups. But a lot of the initial breakthroughs in deep learning happened here, and we continue growing. For example, in Montreal, in terms of academic research, we have the largest group doing deep learning in the world. So, there are a lot of papers and advances coming from Canada in AI. We also have, in Edmonton, Rich Sutton, who is one of the godfathers of reinforcement learning, which we didn’t talk about, which is when the machine learns by doing actions and getting feedback. So, there’s scientific expertise that up to now has been very strong in Canada and has been exported, because our scientists have been bought, by US companies mostly. But the Canadian government has understood that if they want some of the wealth coming out of AI to benefit Canada, then we need to have a Canadian AI industry. And so, there’s a lot of investment right now in the private sector. The government is also investing in research centers, so they’re going to create these institutes in Montreal, Toronto, and Edmonton. And, you know, companies are flocking to Montreal. Experts from around the world are coming here to do research, to build companies. So, there’s an amazing momentum going on.
And then finally, what are you working on right now? What are you excited about? What do you wake up in the morning just eager to get to work on?
Ah, I like that question. I’m trying to design learning procedures which would allow the machine to make sense of the world, and the way I think this can be done is if the machine can represent what it sees, so, images, text and things like that, in a different form, which I call “disentangled.” In other words, trying to separate the different aspects of the world, the different causes of what we’re observing. That’s hard. We know that’s hard, but it has actually been the objective that we set for ourselves more than ten years ago, when we started working on deep learning and I wrote a book chapter with Yann LeCun about the importance of extracting good representations that can separate out those factors. And the new thing is incorporating reinforcement learning, where the learning system, the learning agent, interacts with the world so as to better understand the cause-and-effect relationships, and so separate out the different causes from each other and make sense of the world in this way. It’s a little bit abstract, what I’m saying, but let’s say that it’s fundamental research. It could take decades to reach maturity. But I believe it is very important.
Well, thank you very much. I appreciate you taking the time to chat with us and good luck on your work.
My pleasure. Bye.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.

Researchers build pattern-recognition model that acts like a human

A trio of MIT researchers has developed a machine learning model that might help humans make better sense of big data by making the patterns it discovers easier to interpret. Its creators call it the Bayesian Case Model, but a simpler description might be the example-creator.

The thinking behind the research is that humans tend to think about things and make decisions based on previous experiences or examples we’ve seen. Children, for example, might overhear just a few words of their parents’ conversation and know they’re talking about summer camp because they went last year and they know that words like “month,” “lake” and “counselors” are primarily used together only in that context.

If, however, we have limited or no experience in a particular field, a little help might be necessary, which is where the Bayesian Case Model comes into play. Given a set of data such as recipes (one of the types the researchers used in their work), the model will categorize them based on their most prominent ingredients, as well as their similarity to a representative example, or prototype, for any given cluster of recipes, which is also chosen by the model.


For example, even if I didn’t know that beer, chili powder and tomato were common ingredients in chili, I might be able to deduce that a recipe containing them is chili after seeing what the model has deemed the prototypical chili recipe. Indeed, the MIT researchers (Been Kim, Cynthia Rudin and Julie Shah) found that not only did their model perform more accurately than previous approaches, but human testers were able to correctly categorize recipes at a significantly higher rate using output from the Bayesian Case Model than using output from earlier approaches.
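The prototype intuition can be sketched very roughly: assign a new recipe to the cluster whose prototype its ingredient set overlaps most. This is a drastic simplification of the actual Bayesian Case Model, which chooses prototypes and the important features jointly via Bayesian inference; the prototypes and ingredients below are made up for illustration.

```python
# Hypothetical cluster prototypes (the real model selects these from the data itself).
prototypes = {
    "chili": {"beer", "chili powder", "tomato", "beef", "onion"},
    "salad": {"lettuce", "tomato", "cucumber", "olive oil", "vinegar"},
}

def jaccard(a, b):
    """Overlap between two ingredient sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def classify(ingredients):
    """Assign the recipe to the cluster whose prototype it overlaps most."""
    return max(prototypes, key=lambda name: jaccard(ingredients, prototypes[name]))

print(classify({"beer", "chili powder", "tomato", "beans"}))  # prints "chili"
```

The appeal of the approach is exactly what the article describes: the explanation handed to a human is a concrete example recipe, not an opaque list of statistical weights.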

The approach should work with more difficult types of data in more specialized fields, as well.

This type of work, even if not this model itself, could become more useful as datasets continue outgrowing people’s abilities to analyze them. Unsupervised machine learning or artificial intelligence models, for example — from software like Ayasdi to Google’s famous cat-recognizing deep learning system — can already churn through lots of data and identify similar things. But any tools are only as useful as they are accurate, and as easy as they make it for humans to decipher what they’ve found.

The full MIT paper is available here.