Voices in AI – Episode 1: A Conversation with Yoshua Bengio

In this episode Byron and Yoshua talk about knowledge, unsupervised learning, how the brain learns, creativity, and machine translation.


Yoshua Bengio received a PhD in Computer Science from McGill University in Canada in 1991. After two post-doctoral years, one at MIT and one at AT&T Bell Labs, he became professor at the Department of Computer Science and Operations Research at the University of Montreal. He is the author of two books and more than 200 publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been Associate Editor of the top journals in machine learning and neural networks.


Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today our guest is Yoshua Bengio. Yoshua Bengio received a PhD in Computer Science from McGill University in Canada in 1991. After two post-doctoral years, one at MIT and one at AT&T Bell Labs, he became professor at the Department of Computer Science and Operations Research at the University of Montreal. He is the author of two books and more than two hundred publications, the most cited being in the areas of deep learning, recurrent neural networks, probabilistic learning algorithms, natural language processing and manifold learning. He is among the most cited Canadian computer scientists and is or has been Associate Editor of the top journals in machine learning and neural networks. Welcome to the show, Yoshua.
Yoshua Bengio: Thank you.
So, let’s begin. When people ask you, “What is artificial intelligence,” how do you answer that?
Artificial intelligence is about building machines that are intelligent, that can do things that humans can do. To do that, a machine needs to have knowledge about the world and then be able to use that knowledge to do useful things.
And it’s kind of kicking the can down the street just a little bit, because there’s unfortunately no consensus definition of what intelligence is either, but it sounds like the way you describe it, it’s just kind of like doing complicated things. So, it doesn’t have an aspect of, you know, it has to respond to its environment or anything like that?
Not necessarily. You could imagine having a very, very intelligent search engine that understands all kinds of things but doesn’t really have a body, doesn’t really live in an environment other than the interactions with people through the queries. So, the kinds of intelligence, of course, that we know and we think about when we think about animals are involving movement and actions and so on. But, yeah. Intelligence could be of different forms and it could be about different aspects. So, a mouse is very intelligent in its environment. If you or I went into the head of the mouse and tried to control the muscles of the mouse and survive, we probably wouldn’t last very long. And if you understand my definition, which is about knowledge, you could know a lot of things in some areas so you could be very intelligent in some area and know very little about another area and so not be very intelligent in other areas.
And how would you describe the state of the art? Where are we with artificial intelligence?
Well, we’ve made huge progress in the ability of computers to perceive better, to understand images, sounds and even to some extent language. But, we’re still very far from machines that can discover autonomously how the world around us works. We’re still very far from machines that can understand the sort of high-level concepts that we typically manipulate with language. So, there’s a lot to do.
And, yeah, it’s true. Like, if you go to any of the bots that have been running for years, the ones that people built to maybe try to pass the Turing test or something, if you just start off by asking the question, “What’s bigger, a nickel or the sun”, I have yet to ever find one that can answer that question. Why do you think that is? What’s going on there? Why is that a hard question?
Because it’s what people call common sense. And really it refers to a general understanding of how the world around us works, at least from the point of view of humans. All of us have this kind of common sense knowledge. It’s not things that you find in books, typically. At least not explicitly. I mean, you might find some of it implicitly, but you don’t get like in Wikipedia, you don’t get that knowledge typically. That’s knowledge we pick up as children and that’s the kind of knowledge also that sometimes is intuitive. Meaning that we know how to use it and we can recognize a thing, like we can recognize a chair, but we can’t really describe formally—in other words with a few equations—what a chair is. We think we can, but when we’re actually pressed to do it, we’re not able to do a good job at that. And the same thing is true for example in the case of the AlphaGo system that beat the world champion at the game of Go, it can use a form of intuition to look at the game state and decide, “What would be a good move next?” And humans can do the same thing without necessarily being able to decompose that into a very clear and crisp explanation. Like, the way it was done for chess. In the case of chess, we have a program that actually tries many moves. Like, “If I do this, then what’s the worst thing for me that could happen? The other guy does that, and then I could do this, and then the other guy does that, and then I could do this.” So, this is a very crisp, logical explanation of why the computer does this.
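The chess-style reasoning described here ("if I do this, what's the worst thing that could happen to me?") is usually called minimax search. The following is a minimal sketch of that idea, not the code of any actual chess program: the game tree is a hypothetical toy, written as nested lists whose leaves are scores from the first player's point of view, and the names `minimax`, `best_move` and `toy_tree` are illustrative assumptions.

```python
def minimax(node, maximizing=True):
    """Return the score the player to move can guarantee from this node."""
    # A leaf is a plain number: the final score for the first player.
    if isinstance(node, (int, float)):
        return node
    # Otherwise the node is a list of positions reachable in one move.
    child_values = [minimax(child, not maximizing) for child in node]
    # The first player picks the best outcome, the opponent the worst one.
    return max(child_values) if maximizing else min(child_values)


def best_move(tree):
    """Index of the first move whose worst-case outcome is the best."""
    worst_cases = [minimax(child, maximizing=False) for child in tree]
    return max(range(len(worst_cases)), key=worst_cases.__getitem__)


# Hypothetical toy game: three possible moves, each answered by two replies.
toy_tree = [[3, 5], [2, 9], [0, 7]]

print(minimax(toy_tree))    # value the first player can guarantee
print(best_move(toy_tree))  # index of the move that guarantees it
```

Every step of the decision can be read off the tree, which is the "crisp, logical explanation" being contrasted with the intuition-like behavior of a neural network.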
In the case of neural nets, well, we have a huge network of these artificial neurons with millions or hundreds of millions of parameters, of numbers that are combined. And, of course, you can write an equation for this, but the equation will have so many numbers in it that it’s not humanly possible to really have a total explanation of why it’s doing this. And the same thing with us. If you ask a human, “Why did you make that decision,” they might come up with a story, but for the most part, there are many aspects of it that they can’t really explain. And that is why the whole classical AI program based on expert systems, where humans would download their knowledge into the computer by writing down what they know, failed. It failed because a lot of the things we know are intuitive. So, the only solution we found is that computers are going to learn that intuitive knowledge by observing how humans do it, or by observing the world, and that’s what machine learning is about.
So, when you take, you know, a problem like “a nickel or the sun, which is bigger?” or those kind of common sense problems, do we know how to solve them all? Do we know how to make a computer with common sense and we just haven’t done it? We don’t have all the algorithms done, we don’t have the processing power and all of that? Or do we kind of not know how to get that bit of juice, or magic into it?
No, we don’t know. We know how to solve simpler problems that are related, and different researchers may have different plans for getting there, but it’s still an open problem and it’s still research about how do we put things like common sense into computers. One of the important ingredients that many researchers in my area believe is that we need better unsupervised learning. So, unsupervised learning is when the computer learns without being told what it should be doing. So, when the computer learns by observation or by interacting with the world, but it’s not like supervised learning, where we tell the computer, “For this case you should do this.” You know, “The human player in this position played that move. And this other position, the human player played that move.” And you just learn to imitate. Or, you have a human driving a car, and the computer just learns to do the same kinds of moves as the driver would do in those same circumstances. This is called supervised learning. Another example to discriminate between supervised and unsupervised is, let’s say you’re learning in school and your professor gives you exercises and at the end of each exercise, your professor tells you what the right answer was. So now, you can, you know, train yourself through many, many exercises and this is supervised learning. And, it’s hard, but we’re pretty good at it right now. Unsupervised learning would be you go and you read books and you try things for yourself in the world and from that you figure out how to answer questions. That’s unsupervised learning and humans are very good at it. An example of this is what’s called intuitive physics. So, if you look at a two-year-old child, she knows physics. Of course, she doesn’t know Newtonian physics. She doesn’t know the equations of physics. But, she knows that when she drops a ball, it falls down. 
She knows, you know, the notions of solid objects and liquid objects and all this, pressure, and all this and she’s never been told about it. Her parents don’t keep telling her, “Oh, you know, you should use this differential equation and blah blah blah.” No, they tell her about other things. She learns all of this completely autonomously.
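The distinction drawn in this answer can be made concrete with a small sketch. It is only an illustration under simple assumptions: the data are made-up numbers, the supervised learner is a one-nearest-neighbor rule, and the unsupervised learner is a one-dimensional two-cluster k-means. None of these choices come from the interview.

```python
def nearest_neighbor_predict(labeled_examples, x):
    """Supervised: every training input comes with the right answer (a label)."""
    # Copy the label of the training input closest to x.
    _, label = min(labeled_examples, key=lambda pair: abs(pair[0] - x))
    return label


def two_means(values, iterations=10):
    """Unsupervised: no labels, just find two groups in the raw numbers."""
    low, high = min(values), max(values)
    if low == high:  # nothing to separate
        return low, high
    for _ in range(iterations):
        # Assign each value to the nearer of the two current centers.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the average of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


# Supervised: a "teacher" supplied the answer for each example.
labeled = [(1.0, "small"), (1.2, "small"), (8.9, "large"), (9.3, "large")]
print(nearest_neighbor_predict(labeled, 2.0))

# Unsupervised: the same kind of numbers, but nobody says what they mean.
unlabeled = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]
print(two_means(unlabeled))
```

In the first case the structure is handed over as labels; in the second the program has to discover on its own that the numbers fall into two groups, which is a very small version of learning by observation.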
Do you use that example as an analogy or do you think there are things we can learn from how children acquire knowledge that are going to be really beneficial to building systems that can do unsupervised learning?
Both. So, it is clearly an analogy and we can take it as such. We can generalize through other scenarios. But, it’s also true that, at least some of us in the scientific community for AI, are looking at how humans are doing things, are looking at how children are learning, are looking at how our brain works. In other words, getting inspiration from the form of intelligence that we know that exists. We don’t completely understand it, but we know it’s there and we can observe it and we can study it. And scientists in biology and psychology have been studying it for decades, of course.
And similarly, you know, just thinking out loud, we also have neural nets which, again, appeal to the brain.
Have we learned everything we think we’re going to learn from the brain that’s going to help us build intelligent computers? Or do they really have absolutely nothing in common, and they’re just very different systems?
Well, the whole area of deep learning, which is so successful these days, is just the modern form of neural nets, which have been around for decades, since the ’50s. And they are, of course, inspired by things we knew about the brain. And now we know more and, actually, some elements of brain computation have been imported into neural nets fairly recently. Like in 2011, we introduced rectification units in deep neural nets and showed that they help to train deeper networks better. And actually, the inspiration for this was the form of the computation of the nonlinearities that are present in actual brain neurons. So, we continue to look at the brain as a potential source of inspiration. Now, that being said, we don’t understand the brain. Biologists, neuroscientists know a lot of things, have made tons of observations, but they still don’t have anywhere near the big picture of how the brain works, and most importantly for me, how the brain learns. Because this is the part that really we need to import into our machine learning systems.
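A rectification unit of the kind mentioned here is commonly written as max(0, x). The sketch below compares its slope with that of the older sigmoid nonlinearity. The link to training deeper networks (slopes get multiplied layer after layer, so small slopes shrink the learning signal) is a commonly cited explanation and an assumption of this example, not something stated in the interview; the function names are illustrative.

```python
import math


def relu(x):
    """Rectifier: pass positive inputs through, output zero otherwise."""
    return max(0.0, x)


def relu_slope(x):
    """Slope of the rectifier: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid_slope(x):
    """Slope of the sigmoid 1 / (1 + e^-x); it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Signal passed backward through 10 stacked units: the slopes multiply.
layers = 10
print(relu_slope(3.0) ** layers)     # an active rectifier leaves it unchanged
print(sigmoid_slope(0.0) ** layers)  # the sigmoid's best case still shrinks it
```

Even at its steepest point the sigmoid scales the signal by a quarter per layer, while an active rectifier leaves it intact.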
And so, looking out a ways, we talked about common sense. Do you think we’re eventually going to build an AGI, an artificial general intelligence, that is as versatile or more so than a human?
You know, when you talk to people and you say, “When will we have that?” the range that I’ve heard is five to five hundred years. First of all, that’s two orders of magnitude difference. Why do you think there’s such disagreement on when we’ll have it, and if you were then to throw your name in the hat and put a prediction out there, when would you think we would get an AGI?
Well, I don’t have a crystal ball and really you shouldn’t be asking those questions.
Well, there’s so much uncertainty.
So, there is a nice image here to illustrate why it’s impossible to answer this question. It’s like if you are climbing up a mountain range and right now you’re on this mountain and it looks like it’s the biggest mountain around and if you really want to get to the top, you have to reach the top of that mountain. But, really, you don’t see behind that mountain what is looming, and it’s very likely that after you reach the top of that mountain, we might see another one that’s even higher. And so we…you know, we have more work to do. So, right now it looks like we’ve made a lot of progress on this particular mountain which allows us to do so well in terms of perception, at least at some level. But, higher level, we’re still making baby steps, really. And we don’t know if the tools that we currently have, the concepts that we currently have with some incremental work will get us there. Or…and that might then happen in maybe ten years, right? With enough computing power. Or, if we’re going to face some other obstacle that we can’t foresee right now which we could be stuck with for a hundred years. So, that’s why giving numbers I think is not informing us very much.
Fair enough. And if you’ll indulge me with one more highly speculative question, I’ll return to the here and now and practical uses. So, my other highly speculative question is, “Do you think we’re going to build conscious machines ever?” Like, can we do that? And in your mind, is consciousness—the ability to perceive and self-awareness and all that comes with consciousness—is that required to build a general intelligence? Or is that a completely unrelated thing?
Well, it depends on the kind of intelligence that we want to build. So, I think you could easily have some form of intelligence without consciousness. As I mentioned, imagine a really, really smart encyclopedia that can answer any of your questions but doesn’t have any sense of self. But if we want to build things like robots, we’ll probably need to give them some sense of self. Like, a robot needs to know where it is, and how it stands compared to other objects or agents. It needs to know things like if it gets damaged, so it needs to know how to avoid being damaged. It’s going to have some kind of primitive emotions in the sense that, you know, it’s going to have some way of knowing that it’s reaching its objectives or not, and, you know, you can think of this as being happy or whatever. So, the ingredients of consciousness are already there in systems that we are building. But, they’re just very, very primitive forms. So, consciousness is not like a black and white thing. And it’s not even something that people agree on, even less than what intelligence is I think, what consciousness is. But my understanding of it is that we’ll build machines that have more and more of a form of consciousness as needed for the application that we’re building them for.
Our audiences are largely business people and, you know, they constantly read about new breakthroughs every day in the artificial intelligence space. How do you advise people to discover, to spot problems that artificial intelligence would be really good at chewing up and, you know, really getting your hands around? Like, if I were to walk around my company and I go from department to department to department—I go to HR, then I go to marketing, then I go to product, then I go to all of them, everyone—how do you spot things where AI might be a really good thing to deploy to solve?
Okay. So, it depends on your time horizon. So, if you want to use today’s science and technology and apply it to current tasks, it’s different from saying, “Oh, I imagine this particular service or product which could come out in five years from now.”
The former. What can you do today?
Yes. Okay. So, today, then things are pretty clear. You need a very well-defined task for which you have a lot of examples of what the right behavior of the computer should be. And a lot could be millions. It depends on the complexity of the task. If you’re trying to learn something very simple, then maybe you need less. And if you need to learn something more complicated… For example, when we do machine translation, you would easily have hundreds of millions of examples. You need to have a well-defined task in the sense that the situation where the computer is going to be used, we know what information it would have. This is the input. And we know what kind of decision it’s supposed to make. This is the output. And, we’re doing supervised learning, which is the thing we’re doing really well now. So, in other words, we can give the computer examples of, “Well, for these inputs, this is the output that it should’ve produced.” So, that’s one element. Another element is, well, not all of the tasks like this are easy to learn by current AI with deep learning. Some things are easier to learn. In particular, things that humans are good at are more likely to be easier to learn. Things that are fairly limited in scope. Because in a sense that makes them easier. That’s also going to tend to be easier to learn. Things that if you were able to solve this problem, then you must have really good common sense and you must be basically intelligent, well, you’re probably not going to be able to do that well because we haven’t solved AI yet. So, these are some things you can look for.
You’ve mentioned games earlier and I guess there’s a long history of using artificial intelligence to play games. I mean, it goes back to Claude Shannon writing about chess. IBM, in the ’50s, had a computer to play checkers. Everybody knows the litany, right? You had Deep Blue and you had chess and then you had Jeopardy and then you had AlphaGo and then you had poker recently. And I guess games work well because there’s a very defined rule set and it’s a very self-contained universe. What would be… You know, everybody talks about Go as this one that would be very hard to do. What is the next game that you think that you’re going to see computers have a breakthrough in and people are going to be scratching their heads and marveling at that one?
So, there are some more complex video games that researchers are looking at, which involve virtual worlds that are much more complex than the kind of simple grid world in which you live in the case of Go. And also, there’s something about Go and chess which is not realistic for many situations. In Go and chess, you can see the whole state of the world, right? It’s the positions of all the pieces. In a more realistic game, or in the real world, of course, the agent doesn’t know everything that there is to know about the state of the world. It only sees parts of it. There is also the question of the kind of data we can get. So, one problem with games, but also with other applications like dialogue, is that it really isn’t the case that you can extract a set of data once and for all, maybe by asking a lot of humans to perform particular tasks, and then just learn by imitation. The reason this doesn’t work is because when the learning machine is going to be playing using its own strategy, it may do things differently than how humans have been doing it. So, if we talk about dialogue, maybe our dialogue machine is going to make mistakes that no human would ever make, and so the dialogue would then move into a sort of configuration that has never been seen when you look at people talking to each other. And now the computer doesn’t know what to do, because it’s not part of what it’s been trained with. So, what it means is that the computer needs to be interacting with the environment. That’s why games that are simulated are so interesting for researchers. Because we have a simulator in which the computer can sort of practice what the effect of its actions would be, and depending on what it does, you know, what it is going to observe and so on.
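The interaction loop described in this answer can be sketched in a few lines. Everything here is a hypothetical toy invented for illustration: a corridor world (`CorridorSimulator`) in which the agent sees only a partial observation (whether the goal is in the next cell), a fixed policy (`always_right`), and a loop (`run_episode`) in which what the agent observes depends on what it just did.

```python
class CorridorSimulator:
    """Toy world: the agent walks along a corridor toward a goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.observe()

    def observe(self):
        # Partial observation: the agent never sees its exact position,
        # only whether the goal is in the very next cell.
        return self.position == self.length - 2

    def step(self, action):
        # action is +1 (move right) or -1 (move left); walls clamp the move.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.observe(), reward, done


def run_episode(simulator, policy, max_steps=50):
    """Let the policy act in the simulator; return (total reward, steps used)."""
    observation = simulator.reset()
    total_reward = 0.0
    for step_count in range(1, max_steps + 1):
        action = policy(observation)
        # The next observation is a consequence of the agent's own action.
        observation, reward, done = simulator.step(action)
        total_reward += reward
        if done:
            return total_reward, step_count
    return total_reward, max_steps


def always_right(observation):
    """Fixed policy used only to exercise the loop."""
    return 1


print(run_episode(CorridorSimulator(length=5), always_right))
```

A learning algorithm would replace `always_right` with a policy that it adjusts between episodes, practicing in the simulator instead of relying on a fixed set of recorded human behavior.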
So, there’s a fair amount of worry and consternation around artificial intelligence. Specifically, with regard to jobs and automation.
Since that’s closer to the near future, what do you think is going to happen?
It’s very probable that there’s going to be a difficult transition in the job market. According to some studies, something like half of the jobs will be impacted seriously. That means a lot of jobs will go. Everybody doing that job may not necessarily go, but their job description might change because a lot of what they were doing, which was sort of routine, will be done by computers, and then we’ll need less people for doing the same work. At the same time, I think, eventually there will be new jobs created, and there should not be, really, unemployment because we still want to have humans doing some things that we don’t want computers to do. I don’t want my babies to be taken care of by a machine. I want a human to interact with my babies. I want a human to interact with my old parents. So, all the caring jobs, all the teaching jobs, I mean, to some extent even though computers will have an important role, I think we would be happy to have, instead of classes of thirty students, classes of five students, or classes of two students. I mean, there’s no limit to how much humans can help each other. And right now, we can’t because we don’t have, you know, it would be too costly. But once the jobs that can be automated are automated, well, those human to human jobs I think will become the norm. And also all the jobs that are more creative, require less routine and things like of course artists or even scientists, hopefully, we’ll probably want to have more of these people.
Now, that being said, there’s going to be a transition, I think, where there’s going to be a lot of people losing their jobs and they’re not going to have the right training to do something else, you know, the other jobs that are going to be opening up. And so, we have to set up the right social security to take care of these people, maybe with guaranteed minimum income or something else, but somehow, we have to think about that transition because it could have big political impact. If you think about the transition that happened with the industrial revolution, from agriculture to industry and all the human misery that happened, say, from the middle of the nineteenth century to the middle of the twentieth century…well, a lot of that could have been avoided if we had put in place the kind of social measures that we did finally put in place around the Second World War. So, similarly, we need to think a little bit about what would be the right ways to handle the transition to minimize human suffering. And there’s going to be enough wealth to do it, because AI is going to create a lot of wealth. A lot of new products and services, done more efficiently, so, you know, in a sense, globally we’re all going to get richer. But the question is, where is that money going to go? We have to make sure that some of it goes to help that transition from the human point of view.
You kind of get people to fall into three camps on this question. One says, “You know, there will be a point where a computer can learn a new task faster than a human. And when that happens, that’s this kind of tipping point where they will do everything. They’ll do every single thing a human can do.” So, that’s essentially a school of thought that says you’re going to lose basically all of the jobs. All of them could be done by a machine. Then you get people who say, “Well, there’s going to be some amount of displacement,” and they often appeal to things like The Great Depression. To say, “There are certain people that are going to lose their jobs and then they’re not going to have training to find new ones in this new economy.” And then finally you come to people who say, “Look. This is an old, tired song. Unemployment has been between four and nine percent in the West for three hundred years, two hundred and fifty years. You can mechanize industry, you can eliminate farming, you can bring electricity in, you can go to coal power, you can create steam, you can do these amazingly disruptive things and you never even see a twitch in unemployment numbers. None. Nothing. Four to nine percent.” So, that is certainly kind of the historical fact, and that view says you’re not going to have any more unemployment than you do now. New jobs will be made as quickly as the old ones are eliminated. So, anybody who holds one of the other two positions, it’s incumbent on them to say why they think this time is different. And they always have a reason they think this time is different. And it sounds like you think we’re going to have a lot of job turnover, it’s going to be disruptive enough that we may need a basic income. There’s going to be this big mismatch between people’s skills and what the economy needs. So, it sounds like a pretty tumultuous time you’re seeing.
And so, what do you think… If they say, “Well, what’s different this time? Why is it going to be different than say, bringing electricity to industry, or bringing mechanization, replacing animal power with machine power?” I mean, that has to be of the same kind of order as what we’re talking about. Or does it?
So, we’re talking about a different kind of automation. And we’re talking about a different speed at which it’s going to happen. So, the traditional automation is replacing human, physical power and potentially skill, but in very routine ways. The new automation that’s starting is able to deal with a lot of kinds of tasks. And when we went through the earlier transition, say in the auto industry or even in agriculture, from those rather labor-intensive, physical tasks to the current situation where many of them are automated, people could migrate to white collar jobs and the service industry. So, now, it’s less clear where the migration will be. I think there will be a migration, as I said, to jobs that involve more human interaction and more creativity than what machines will be able to do for a while. But the other factor is the speed. I think it’s going to happen much faster than it has happened in the past. And so that means people won’t have time to reach retirement before their job is replaced. They’re going to lose their jobs in their 30s or 40s or 50s, and, of course, that could create a lot of havoc.
The number one question that I am asked when I speak on this topic, far and away the number one question, is, “What should my children study today so that they will be employable in fifty years?” It sounds like your answer to that is, “Things that require some kind of an emotional attachment and things that require some amount of creativity.” Are there other categories of jobs you would throw into that bucket or not?
Well, obviously, those computers have to be built by some people, so we need scientists, programmers and so on. That’s going to continue for a while. But that’s a small portion of the population. I think, for those who can, scientific jobs and engineering and computer-related jobs, we’re going to continue to need more and more of these. That’s not going to stop anytime soon. And, as you said, I think the human-to-human jobs, we’re going to need more. We’re going to want more. So, basically, what’s going to happen is, we’re going to have all this extra, I mean, some people are going to have extra wealth coming from this. Maybe, you know, you work for a company. You work for Google and you have this big salary, and now you can use this money to send your kids to a school that has classes of size five instead of thirty.
You know, coupled with artificial intelligence, always what gets grouped in with that is the discussion of robots. So that, you kind of have both the mind and the body which technology is kind of replacing. Robots seem to move at a much slower rate. I mean, if they had a Moore’s Law, they’re doubling every, you know, it’s more than two years. Do you have any thoughts on the marriage of robots with artificial intelligence and are robots needed for… Do AIs need to be embodied to learn better and things like that? Or are those just apples and oranges. They have nothing to do with each other?
Oh, they have things to do with each other. So, I do believe that you could have intelligence without a body. But, I also believe that having a body, or some equivalent of a body, as I’ll explain later, might be an important ingredient to reach that intelligence. I think there are a lot of things that we learn by interacting with the world. You don’t see me right now, but I’m picking up a glass and I’m looking at it from different angles, and if I had never seen this glass, this manipulation could teach me a lot about it. So, I think the idea of robots interacting with the environment is important. Now, robots themselves with legs and arms, I expect that the progress is going to be slower than with virtual intelligence. Because, you know, robots…the research cycle is slower; you build them, you program them, you try them for real. More importantly, it takes time for the robot to interact, and one robot can only learn so much. But if you have a bot, in other words, an intelligence that goes on the web and maybe interacts with people, well, it can interact with millions or even billions of people. Because it can have many copies of itself running. And so, it can have interactions, but they’re not physical interactions. They’re virtual interactions and it can learn from a lot of data, because there’s a lot of data out there, you know, everything on the web. So, there is an opportunity. I would bet that we’re going to see progress in AI go faster with those virtual robots than with the real, physical robots. But eventually, I think we’ll get those as well, and it’s just going to be at a different pace.
One of the areas that you mentioned that you’re deeply interested in is natural language processing and, you know, to this day, whenever I call my airline of choice and I have to say my membership number… It’s got an 8 in it, and if I’m on my headset, it never gets whether it’s an 8, an H, or an A. So, I always have to unplug my headset and say it into the phone and all of that. And yet, I interface with other systems, like Amazon Alexa or Google Assistant and all of that, that seem to understand entire paragraphs and sentences and can capitalize the right things and so forth. What am I experiencing there, those two very different experiences? Is it because in the first case there’s no context, and so it really doesn’t know how to guess between 8 and H and A?
So, right now, the systems that are in place are not very smart. And some systems are not smart at all. Machine learning methods are only starting to be used in those deployed systems and they still only are used for parts of the system. That’s also true, by the way, of self-driving cars right now: the system is designed more or less by hand, but some parts, like, say, recognizing pedestrians, or in the case of language, maybe parsing or identifying who you’re talking about just by the name, are done by separately trained modules that are trained by supervised learning. So that’s the typical scenario right now. The current state of the art with deep learning in terms of language understanding allows those systems to get a pretty good sense of what you’re talking about in terms of the topics and even what you’re referring to. But, they’re still not very good at making what we consider something like rational inferences and reasoning on top of those things. So, something like machine translation, actually, has made a huge amount of progress, in part, due to the things we’ve done in my lab, where you can get the computer to understand pretty well what the sentence is about and then use, you know, the specifics of the words that are being used to define a good translation. But, you could still fail in cases where there are complicated semantic ambiguities. But those don’t come up very often when you do translation. However, they would come up in tasks like, say, the kinds of exams that students take where they read a text and then they have to answer questions about it. So, there are still things that we’re not very good at, which involve high level understanding and analogies.
You mentioned that you were bullish on jobs that required human creativity. And I’ve always been kind of surprised by the number of researchers in artificial intelligence who kind of shrug creativity off. They don’t think there’s anything particularly special or interesting about it and think that computers will be creative sooner than they’ll be able to do other things that seem more mundane. What are your thoughts on human creativity?
So actually, I’ve been working on creativity, and we call it a different name. In my field, we call it generative models. So, we have neural nets that can generate images. That’s the thing we’re doing the most. But now we are also doing generation of sounds, of speech, and potentially we could synthesize any kind of object if we see enough examples of these types of objects. So, the computer could look at examples of natural images, and then create new images of some category that look fairly realistic right now. Still, obviously, you can recognize that they’re not the real thing, but you obviously see what the object is. So, we’ve made a lot of progress in the ability of the computer to dream up synthetic images or sounds or sentences. So, there’s a sense in which the computer is creative, and it can invent new poems, if you want, or new music. The only thing is, what it invents isn’t that great from a human point of view. In the sense that it’s not very original, it’s not very surprising. It still doesn’t really fit as well as what a human would be able to do. So, although computers can be creative and we have a lot of research in allowing computers to generate all kinds of new things that look reasonable, we are very far from the level of competence that humans have in doing this. So, why we are not there is linked to the central question that I mentioned in the beginning, which is, computers right now don’t have common sense. They don’t have a sufficiently broad understanding of how the world works. And without that common sense, without this causal understanding of what’s the relationships between high level explanations, and causes and effects, that’s still missing. And until we get there, the creativity of humans is going to be way, way over that of machines.
I reread Moby Dick a couple of months ago and I remember stopping on this one passage. And I’m going to misquote it, so, I apologize to all of the literary people out there that I’m going to mess this up. But it went something like, “And he piled forth on the whale’s white hump, the sum of all his rage and fury. If his chest had been a cannon, he would’ve fired his heart upon it.” And I read that, and I put the book down and I thought, “How would a computer do that?” There’s so much going on in there. There’s these rich and beautiful metaphors. “If his chest had been a cannon he would’ve fired his heart upon it.” And why that one? And it does ask this question: Is creativity merely computational? Is it something that really is reducible down to, “Show me enough examples and I’ll start analogizing and coming up with other examples.” Do you think we’ll have our Herman Melville AI that will just write stuff like that before breakfast?
I do really believe that creativity is computational. It is something we can do on a small scale already, as I said earlier. It is something we understand the principles behind. So, it’s only a matter of having…only, right, but neural nets or models that are smarter, that understand the world better. So, I don’t think that creativity is something… I don’t think that any of the human faculties is something inherently inaccessible to computers. I would say that some aspects of humanity are less accessible and creativity of the kind that we appreciate is probably one that is going to be something that’s going to take more time to reach. But maybe even more difficult for computers, but also quite important, will be to understand not just human emotions, but also something a little bit more abstract, which is our sense of what’s right and what’s wrong. And this is actually an important question because when we’re going to put these computers in the world, in products, and they’re going to take decisions, well for some very simple things we know how to define the task, but sometimes the computer is going to be having to make a compromise between doing the task that it wants to do and maybe doing bad things in the world. And so, it needs to know what is bad. What is morally wrong? What is socially acceptable? And, I think we’ll manage to train computers to understand that, but it’s going to take a while as well.
You’ve mentioned machine translation a couple of times. And anybody who follows the literature is aware that, as you said earlier, our ability to do machine translation has had some real breakthroughs. And you even said that you and your team had a hand in some of that. Can you describe in more layman’s terms what exactly changed? What was the “aha” moment or what was the data set, or what was different that gave us this big boost?
So, I would mention two things. One actually dates back to work I did around 2000. So, this is a long time ago, something we call “word embeddings” or “word representations.” Where we trained the computer to associate with each word a pattern of activation. Think of it like the pattern of activation of neurons in your brain. So, it’s a bunch of numbers. And the thing that’s interesting about this is, you can think of it as like a semantic space. So now, whereas cat and dog are just two words and any two words are, you know, just symbols and there’s nothing in, say the spelling of the word “cat” and “dog” that tells us that cats and dogs have something in common. But, if you look at how your neurons fire when you see the picture of a cat or when you see the picture of a dog, or if you look at our neural nets and how the artificial neurons in these networks fire in response to a picture of a cat or a picture of a dog or a text which talks about cats or talks about dogs, well, actually those patterns are very similar. Because cats and dogs have many things in common. They’re pets and we have a particular relationship with them and so on. And so, we can use that to help the computer to generalize correctly to new cases. Even to new words that it has never seen, because maybe we’ve never seen the translation of that word, but we’ve seen that it’s associated with other words in the same language.
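The idea of a semantic space can be illustrated with a small sketch. The vectors below are hand-picked, hypothetical values chosen only to make the point (real embeddings are learned from data and have many more dimensions), and the helper names `cosine_similarity` and `nearest` are illustrative, not part of any system mentioned in the interview.

```python
import math

# Hypothetical 4-dimensional "patterns of activation", hand-picked for
# illustration only. Real word embeddings are learned, not written by hand.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


print(nearest("cat"))
```

The spellings "cat" and "dog" share nothing, but in this toy space their vectors point in nearly the same direction, which is the property that lets a model generalize from one word to a related one.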
We can make associations that allow the computer to map symbols from sentences into these semantic spaces in which sentences that mean more or less the same thing will be represented more or less the same way. And so now you can go from, say, a sentence in French to that kind of semantic space and then from that semantic space you can decode into a sentence in English. So, that’s one aspect of the success of machine translation with deep learning.
The other aspect is that we had really a big breakthrough when we understood that we could use something called an attention mechanism to help machine translation. So, the idea is actually fairly simple to explain. Imagine you want to translate a whole book from French to English. Well, before we introduced this attention mechanism, the strategy would have been that the computer reads the book in French. It builds this semantic representation, like, you know, all these activations of neurons. And then it uses this to write up the book in English. But this is very hard. Imagine having to hold the whole book in your head; it’s very hard. Instead, a much better way is, I’m going to translate sort of one sentence at a time. Or even, like, keeping track in each book, in the French book and the English book that I’m producing, of where I’m currently standing, right? So, I know that I’ve translated up to here in French and I’m looking at the words in the neighborhood to find out what the next word should be in English. So, we use an attention mechanism, which allows the computer to pay attention more specifically to parts of the input (here, you would say, parts of the book that you want to translate). Or, for images, part of the image that you want to say something about. So, this is actually, of course, inspired by things we know from humans, who use attention mechanisms. Not just as an external device (you know, I look at something in front of me and I pay attention to a particular part of it) but also internally. Like, we can use our own attention to look back on different aspects of what we’ve seen or heard and of course that’s very, very useful for us for all kinds of cognitive tasks.
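The attention idea described above can be sketched in a few lines. This is a simplified dot-product variant written for illustration, not the specific model discussed in the interview; the function names and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each input position by how well its key matches the query.

    Returns the attention weights and the weighted average of the values.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy example: three source positions, and a query that matches the second one.
keys = [[0.0, 1.0], [5.0, 0.0], [0.5, 0.5]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, context = attend([1.0, 0.0], keys, values)
print(weights)
```

Instead of compressing the whole input into one representation, the decoder recomputes these weights at every output step, so each translated word can draw mostly on the part of the source it is "currently standing" on.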
And in your mind, is it like, “You ain’t seen nothing yet. Just wait”? Or is it like, “This is eighty percent of what we know kind of today how to do, so don’t expect any big breakthroughs anytime soon”?
Well, this is like your number of years question.
I see, okay.
But I would say that it’s very likely that the pace of progress in AI is going to accelerate. And there’s a very simple mathematical reason. The number of researchers doing it is increasing exponentially right now. And, you know, science works by small steps, in spite of what sometimes may be said. The effect of scientific advances could be very drastic, because we pass a threshold that suddenly we can solve this task and we can build this product. But science itself is just an accumulation of ideas and concepts that’s very gradual. And the more people do it, the faster we can make progress. The more money we put in, the better facilities, and computing equipment, the faster we can make progress. So, because there’s a huge investment in AI right now, both in academia and in industry, the rate at which advances are coming and papers are being published is just increasing incredibly fast. So, that doesn’t mean that we might not face some brick wall and get stuck for a number of years trying to solve a problem that’s really hard. We don’t know. But my guess is that it’s going to continue to accelerate for a while.
Two more questions. First, you’re based in Canada and have done much work there. I see more and more AI things, writing and publications and so forth coming out of Canada. I saw that the Canadian government is funding some AI initiatives. Is that a fact that there seems to be a disproportional amount of AI investment in Canada?
It is a fact, and, actually, Canada has been the leader in deep learning since the beginning of deep learning. So, two of the main labs working on this are the ones in my group here in Montreal, and in Toronto, with Geoff Hinton. Another important place was in New York with Yann LeCun, and eventually Stanford and eventually other groups. But, so, a lot of the initial breakthroughs in deep learning happened here and we continue growing. So, for example, in Montreal we have, in terms of academic research, the largest group doing deep learning in the world. So, there’s a lot of papers and advances coming from Canada in AI. We also have, in Edmonton, Rich Sutton, who is one of the godfathers of reinforcement learning, which we didn’t talk about, which is when the machine learns by doing actions and getting feedback. So, there’s a sort of scientific advance, scientific expertise that up to now has been very strong for Canada and has been exported. Because our scientists have been bought, by US companies mostly. But, the Canadian government has understood that if they want some of the wealth coming out of AI to benefit Canada, then we need to have a Canadian AI industry. And so, there’s a lot of investment right now in the private sector. Government’s also investing in research centers. So, they’re going to create these institutes in Montreal, Toronto, and Edmonton. And, you know, companies are flocking to Montreal. Experts from around the world are coming here to do research, to build companies. So, there’s an amazing momentum going on.
And then finally, what are you working on right now? What are you excited about? What do you wake up in the morning just eager to get to work on?
Ah, I like that question. I’m trying to design learning procedures which would allow the machine to make sense of the world, and the way I think this can be done is if the machine can represent what it sees—so, it sees images, text and things like that—in a different form, which I call “disentangled.” So, in other words, trying to separate the different aspects of the world, the different causes of what we’re observing. That’s hard. We know that’s hard, but it has actually been the objective that we set for ourselves more than ten years ago when we started working on deep learning and I wrote this chapter in a book with Yann LeCun about the importance of extracting good representations that can separate out those factors. And the new thing is incorporating reinforcement learning, where the learning system, the learning agent interacts with the world so as to better understand the cause and effect relationships and so separate out the different causes from each other and sort of make sense of the world in this way. It’s a little bit abstract what I’m saying, but let’s say that it’s fundamental research. We could take decades to reach maturity. But I believe it is very important.
Well, thank you very much. I appreciate you taking the time to chat with us and good luck on your work.
My pleasure. Bye.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.

A massive database now translates news in 65 languages in real time

I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project only made possible by the advent of cloud computing and big data systems. In a nutshell, it’s a database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or custom-built applications.

On Thursday, version 2.0 of GDELT was unveiled, complete with a slew of new features — faster updates, sentiment analysis, images, a more-expansive knowledge graph and, most importantly, real-time translation across 65 different languages. That’s 98.4 percent of the non-English content GDELT monitors. Because you can’t really have a global database, or expect to get a full picture of what’s happening around the world, if you’re limited to English language sources or exceedingly long turnaround times for translated content.

For a quick recap of GDELT, you can read the story linked to above, as well as our coverage of project creator Kalev Leetaru’s analyses of the Arab Spring and Ukrainian crisis and the Ebola outbreak. For a deeper understanding of the project and its creator (who also helped measure the “Twitter heartbeat” and uploaded millions of images from the Internet Archive’s digital book collection to Flickr), check out our Structure Show podcast interview with Leetaru from August (embedded below). He’ll also be presenting on GDELT and his future plans at our Structure Data conference next month.


A time-series analysis of the Arab Spring compared with similar periods since 1979.

Leetaru explains GDELT 2.0’s translation system in some detail in a blog post, but even at a high level the methods it uses to achieve near real-time speed are interesting. It works sort of like buffering does on Netflix:

“GDELT’s translation system must be able to provide at least basic translation of 100% of monitored material every 15 minutes, coping with sudden massive surges in volume without ever requiring more time than the 15 minute window. This ‘streaming’ translation is very similar to streaming compression, in which the system must dynamically modulate the quality of its output to meet time constraints: during periods with relatively little content, maximal translation accuracy can be achieved, with accuracy linearly degraded as needed to cope with increases in volume in order to ensure that translation always finishes within the 15 minute window. In this way GDELT operates more similarly to an interpreter than a translator. This has not been a focal point of current machine translation research and required a highly iterative processing pipeline that breaks the translation process into quality stages and prioritizes the highest quality material, accepting that lower-quality material may have a lower-quality translation to stay within the available time window.”
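One possible reading of that trade-off, sketched under stated assumptions: the quality level is scaled linearly with the time available per document, so that a batch always fits in the window. This is not GDELT’s actual code; the function name and the per-document costs (`fast_cost`, `full_cost`) are hypothetical.

```python
WINDOW_SECONDS = 15 * 60  # the 15-minute window from the quoted passage


def translation_quality(num_documents, fast_cost=0.1, full_cost=2.0,
                        window=WINDOW_SECONDS):
    """Return a quality level in [0.0, 1.0] that lets the batch fit the window.

    fast_cost and full_cost are hypothetical per-document costs (in seconds)
    of the cheapest pass and the most accurate pass. A result of 0.0 means
    only the cheapest pass is affordable.
    """
    if num_documents == 0:
        return 1.0
    budget_per_document = window / num_documents
    fraction = (budget_per_document - fast_cost) / (full_cost - fast_cost)
    return max(0.0, min(1.0, fraction))  # clamp to the valid range


for volume in (100, 900, 9000):
    print(volume, translation_quality(volume))
```

With few documents the sketch returns full quality; as volume surges, quality falls until only the basic pass remains, which mirrors the interpreter-like behavior the passage describes.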

In addition, Leetaru wrote:

“Machine translation systems . . . do not ordinarily have knowledge of the user or use case their translation is intended for and thus can only produce a single ‘best’ translation that is a reasonable approximation of the source material for general use. . . . Using the equivalent of a dynamic language model, GDELT essentially iterates over all possible translations of a given sentence, weighting them both by traditional linguistic fidelity scores and by a secondary set of scores that evaluate how well each possible translation aligns with the specific language needed by GDELT’s Event and GKG systems.”
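The two-score weighting described in that passage can be sketched as a simple re-ranking step. The candidate sentences, their scores, and the `domain_weight` parameter are invented for illustration, and a real system would consider far more candidates than this short list.

```python
def pick_translation(candidates, domain_weight=0.5):
    """Pick the candidate with the best blend of two scores.

    candidates: list of (text, fidelity_score, domain_score) tuples, with
    both scores assumed to lie in [0, 1]. domain_weight is a hypothetical
    knob controlling how much the downstream use case counts.
    """
    def combined(candidate):
        _, fidelity, domain = candidate
        return (1 - domain_weight) * fidelity + domain_weight * domain

    return max(candidates, key=combined)[0]


# Invented candidates: the first is slightly more faithful linguistically,
# the second matches the (hypothetical) event vocabulary better.
candidates = [
    ("The minister attacked the proposal", 0.90, 0.30),
    ("The minister criticized the proposal", 0.85, 0.80),
]
print(pick_translation(candidates))
```

Setting `domain_weight` to 0 reduces this to picking the single “best” general-purpose translation, which is the conventional behavior the passage contrasts against.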

It will be interesting to see how and if usage of GDELT picks up with the broader, and richer, scope of content it now covers. With an increasingly complex international situation that runs the gamut from climate change to terrorism, it seems like world leaders, policy experts and even business leaders could use all the information they can get about what’s connected to what, who’s connected to whom and how this all might play out.

[soundcloud url=”https://api.soundcloud.com/tracks/165051736?secret_token=s-YTgYs” params=”color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false” width=”100%” height=”166″ iframe=”true” /]

Smartling gets $10M to expand crowdsourced translations

Smartling, a New York City-based start-up focused on enabling websites and apps to go multilingual, has raised $10 million to ramp up its localization tools. The company offers crowdsourcing tools for websites and apps to quickly and easily add additional language support.

Google Translation Center: The World’s Largest Translation Memory

Disclosure: I am the founder of Der Mundo, a multilingual blogging service and translation community that combines human and machine translation (provided in part by Google), and I have researched translation technology for more than 10 years via the Worldwide Lexicon project.

Blogoscoped reports that Google is preparing to launch Google Translation Center, a new translation tool for freelance and professional translators. This is an interesting move, and it has broad implications for the translation industry, which up until now has been fragmented and somewhat behind the times from a technology standpoint.

Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. Statistical MT works by comparing large numbers of parallel texts that have been translated between languages and from these learns which words and phrases usually map to others, similar to the way humans acquire language. The problem with statistical MT is that it requires a large number of directly translated sentences. These are hard to find, and because of this SMT systems use sources like the proceedings of the European Parliament and the United Nations, which are fine if you’re writing in bureaucrat-speak but aren’t so great for other texts. Google Translation Center is a straightforward and very clever way to gather a large corpus of parallel texts to train its machine translation systems.
