Why Nuance is pushing for a new type of Turing test

There’s a growing school of thought that it might be a good idea to rethink the Turing test as it has traditionally been practiced (that is, by building bots that try to converse with people and pass themselves off as human) and instead focus on something more aligned with actual machine intelligence. One popular alternative is called the Winograd Schema challenge, which swaps open-ended conversation for common-sense questions posed to the computer.

Chatbots claiming to be 13-year-old Ukrainian boys need not apply.

First proposed by a University of Toronto researcher in 2011, the Winograd Schema challenge involves giving a computer a statement that includes, for example, ambiguous pronoun usage and then asking a simple question about it. A simple example might be something like this: “The cat sat on the blanket because it was warm. What was warm?”

In order to prevent clever programming that might be able to use word order or vocabulary to answer the question without “thinking,” there is always a word that can be replaced to change the answer while still resulting in a sensible statement. In the example above, changing warm to cold would change the answer from blanket to cat.
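For readers who think in code, the structure described above can be sketched as a small data type. This is only an illustration, not the format used by the challenge or by Levesque’s paper: the names `WinogradSchema`, `score` and `always_first` are hypothetical, and the answer key simply follows the reading given in the example above.

```python
from dataclasses import dataclass


@dataclass
class WinogradSchema:
    """One schema: a sentence with a swappable word and two candidate referents."""
    template: str      # sentence with a {word} placeholder for the special word
    question: str      # question with the same {word} placeholder
    candidates: tuple  # the two things the pronoun could refer to
    answers: dict      # special word -> correct referent (assumed answer key)

    def variants(self):
        """Yield (sentence, question, correct_answer) for each special word."""
        for word, answer in self.answers.items():
            yield (
                self.template.format(word=word),
                self.question.format(word=word),
                answer,
            )


def score(schema, resolver):
    """Return the fraction of variants that resolver answers correctly."""
    results = [
        resolver(sentence, question, schema.candidates) == answer
        for sentence, question, answer in schema.variants()
    ]
    return sum(results) / len(results)


# The cat-and-blanket example from the text, with the answers as stated there.
cat_schema = WinogradSchema(
    template="The cat sat on the blanket because it was {word}.",
    question="What was {word}?",
    candidates=("blanket", "cat"),
    answers={"warm": "blanket", "cold": "cat"},
)


def always_first(sentence, question, candidates):
    """A baseline that ignores meaning and always picks the first candidate."""
    return candidates[0]


if __name__ == "__main__":
    for sentence, question, answer in cat_schema.variants():
        print(sentence, question, "->", answer)
    print("always_first score:", score(cat_schema, always_first))
```

Because the two variants differ by a single word yet have opposite answers, a resolver that ignores that word, like `always_first` here, can get only one of the pair right.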

A deeper explanation, from the original paper by Hector J. Levesque.

Now, the Winograd Schema challenge has a corporate backer in natural-language expert Nuance Communications. Starting in October 2015, Nuance will sponsor an annual Winograd Schema challenge offering a $25,000 grand prize to whoever builds a system (or, presumably, the best system in the case of multiple successes) capable of meeting a baseline level of human performance in answering such questions.

According to Charlie Ortiz, senior principal manager of AI and senior research scientist in Nuance’s Natural Language and Artificial Intelligence Laboratory, the primary problem with chat-based approaches to the Turing test is that they have become more about trickery than about actual intelligence.

Bots like the one referenced above will take on personalities (in that case, a teen from a different country, which fooled only one-third of the human judges) that might elicit leniency when their human counterparts encounter bad grammar or sentence structure. They’ll misspell a tough word to seem more human. When asked a question about themselves or a tough follow-up, they’ll change the subject or otherwise evade the question. While that might trick some people, Ortiz explained, it helps others catch the computers in their lies.

“The Turing test in a sense is open-ended and allows too much and it’s tough to test whether you’ve made progress,” he said.

An excerpt from Time blogger Doug Aamoth's conversation with "Ukrainian teenager" Eugene Goostman.

On the other hand, he added, “Ordinary conversation involves a lot of knowledge about the way the world operates in this common-sense way.” The Winograd Schema challenge should allow researchers to measure the progress they’re making on this front.

Ortiz hedged when asked how long he predicts it will be until someone passes the test. “Some people think it’s not that hard,” he said. “Others think it’s something that’s not around the corner.”

It might be closer than skeptics think. Aside from Nuance’s new Winograd Schema challenge, there are already multiple efforts underway to build artificial intelligence systems that are capable of human-level reasoning and even of passing short-answer exams. Among them are the Todai Robot project out of Japan, which is working on a robot that can pass college entrance exams, and researchers at the Allen Institute for Artificial Intelligence in Seattle, who are building a system that can pass a fourth-grade test.

In the video below, from Gigaom’s recent Future of AI event, Allen Institute Executive Director Oren Etzioni describes the challenge of endowing computers with general knowledge even comparable to that of a fourth-grader.


Nuance’s Ortiz calls this the movement toward “big knowledge” as opposed to big data, noting that AI knowledge could come from anywhere — manual encoding, data analysis or even crowdsourcing.

It shouldn’t be surprising that Nuance is bullish enough on the idea of intelligent machines to sponsor a contest that will spur advances in the field. The company makes its money in part by selling virtual-assistant technology, and initially powered the backend for Apple’s Siri app. In the future, Ortiz said, the company hopes its technology will have general and even specific knowledge of certain domains, and be able to take into account contextual information, such as whether a user is in his car, before answering a question.

“You’re not going to get a personal assistant that replaces your doctor,” he said, “but you’ll get a personal assistant that helps you make decisions.”