I took Duolingo’s standardized language test of the future on my phone

Standardized tests are part of the educational process, but their primary purpose is not educational: Most tests are designed to produce a score at the end, a credential that presumably indicates the test taker has a certain level of knowledge. While massive open online courses (MOOCs) and other internet-based tools have focused on delivering quality instruction cheaply, there haven’t been many efforts to use the same tools to provide useful certifications.

In July, Duolingo, best known for its translation and language learning app, released Test Center. It’s an Android and web app that lets users take a proctored exam and provides a statistically validated score, which hopefully will eventually be accepted by schools and institutions. So far, it’s been a success: the Android app has been downloaded 157,000 times and 9,000 tests have been completed and scored.

The tests are currently free, but eventually, they’ll cost $20. “We came up with the smallest value that works for us and that a lot of people can pay,” Duolingo head of marketing Gina Gotthilf said.

Test Center is based on a simple idea: if the camera and microphone on a modern mobile device record the test taker, and paid human proctors review the recordings, the test can produce a reliable score. Duolingo has set its sights on the Educational Testing Service’s TOEFL exam, currently the most widely used standard for English proficiency.

English certification is also the kind of market that could use a little new blood. TOEFL tests cost nearly $200, depending on the currency, and for many people it’s hard to travel to a test center that could be hundreds of miles away. Duolingo is fond of pointing out that for some, a $20 test and a cheap $100 Android smartphone might cost less than traveling to take the TOEFL.

Of course, the concept of a remotely proctored exam with a usable score doesn’t need to be limited to foreign languages: if app-based testing takes off, the American childhood ritual of the Saturday morning SAT could conceivably be replaced by a shorter, smarter test on a mobile device.

The fact that I could take a No. 2 pencil-free exam without leaving my living room, in less time than it takes to watch an episode of Silicon Valley, almost made it sound like an appealing experience. So I decided to answer the question that had been burning in my mind since Test Center was announced: What happens when an American-born native English speaker takes a test to prove that he indeed speaks his mother tongue fluently?

Test day

Before you start, you need to provide two forms of documentation: a picture of your photo ID, and a picture of your ugly mug. Then the app provides several warnings: You should be alone, you should not have headphones on, and you need to keep your face in your front-facing camera’s view the entire time. The app provides a Skype-like box in the corner which shows what the front-facing camera is recording.

I’m a flighty person in general who doesn’t like to keep my eyes focused on one thing, but since the test was only twenty minutes I didn’t feel the fatigue that would’ve led me to stare into space. The questions, each of which had a strict time limit, also didn’t allow my mind to wander.

The first time I took the test, it kicked me out before I finished answering my first question. I had tried to take a screenshot, which sent me back to the Android home screen, and when I reentered the app, it said I had to wait 24 hours before taking the test again.

There are four kinds of questions on the test: vocabulary questions, listening and transcription, sentence completion, and a speaking test.

[Image: A vocabulary question]

I’m pretty sure I aced the speaking section: parroting the phrases my phone told me to read felt silly, but I had no worries about pronouncing the terms correctly and without a heavy accent, aside from my Mid-Atlantic patois. I also felt good about the listening and transcription section. Sometimes the sentences were long and contained words whose spelling I had to double-check, but most of the time I had no issues. I know, I was surprised too; my freshman-year English teacher would be proud of me.

Other sections were harder, like the vocabulary section, which asked me to pick out the real English words from a group. Sometimes a word looked like English, but I didn’t know its definition and couldn’t tell whether it was fake or simply a word I didn’t know. The sentence completion section was also tricky. Those questions took public-domain passages, removed certain words, and offered options for what belonged in each blank, like a game of multiple-choice Mad Libs. When several blanks came one right after another, sometimes I didn’t know what the passage was saying at all.

But the exam doesn’t test for conversational skills: I never had to converse with someone or write an English sentence of my own.

The test took 18 minutes, which is a lot shorter than nearly every standardized test I’ve ever encountered. One likely reason is that it uses a method called computerized adaptive testing, which uses answers to previous questions to choose future ones. If you ace the early questions, you’ll get harder ones later; if you struggle, the questions get easier. This means two people taking the same adaptive test could receive wildly different exams.

Adaptive testing isn’t new, nor is it controversial. In fact, the TOEFL is computer adaptive. But most adaptive tests include unscored dummy questions, which lengthen the exam. Test makers need those questions because an adaptive test can’t calculate a final score without prior data on which questions are easy and which are difficult, and the dummy questions are how that data gets collected. One reason Duolingo can keep Test Center short is that it can collect that kind of data from its main app, which offers many of the same questions in the context of a game.
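To make the idea concrete, here is a minimal Python sketch of a generic computerized adaptive test. It is not Duolingo’s algorithm: it assumes a simple one-parameter (Rasch) item response model, and the item bank, difficulty values, and function names are all hypothetical. The pre-set difficulties play the role of the calibration data that, per the article, Duolingo gathers from its main app.

```python
import math

# Hypothetical item bank: question id -> difficulty on a logit scale.
# In a real adaptive test these values come from prior calibration data.
ITEM_BANK = {
    "q1": -2.0, "q2": -1.0, "q3": -0.5, "q4": 0.0,
    "q5": 0.5, "q6": 1.0, "q7": 1.5, "q8": 2.5,
}

# Candidate ability values from -4.0 to 4.0 in steps of 0.1.
GRID = [x / 10 for x in range(-40, 41)]


def p_correct(ability: float, difficulty: float) -> float:
    """Rasch model: probability that a test taker answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses: list[tuple[float, bool]]) -> float:
    """Expected a posteriori ability estimate with a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-theta * theta / 2)  # unnormalized N(0, 1) prior
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total


def next_item(ability: float, asked: set[str]) -> str:
    """Under the Rasch model the most informative unasked item is the one
    whose difficulty is closest to the current ability estimate."""
    remaining = [q for q in ITEM_BANK if q not in asked]
    return min(remaining, key=lambda q: abs(ITEM_BANK[q] - ability))


def run_test(answer, num_items: int = 5) -> float:
    """Ask num_items questions adaptively; answer(qid) returns True or False."""
    asked: set[str] = set()
    responses: list[tuple[float, bool]] = []
    ability = 0.0  # start at the prior mean
    for _ in range(num_items):
        qid = next_item(ability, asked)
        asked.add(qid)
        responses.append((ITEM_BANK[qid], answer(qid)))
        ability = estimate_ability(responses)
    return ability
```

A test taker who answers everything correctly is steered toward harder items and ends with a positive estimate; one who misses everything drifts toward easier items and a negative estimate. Without calibrated difficulties, `next_item` would have nothing to work with, which is why unscored pretest questions (or, in Duolingo’s case, app data) are needed.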

Will institutions embrace the credential?

There have been tests given using adaptive methods and smartphones before, such as Testive’s SAT diagnostic exam or Blackboard’s mobile app. The difference between Duolingo and those apps is that Duolingo wants to create a new credential: it wants a passing Test Center grade to be good enough for students applying to college and workers looking for jobs. To do that, it will need to convince those institutions of two things: one, that the test is not prone to cheating, and two, that the test actually measures whether someone can speak English.

To combat cheating, which can be pervasive in some regions, Duolingo has a proctor review every Test Center exam. It outsources that service to a proctoring company (it wouldn’t tell me which one) that reviews each test after it is completed. The proctors look for what they call “infringements” and classify them into two categories.

A major infringement, like, say, having someone off-screen tell you an answer, means automatic disqualification. Minor infringements, such as going off camera for a minute or taking the test “not in a private quiet place,” are punished less severely, but once five of them add up, the test is invalidated.
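The review rules as described reduce to a small decision function. This Python sketch only illustrates the policy as Gotthilf described it; it is not the proctoring company’s software, the class, function, and outcome labels are hypothetical, and it reads “five of them” as “five or more.”

```python
from dataclasses import dataclass

# Threshold from the article; treating it as "five or more" is an assumption.
MINOR_LIMIT = 5


@dataclass
class Infringement:
    description: str
    major: bool  # True for e.g. someone off-screen supplying an answer


def review_outcome(infringements: list[Infringement]) -> str:
    """Return 'disqualified', 'invalidated', or 'valid' for one recording."""
    if any(i.major for i in infringements):
        return "disqualified"  # a single major infringement is enough
    minor_count = sum(1 for i in infringements if not i.major)
    if minor_count >= MINOR_LIMIT:
        return "invalidated"  # enough minor infringements add up
    return "valid"
```

Either failing outcome leads to the same result for the test taker: an email and the option to retake the test after 24 hours.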

“People get an email letting them know they are disqualified and they can see this status on the actual app,” Gotthilf said. “They can then retake the test, if they want, after 24 hours.”

The second issue, whether Test Center actually measures English fluency, is trickier. Duolingo’s approach is to show that its test is well correlated with the exam it is trying to dislodge: the TOEFL. That’s easier said than done, because it will require enough students to take both exams to produce data sets large enough to convince university administrators.

There is one validity and reliability study, conducted by a professor at the University of Pittsburgh. While it’s not peer-reviewed and it was sponsored by Duolingo, its methodology seems acceptable. The study looked primarily at Chinese, Spanish, and Portuguese speakers. (Brazil is currently Duolingo’s biggest market.) The subjects were overwhelmingly college students (nearly 72 percent), and about 40 percent had studied in the United States. On average, the subjects had spent about 11.25 years studying English. My one major issue with the study is that it sampled only the type of students expected to pass the TOEFL.

[Image: Correlation between Duolingo test scores and TOEFL test scores]

The study found that TOEFL scores and Test Center scores are correlated, though not perfectly. That makes sense: even if you took the same test twice, you’d be unlikely to get the same score both times. Another important check is whether students get the same score when they take the test multiple times. According to the study, a person’s repeat scores are strongly correlated, although slightly less so than on most mainstream non-adaptive tests.
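Both checks come down to a correlation: pair each student’s scores on two tests (Test Center and TOEFL for validity, or two Test Center sittings for reliability) and measure how closely they move together. The article doesn’t say which statistic the study used, so Pearson’s r here is an assumption, and the sample numbers below are made up for illustration, not taken from the study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of paired scores.

    1.0 means the scores line up perfectly, 0 means no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r([1, 2, 3, 4], [1, 3, 2, 4])` gives 0.8: the two score lists rank people similarly but not identically, which is the “correlated, but not perfect” situation the study describes.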

Move fast and break things

One issue with Duolingo creating its own credential is that Duolingo is a for-profit, venture-backed startup, not a nonprofit like ETS, which administers the TOEFL. Part of the startup mentality is to ship rapid updates, which could mean that one version of the Test Center exam differs from another in a way that calls the validity of the score into question.

For instance, Duolingo decided to break Test Center out into a separate app, as lots of app companies are doing. But when Facebook decides to separate Messenger from its main app, there aren’t schools and corporations, million-dollar institutions, counting on the service to work exactly the way it has in the past. Of course, the TOEFL isn’t unchanging either; it gets updated once a year.

In addition, there are issues with the devices themselves. For instance, my Android keyboard autocompleted words while I was transcribing during the exam. A well-trained keyboard could end up doing much of the spelling for some test takers. But, Gotthilf said, this isn’t too different from taking a traditional proctored exam: “Some people have to take tests in noisy environments, on crappy keyboards, in severe heat. However, what matters at the end of the day is that there is a high correlation between the tests in terms of the final scores.”

To address these discrepancies, Duolingo will date every exam. And unlike TOEFL scores, Test Center scores won’t have an expiration date, so each institution will need to decide which tests it will accept, if any. Duolingo plans to offer an option to send scores directly, which will obviate the need for an institution to verify a screenshot or email. Duolingo also plans to offer a Spanish language certification by the end of September.

As for my score? A day after I took the test, I got an email informing me I received a 10/10, which means I’m an “expert” in the English language. That’s a relief, because unlike the people the test is actually intended for, I don’t have another language to fall back on.