Google Translation Center: The World’s Largest Translation Memory

Disclosure: I am the founder of Der Mundo, a multilingual blogging service and translation community that combines human and machine translation (provided in part by Google), and I have researched translation technology for more than 10 years via the Worldwide Lexicon project.

Blogoscoped reports that Google is preparing to launch Google Translation Center, a new translation tool for freelance and professional translators. This is an interesting move, and it has broad implications for the translation industry, which until now has been fragmented and somewhat behind the times from a technology standpoint.

Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. Statistical MT works by comparing large numbers of parallel texts that have been translated between languages and learning from them which words and phrases usually map to others — similar to the way humans acquire language. The problem with statistical MT is that it requires a large corpus of directly translated sentences. These are hard to find, so SMT systems rely on sources like the proceedings of the European Parliament and the United Nations, which are fine if you're writing in bureaucrat-speak but aren't so great for other kinds of text. Google Translation Center is a straightforward and very clever way to gather a large corpus of parallel texts to train its machine translation systems.
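To make the idea concrete, here is a toy sketch of how co-occurrence statistics over sentence-aligned parallel text can suggest word translations. This is purely illustrative — the corpus, scoring, and function names are my own inventions, and production SMT systems use far more sophisticated alignment models (e.g. EM-trained IBM models) over millions of sentence pairs:

```python
from collections import Counter, defaultdict

# Toy sentence-aligned English-German corpus (made-up illustrative data).
parallel = [
    ("the house is small", "das haus ist klein"),
    ("the house is big", "das haus ist gross"),
    ("the book is small", "das buch ist klein"),
]

# Count how often each source word co-occurs with each target word
# across aligned sentence pairs.
cooc = defaultdict(Counter)
tgt_count = Counter()
for src, tgt in parallel:
    tgt_words = tgt.split()
    tgt_count.update(tgt_words)
    for s in src.split():
        for t in tgt_words:
            cooc[s][t] += 1

def best_translation(word):
    # Score each candidate by its co-occurrence rate relative to its
    # overall frequency, so ubiquitous function words ("das", "ist")
    # don't dominate every lookup.
    return max(cooc[word], key=lambda t: cooc[word][t] / tgt_count[t])

print(best_translation("house"))  # -> "haus"
print(best_translation("small"))  # -> "klein"
```

Even this crude counting scheme recovers plausible word pairs from three sentences; the point of a resource like GTC is that the same statistical machinery gets dramatically better as the volume and variety of parallel text grows.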

Part machine translator and part translation memory (a sort of search engine for translation that helps translators to recall translations), GTC will help translators by providing a free, global translation memory, and in turn drive costs down by reducing the amount of work needed to complete a text. It will help Google by providing an excellent source of high quality parallel texts that can be fed back into the statistical translation systems.
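A translation memory, at its simplest, is a fuzzy lookup over previously translated segments. The sketch below uses Python's standard `difflib` similarity ratio; the memory entries, threshold, and function name are hypothetical, and real TM systems use more refined matching and much larger stores:

```python
import difflib

# A tiny translation memory: source segments mapped to prior translations
# (hypothetical English-French entries).
memory = {
    "The file could not be opened.":
        "Le fichier n'a pas pu être ouvert.",
    "Click the Save button to store your changes.":
        "Cliquez sur Enregistrer pour stocker vos modifications.",
}

def tm_lookup(segment, threshold=0.7):
    """Return the closest (source, translation) pair above the
    similarity threshold, or None if nothing is close enough.

    A fuzzy match lets a translator reuse or lightly edit earlier
    work instead of retranslating a near-identical sentence.
    """
    best, best_score = None, 0.0
    for src, tgt in memory.items():
        score = difflib.SequenceMatcher(
            None, segment.lower(), src.lower()).ratio()
        if score > best_score:
            best, best_score = (src, tgt), score
    return best if best_score >= threshold else None

match = tm_lookup("The file couldn't be opened.")
# Finds the near-identical stored segment and its translation.
```

The economics follow directly: every segment the memory resolves is a segment nobody pays to translate again, which is why a free, global memory would push costs down across the industry.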

If Google releases an API for the translation management system, it could establish a de facto standard for integrated machine translation and translation memory, creating a language platform around which projects like Der Mundo can build specialized applications and collect more training data.

On the other hand, GTC could be bad news for translation service bureaus — especially those that use proprietary translation management systems as a way to hold customers and translators hostage. Most translation bureaus aren’t really technology companies and aren’t very competent at building quality software. Google Translation Center fills a void in the translation tools market that was created when the few independent companies, such as Trados, were acquired.

For freelancers, GTC could be very good news; they could work directly with clients and have access to high quality productivity tools. Overall this is a welcome move that will force service providers to focus on quality, while Google, which is competent at software, can focus on building tools. Google has a pretty mixed track record with consumer-facing services outside its core search business. But if it positions itself as a neutral service provider, it could enable projects like Der Mundo and others to create powerful and easy-to-use translation services for a broad range of industries.

Translation management is more complex than it appears, with different practices in different industries. If you're translating a news story, you want minimal cost and fast turnaround time (publish early, correct often). If you're translating a product spec sheet, you're willing to spend more to have it done right before it goes to press. Google would be smart to position GTC as a utility for translators and to encourage service bureaus to standardize around it, much as the industry once standardized around tools like Trados, and much as Google has done with its keyword ad business. That strategy would also eliminate a potential conflict of interest — translation professionals are understandably wary of contributing to something that could put them out of work — and would avoid channel conflicts with partners who will be Google's best advocates in selling to various clients.

While it’s my guess that Google has no intention of directly monetizing the service (charging a commission on transactions it brokers would expose Google to a billing and payment disbursal nightmare), the R&D value of collecting millions of parallel sentences in every language pair imaginable is indisputable, and it will pay off in unforeseen ways. So, my guess is Google will make this a free tool for the translation industry to use, and it will figure the money part out later. It can afford to be patient.

Translation is a very difficult problem. If it weren’t, it would have been solved a long time ago. I remain convinced that a multilingual web will be a reality in a short time, and that a menagerie of tools and services will emerge over the next few years — some geared toward helping translators, some toward building translation communities, and others that make publishing multilingual sites and blogs easy and intuitive.

As these emerge, the web will begin translating itself, and before long we'll be able to read content from sources worldwide just as easily as we explore the web in our own language today.