Bad automated translations have become infamous on the Internet. Businesses are advised against using “raw” machine translation. You might even have read about an embarrassing mistranslation that made it onto the official site of a Spanish food festival.
To grossly simplify the mechanism, machine translation is normally a combination of substitution rules and large corpora (collections) of texts in the two languages that help the software “decide” what each original string corresponds to in translation. This technology can yield good results when we use domain-specific corpora, controlled authoring, and human post-editing for high-visibility or high-stakes content.
“I understand why these best practices need to be followed,” you might say, “but I am no Fortune 500 company — I cannot afford to hire a human translator or post-editor. I only need to translate this street sign or ask my Italian in-laws what they want for dinner.” While augmenting machine translation with the practices mentioned above is still highly recommended for any business or official communication, I would like to share some techniques for checking your automated translation when you cannot use other options.
Use Longer Phrases
One of the challenges for machine translation is ambiguity in language — one word can mean different things in different contexts. The way to help the software overcome this is to give it more context. Consider the following example from this BBC article:
| | First sentence | Second sentence |
|---|---|---|
| English | The little car you can drive in France without a licence | Losing one’s driving licence in the UK is a serious matter – expensive and, to say the least, very inconvenient. |
| Google Translation (Russian) | Маленькая машина можно ехать во Франции без лицензии | Потеря свое водительское удостоверение в Великобритании серьезный вопрос – дорого и, по меньшей мере, очень неудобно. |
| Bing Translation (Russian) | Маленький автомобиль вы можете управлять во Франции без лицензии | Теряя водительские права в Великобритании это серьезный вопрос – дорого и, мягко говоря, очень неудобно. |
Putting aside grammatical incongruities for a moment, we see that in the first instance, both Google and Bing translated “licence” as litsenziya (лицензия), which is a business or medical license in Russian, but not the document that lets you drive a car. In the second sentence, however, the combination “driving licence” has swayed the result in the correct direction of voditelskiye prava or voditelskoye udostovereniye (both meaning “driver’s license”).
Triangulate
In the example above, both Google and Bing gave similar results. However, that is not always the case. Take this sentence from a post on the Snob website.
| | Sentence |
|---|---|
| Russian | С полгода назад ко мне на прием пришла женщина и попросила совета. |
| Google Translation (English) | About half a year ago I was at the reception woman came and asked for advice. |
| Bing Translation (English) | With half a year ago to me came a woman and asked the Council. |
The Russian actually says “About half a year ago a woman came to see me [at my office] and asked for advice.” Google rendered the timeframe and the request for advice accurately, but it did not convey that this interaction happened at a counseling appointment. Bing, on the other hand, did convey that the woman had come to see the author, but picked the wrong sense of sovet (“advice” vs. “council”).
In other words, if you are machine translating something for comprehension and not for further publication, try running the same sentence or text through more than one automated translation engine to cross-check the output and detect any common threads or discrepancies.
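For readers comfortable with a little scripting, the cross-checking step can be sketched as follows. The two engine functions are stand-ins, not real translation APIs; their hard-coded return values are simply the Google and Bing outputs quoted above. The useful part is the comparison: words that both engines agree on are a rough signal of what you can trust, and words that appear in only one output are the spots to double-check.

```python
# A minimal sketch of "triangulating" machine translation output.
# The engine functions below are hypothetical stand-ins (no real MT
# service is called); in practice you would paste the same sentence
# into two or more engines and compare the results.

def translate_engine_a(text: str) -> str:
    # Stand-in for the first engine's output (Google's, in the example above).
    return "About half a year ago I was at the reception woman came and asked for advice."

def translate_engine_b(text: str) -> str:
    # Stand-in for the second engine's output (Bing's, in the example above).
    return "With half a year ago to me came a woman and asked the Council."

def common_words(a: str, b: str) -> set:
    """Words both outputs agree on, ignoring case and basic punctuation."""
    norm = lambda s: {w.strip(".,").lower() for w in s.split()}
    return norm(a) & norm(b)

source = "С полгода назад ко мне на прием пришла женщина и попросила совета."
out_a = translate_engine_a(source)
out_b = translate_engine_b(source)
print(sorted(common_words(out_a, out_b)))
```

Even this crude word-level overlap surfaces the disagreement in the example: “advice” and “Council” land outside the common set, which is exactly where a human reader should look more closely.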
Round-Trip It
Finally, if you are translating from a language you know and absolutely cannot use a human translator to do or check the work, don’t just stop at the first automated translation you get. Take that output and machine translate it back. See if the output makes any sense.
For instance, try round-tripping the sentence “A land where dinosaurs once roamed, this prehistoric evolutionary cauldron is a playground for naturalists” from a CNN travel article. You may want to use a different automated translation engine than the one you used to do the first translation. I have seen some comical results with this sentence.
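If you want to automate the round-trip sanity check, it can be sketched like this. Both translation functions are hypothetical stand-ins with hard-coded output; in practice you would run each leg through a real engine, ideally a different one for the return trip. The word-overlap score is deliberately crude: a low value is a cue to inspect the translation by hand, not a verdict on its quality.

```python
# A minimal sketch of a round-trip check. Both functions below are
# hypothetical stand-ins for real MT engines; the comparison step is
# the point, not the translation itself.

def mt_forward(text: str) -> str:
    # Hypothetical English -> French leg (hard-coded output).
    return "Un pays où les dinosaures erraient autrefois."

def mt_back(text: str) -> str:
    # Hypothetical French -> English leg, ideally a different engine.
    return "A country where the dinosaurs once wandered."

def overlap_ratio(original: str, round_trip: str) -> float:
    """Crude word-overlap score between 0 and 1; near 1.0 suggests
    the meaning survived the round trip."""
    norm = lambda s: {w.strip(".,").lower() for w in s.split()}
    a, b = norm(original), norm(round_trip)
    return len(a & b) / len(a | b)

source = "A land where dinosaurs once roamed."
score = overlap_ratio(source, mt_back(mt_forward(source)))
print(f"{score:.2f}")  # a low score means: read the output carefully
```

Here “land” coming back as “country” and “roamed” as “wandered” drags the score down even though the meaning mostly survived, which is why this kind of check should prompt a second look rather than an automatic rejection.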
I would especially like to hear from people outside the language industry — do you use machine translation in your work? What made you choose this method over others? How do you make sure the translation is meeting your expectations?