Statistically translating phrases with unusual translations
Google Translate, according to Wikipedia and my own empirical observations, is based on the statistical machine translation paradigm. Rather than constructing its translations by learning dictionaries and rules of grammar, a statistical machine translator will analyze texts for which it has known good translations in multiple languages and will learn how to translate new phrases from them. Statistical translation is descriptivist, reflecting how people actually write and speak rather than how rules of syntax dictate they should write and speak. (To the extent that the source texts themselves reflect how people actually write and speak, of course.)
Consequently, phrases whose usual translations differ strongly from their literal ones are translated as they are in practice, not word for word. Idioms are certainly one type of phrase that matches this criterion:
- L’habit ne fait pas le moine (French) translates to The clothes do not make the man (English), although the literal translation is The robe does not make the monk. (What is quite interesting is that if you start with a lowercase letter, l’habit ne fait pas le moine, you get a non-idiomatic translation, appearances can be deceiving.)
- I’m pulling your leg (English) translates to Yo estoy tomando el pelo (Spanish), with pulling your leg translated to a phrase that in Spanish literally means pulling your hair (but has the same meaning as the English idiom).
- Not every phrase gets this treatment. I guess Google either did not source the news about the Costa Concordia disaster or faced too much diversity of translation when sourcing it, because the infamous phrase Vada a bordo, cazzo (Italian) is translated as Go on board, fucking (English), which is sort of broken and clearly was not parsed as a single phrase. An Italian Coast Guard officer shouted this at the ship’s captain when the captain proved unwilling to go back and help the rescue. From what I’ve read, the right translation might be Get on board, dammit (what the press said), Get the fuck on board (I suspect this is the more accurate, unbowdlerized version; it sounds like what a seafaring officer would have said under that stress if he were speaking English), or Get on board, you dick (apparently more literal, but I think the second phrase sounds slightly more natural).
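To make the idiom behavior concrete, here is a minimal sketch in Python of how a phrase-based translator can end up preferring a whole-phrase translation over a word-by-word one. Everything in it is an assumption for illustration: the `PHRASE_TABLE` entries are invented, and the greedy longest-match lookup in `translate` is a simplification. A real statistical system scores many candidate segmentations with probabilities learned from its source texts, and nothing here claims to describe Google's actual implementation. The sketch also lowercases its input, so it does not reproduce the case sensitivity noted above.

```python
# Toy phrase table (invented entries, for illustration only).
# Keys are tuples of source words; values are target phrases.
PHRASE_TABLE = {
    ("l'habit", "ne", "fait", "pas", "le", "moine"): "the clothes do not make the man",
    ("l'habit",): "the robe",
    ("ne", "fait", "pas"): "does not make",
    ("le",): "the",
    ("moine",): "monk",
}


def translate(sentence, table=PHRASE_TABLE):
    """Greedy longest-match translation using a phrase table."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            key = tuple(words[i:j])
            if key in table:
                out.append(table[key])
                i = j
                break
        else:
            # No entry covers this word: pass it through unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


# With the idiom in the table, the whole phrase is translated as one unit.
print(translate("L'habit ne fait pas le moine"))

# Without it, the same lookup falls back to shorter pieces.
literal_only = {k: v for k, v in PHRASE_TABLE.items() if len(k) < 6}
print(translate("L'habit ne fait pas le moine", literal_only))
```

The first call prints the idiomatic rendering and the second prints the literal one. The Costa Concordia example behaves like the second case: if the full phrase was never learned as a unit, the output is stitched together from smaller pieces.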
Another kind of phrase falling into this category is titles. 千と千尋の神隠し (Japanese), a beautiful 2001 animated movie directed by Hayao Miyazaki, translates to Spirited Away (English), which was how the studios translated its title when releasing it to English-speaking countries. The same title also translates to Voyage de Chihiro (French), which was its title in French-speaking countries (almost; it was more precisely Le voyage de Chihiro, and I wonder if there’s a non-statistical rule at play on Google’s side that made it drop the article?).
I don’t speak a word of Japanese, but I found this article regarding the translation of the title, which more directly translates it into English as “Sen and Chihiro’s (experience of) being spirited away.” (In the film Chihiro is at one point renamed Sen, which has significance in her need to hold on to her identity.) Hence both of the above “official” translations differ from the literal translation, and in different ways; the English translation drops most of the title but retains the “spiriting away,” and the French translation drops Sen and converts the “spiriting away” into “the voyage.”
This poses a bit of a problem when you actually want a literal translation. On my Tumblr I recently referenced the fact that the Chinese title of Infernal Affairs, the Hong Kong movie upon which The Departed is based, apparently more directly translates to “the non-stop path,” a reference to Buddhist/Chinese hell. But when you feed 無間道 into Google Translate, you get The Departed in English. (Interestingly, Infernal Affairs is listed only as an alternate translation, not the first choice!) To underscore Google’s proper-noun interpretation of this phrase, French and Spanish also translate this to The Departed, which I guess means that most of Google’s source text in these languages reused the untranslated English title. (Translating to Portuguese, on the other hand, produces Os Infiltrados, which according to IMDb was the title under which the film was released in Brazil.)
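A rough way to picture why the literal meaning never surfaces: if candidate translations are ranked by how often each one appears alongside the source phrase, a translation that never appears in the data cannot be proposed at all. The Python sketch below assumes a hypothetical list of aligned pairs, `observed_pairs`, with made-up counts. The function `rank_translations` is likewise invented for illustration and is only a relative-frequency estimate, not Google's actual scoring.

```python
from collections import Counter

# Hypothetical aligned (source, target) pairs with invented counts.
observed_pairs = (
    [("無間道", "The Departed")] * 6
    + [("無間道", "Infernal Affairs")] * 3
)


def rank_translations(pairs, source):
    """Rank the observed translations of `source` by relative frequency."""
    counts = Counter(target for src, target in pairs if src == source)
    total = sum(counts.values())
    return [(target, count / total) for target, count in counts.most_common()]


ranked = rank_translations(observed_pairs, "無間道")
for target, share in ranked:
    print(f"{target}: {share:.2f}")

# The literal reading is absent from the data, so it is never a candidate.
print("the non-stop path" in [target for target, _ in ranked])
```

With these invented counts, The Departed ranks first and Infernal Affairs shows up as the alternate, while the literal title is not among the candidates.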
In any case, you can’t get any other English translation of this title from Google. I do greatly prefer the data-driven, descriptivist approach of statistical translation over a rules-based approach (and the success of Google Translate is a testament to the validity of the statistics); this is a small but interesting area where it falls a little short. You’re only as good as your data.