This is part 2/2 of a series about machine translation. Read part 1 here.
There was another big difference between the rules-based engines and the statistical ones: the rise of the internet. While rules-based engines (which were also quite maintenance-heavy) were usually not accessible to the broader public, Google Translate became one of the first publicly available translation engines. The internet did what the internet does best: it tested the tool's boundaries and made fun of it. Google Translate Sings became a hit on YouTube. Of course it did; after running lyrics back and forth through the engine a few times, the Little Mermaid became the bright young woman from the disease of the sponge, and Elsa from Frozen suddenly sang about giving up instead of letting it go. Who could possibly not love this?! Go and watch those two videos; you will notice that every sentence breaks down into small clusters of a few words that seem connected to each other, each followed by the next random cluster.
That is where the phrase that something sounds like Google Translate originated. It can refer to words used in awkward places or senses, or to uncommon grammatical structures. Often it is just an impression someone has, even if they cannot pinpoint exactly why they feel that way. For years and years, Google Translate delivered this kind of result. Nonetheless, since it was freely available online, many business owners used it to make their products accessible to tourists, often with hilarious results on menus, warning signs, and tourist information.
This went on until 2016. And then neural machine translation was introduced.
Neural Networks For The Win
Google upgraded its public translation engine to this new technique back in 2016, claiming on its developer blog that it was able to reach human parity with this machine translation. And truly, neural networks have changed the face of machine translation completely. Instead of the awkward-sounding but semantically more accurate translations you got from statistical engines, you could suddenly get perfect-sounding translations – as long as you didn't look too closely at the details. Neural machine translation regularly fails to represent the semantics of the source correctly.
How Neural Machine Translation Works
Neural machine translation, as the name suggests, uses a neural network to learn language rules and the connections between words. Instead of building a statistical model like the previous generation of engines, a large (very large) dataset is run through a neural network consisting of several layers, which learns to produce the optimal translation. As going into the details would take far more than one blog post, let's leave it at that and concentrate on the changes that became visible in the output:
- Neural machine translation can ‘look back’ and consider what it has already translated – this means sentences no longer break off abruptly after a few words. Sentence structures are represented much better, leading to fluid, natural-sounding output.
- If unknown words are encountered, they are no longer just copied into the target sentence. Instead, the engine guesses the target word (and thus sometimes invents new words, which can make it appear creative in its own right).
- The fluency of the output may be misleading – not everything that sounds fluent is also correct. Identifying errors in neural machine translation output, however, is much harder than in statistical output. Your brain will automatically tell you that something is off when the grammatical structure is damaged, but it will be deceived when the grammar looks flawless. Don’t be fooled!
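The ‘looking back’ in the first point can be sketched in a few lines of code. The following toy decoder is purely illustrative – the tiny vocabulary, the random weights, and all the function names are invented for this sketch, and a real engine would learn its parameters from millions of sentence pairs – but it shows the key mechanism: each new target word is predicted from the source context plus everything the decoder has already generated.

```python
import numpy as np

# Toy greedy decoder illustrating how an NMT decoder conditions each
# output token on everything generated so far. Weights are random,
# not trained -- this is a sketch of the mechanism, not a real model.

rng = np.random.default_rng(0)

SRC_VOCAB = ["<pad>", "ich", "liebe", "dich"]        # invented example vocab
TGT_VOCAB = ["<s>", "</s>", "i", "love", "you"]
DIM = 8

# Random "learned" parameters (a real engine learns these from data).
src_emb = rng.normal(size=(len(SRC_VOCAB), DIM))
tgt_emb = rng.normal(size=(len(TGT_VOCAB), DIM))
W_out = rng.normal(size=(DIM, len(TGT_VOCAB)))

def encode(src_tokens):
    # Encoder: compress the whole source sentence into one context vector.
    ids = [SRC_VOCAB.index(t) for t in src_tokens]
    return src_emb[ids].mean(axis=0)

def decode(context, max_len=5):
    # Decoder: generate one token at a time, feeding each prediction
    # back in -- this feedback loop is the 'looking back' described above.
    out_ids = [TGT_VOCAB.index("<s>")]
    for _ in range(max_len):
        state = context + tgt_emb[out_ids].mean(axis=0)  # source + history
        logits = state @ W_out
        next_id = int(np.argmax(logits))                 # greedy choice
        if TGT_VOCAB[next_id] == "</s>":
            break
        out_ids.append(next_id)
    return [TGT_VOCAB[i] for i in out_ids[1:]]

print(decode(encode(["ich", "liebe", "dich"])))
```

Because the weights are random, the ‘translation’ it prints is nonsense – the point is the feedback loop itself, which is what keeps neural output from breaking off after a few words the way statistical output did.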
If you want to check this for yourself, go and watch another video of Google Translate Sings: the famous hit Into the Unknown from Disney’s Frozen II. While the results are still quite funny, you will notice that, in contrast to the older videos, the text she sings is very fluent and even makes sense (from a grammatical point of view). Now, remember that she puts every text through several layers of Google Translate (and several different languages) – if you only do this once, the results are much better and very, very close to the original.
If you want to test neural machine translation further, go ahead. Try translating long and complex sentences, and you will see that the translation keeps up with the original quite well. By the way, the German competitor DeepL has a really good neural engine as well. You can even substitute words in the online editor, and it will offer synonyms and alternative phrasings.
You Still Sound Like Google Translate…?
If someone tells you nowadays that your texts sound like Google Translate, there may be two reasons:
- You used Google Translate.
- You didn’t use Google Translate, but you translated quite literally (even professional translators are confronted with this comparison!)
If you used Google Translate and someone figured it out – kudos to that person. Because the surface structure of neural machine translation is so polished, they need to know exactly what to analyze to tell whether something was machine-translated. It is far more likely that you translated something yourself in a very literal way. Translating very close to the source text can lead to awkward structures in the target language – and sounding like Google Translate, as explained above, long referred to awkward sentence structures and a somewhat ‘robotic’ tone.
While neural machine translation has been publicly available since 2016, and Google has expanded its language coverage ever since, the reputation of machine translation is far worse than its current quality. It will take a while for that reputation to catch up. One reason is that the quality is so strikingly good that many people will not even notice when something is machine-translated (check my article on the human quality paradox for more on this phenomenon!) – so they will not update their picture of automatic translation until they happen to come into direct contact with it themselves.