Measuring Machine Translation Quality: “Is It Good Enough?”

This is part I of a mini-series about quality assurance in the localization industry. Check out parts II (16 Sept) and III (23 Sept)!

No, this is not yet another article with motivating mantras about you being good enough. You are! Trust me. This is a blog post about quality assurance. Before I became Head of Technology, my position was Machine Translation Specialist. As such, I was confronted with this question on a daily, nay, hourly basis regarding raw or post-edited machine translation output. I often struggled to answer it, and I imagine that I am not the only one. So here are my thoughts – maybe they will help you the next time someone asks you exactly this question.

Deconstructing The Question

Answering the above-mentioned question is not hard at all. I mean, all you have to do is say yes or no, right? The question consists of just two meaningful words (let’s skip the deep analysis of is it for the moment, ok?): good and enough.

The definition of something that is as generic as the word good is difficult if we isolate the sentence. Let us start with the usual first step of any 2020 research project: Googling it. The Oxford Dictionary defines good as something of high quality or an acceptable standard. There it is, the magic word: quality. So being good automatically refers to having a certain quality. In the realm of (machine) translation quality, it is important to keep the ‘human quality paradox’ in mind (here’s a blog post about it): Humans like you and me are very prone to making mistakes while speaking or writing.

We All Make Mistakes

Speaking usually represents informal, spontaneous language, while writing was historically reserved for formal, thought-through language. Even in this latter form, no matter how often someone reviewed a book or letter or essay, one could still find an occasional error. Errors increase drastically in spontaneous utterances, most often without anyone noticing them. That is because our brain helps us to focus on the important stuff – so as long as you can get your message across, you will most probably not hear all of the mistakes in spontaneous speech.

Since the internet has become an integral part of our daily lives, we also make mistakes in the realm of informal writing. This area is a nice treat for all the linguists out there: While speech is perishable, the mistakes you make on the internet are for eternity (woohoo!). So with the growing influence of the internet and especially social media (the crockpot of opinions and status updates which are closest to spontaneous speech), we have gradually become used to not only hearing mistakes, but also seeing and reading them. This was new – and is still off-putting for people who are not used to the internet.

If you gave my 90-year-old grandfather a tablet and asked him to assess some blog posts, social media posts or tweets, there is a great chance that he would point out bad writing, style or typos as one of his criteria – because he is used to the old, formal form of writing in which publishing something with a typo was the ultimate failure (also related to the high costs of publishing – which are not comparable to clicking post on your latest Facebook status update). If you show the same set of texts to a millennial, they might not even mention the errors. That does not mean that they do not notice them. They have just learned that errors in posts or tweets are something that we have to live with and that do not necessarily influence the message of said tweet or post.

The Thin Line Between Good and Not Good Anymore

Unless – errors reach a certain threshold. If they become too prominent, they will be noticed for sure. And that’s how it is with the human quality paradox: The moment your brain actively lets you notice that there are mistakes in the text, you will look closer. And find more. And look even closer. And find even more… And that’s exactly when something starts to not be good anymore. As you can imagine, defining this threshold as a universal rule will never be possible. It depends on your error tolerance, the purpose of your reading and the context of the situation in which you’re reading. If you are researching for your PhD thesis, and you find a paper published online that already has several typos in the abstract, you will become suspicious of the quality of the research. If you are trying to find out how you can safely can food, you will most probably not be repelled by some typos in a blog post and will still digest the content. The same goes for my grandfather and the millennial: Their error tolerance is quite different because their expectations of written information differ so much. The millennial will tolerate many more typos without losing trust in the source of the information (which does not mean that they are naive: my grandfather might fall for the Nigerian prince who needs a bit of money in order to ship the tons of gold, whereas the millennial knows that the Nigerian prince is as real as all the housewives looking for fun in their neighborhood).

Back to our original problem: As you can see, defining good is not possible as a universal, always applicable rule. If we evaluate whether something is good or not, we need to integrate the context of the situation at all times. There will never be no context at all. If we do not explicitly refer to certain contextual facts, we will automatically apply one of the following two: our own context, or the context of the person who asked the question. That’s the beauty of communication: There are many things that remain unsaid, and our brain automatically fills the gaps with what it thinks fits well. If someone asks us: Is this good?, our brain will, for lack of any other context, automatically refer to what we think is good.

If the object or text in focus has some qualities that are universally agreed to be good, we may be able to answer the question to the liking of the person who asked. However, there are many borderline cases where we may think that something is good, while others do not – or the other way round.

Enough Said…?!

Our original question was not only whether something is good, but whether something is good enough. This makes it even more interesting: Being good can basically mean anything, as long as it’s above this threshold we tried to define. Good enough means that we want to match this threshold exactly, and balance on the thin line between good and not good anymore. Enough also implies that it has to be sufficient. The follow-up question of course is: For whom?

Again, you can see that what we need is context. Does the person want me to define my own threshold of being good enough? Do they want me to refer to theirs? If so, what is theirs? Or is it about someone else’s threshold? Lacking this crucial information may lead to misunderstandings and inconsistency. If you ask me and three of my colleagues, the probability that we will give you contradictory answers is high. However, none of us will have lied, as we all answered to the best of our knowledge.

It’s Always Good Enough For Someone

That said, if you get that question frequently, you should make sure that you also have the following information:

  • In what context should you answer this question?
  • What is the purpose of the text or object you evaluate?
  • Who is asking and what are their expectations?

If you do not receive these facts, ask for them. Make clear that answering without this crucial information makes your task impossible – without it, you will have to make assumptions, which makes your answer binding only within the context of those assumptions. Make sure to verbalize said assumptions – this will prevent you from talking past each other in the future and will hopefully prompt the person who requested your evaluation to think about the context themselves.
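If you like to think in code, the checklist above can be sketched roughly as follows. This is only a minimal illustration, not a tool I actually use: the names EvaluationRequest, QUESTIONS, open_questions and qualified_answer are all made up for this example, and the three fields simply mirror the three bullet points.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical mapping of the three facts from the checklist to the
# question you would ask if a fact is missing.
QUESTIONS: Dict[str, str] = {
    "context": "In what context should I answer this question?",
    "purpose": "What is the purpose of the text?",
    "audience": "Who is asking, and what are their expectations?",
}


@dataclass
class EvaluationRequest:
    """An 'Is it good enough?' request and what we know about it (illustrative)."""
    context: Optional[str] = None
    purpose: Optional[str] = None
    audience: Optional[str] = None
    # Facts we had to assume because nobody told us.
    assumptions: Dict[str, str] = field(default_factory=dict)


def open_questions(request: EvaluationRequest) -> List[str]:
    """Return the questions still to ask: facts neither given nor assumed."""
    return [
        question
        for name, question in QUESTIONS.items()
        if getattr(request, name) is None and name not in request.assumptions
    ]


def qualified_answer(request: EvaluationRequest, verdict: str) -> str:
    """Attach the verbalized assumptions to the verdict."""
    if not request.assumptions:
        return verdict
    stated = "; ".join(
        f"{name}: {value}" for name, value in request.assumptions.items()
    )
    return f"{verdict} (assuming {stated})"


request = EvaluationRequest(purpose="internal gisting")
for question in open_questions(request):
    print(question)

# Nobody answered, so we assume and say so explicitly.
request.assumptions["audience"] = "end customers"
print(qualified_answer(request, "good enough"))
```

The point of the sketch is the last function: a verdict never travels alone, it always carries the assumptions it was based on.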

Read Further

If you’re interested in the linguistics of the internet, I recommend the great book Because Internet: Understanding how language is changing by Gretchen McCulloch. As usual, I don’t get any money from you clicking on that link (and I would encourage you to support your local bookstore with a purchase instead!); I simply enjoyed reading her book a lot.

