When it comes to measuring quality, we are surprisingly unsuspicious once a metric comes into play. As soon as someone hands you numbers or a chart, there is a good chance that you will trust them, especially if they support what you already believe. It is always important to know where those numbers come from and what exactly they measure. In the field of (neural) machine translation in particular, trusting numbers blindly can have severe consequences.
Back in the day, when I was a machine translation specialist, it was part of my job to make sure that the machine translation output we used met a certain level of quality. I was positioned between the Sales and Production departments of the company, because that level of quality mattered to both: as the content usually got post-edited, I had to check whether the post-editors would actually be able to work with the output. And as machine translation and post-editing (MTPE) was a cheaper product than good old translation, the Sales guys wanted to know how far they could lower our rates.
No, this is not yet another article with motivating mantras about you being good enough. You are! Trust me. This is a blog post about quality assurance. Before I became Head of Technology, my position was Machine Translation Specialist. As such, I was confronted with this question on a daily, nay, hourly basis regarding raw or post-edited machine translation output. I often struggled to answer it, and I imagine that I am not the only one. So here are my thoughts; maybe they will help you the next time someone asks you exactly this question.