Leaving it to the engineers

It is possible that this belongs on my other site but… fine.

I think this was prompted by a tweet about a comment made at the SCIC21 interpreter training conference during the week. It is an annual conference bringing together the interpreting service of the European Commission and the universities whose courses supply the interpreter pipeline. The comment, which I now cannot find, amounted, grosso modo, to highlighting the risk of allowing the engineers to be in charge of the future of the language industry. In other words, Google.

EDA: and the tweet is here from Marcin Feder:

C. Tiayon – do not leave languages in the hands of engineers, a reference to Google, etc.

(quick thanks to Alexander Drechsel who found it for me)

I have a background in artificial intelligence and machine learning, and as part of getting it I did a machine learning course where the lecturer confidently asserted that translation as a problem had now been solved by applying statistical methods. This basically means loads of data and learning from it. There are well-known issues with this. It tends to be good, as in better than what went before, but anyone who has seen both Google Translate and Bing Translate in action would have to admit that it’s still not great. In many cases it’s terrible. Twitter’s language recognition engine is a bit hit or miss too, which makes its translations hilariously absent.

One of the things which annoyed me in general when I did my CompSci masters was the extraordinary tolerance computer scientists have for faults in things which are not strictly computer science. Woe betide you if you don’t comply with someone’s pet programming style peeve (usually in the area of variable names), but a 60% success rate in whatever the code is trying to do, like translating something from French to English, is tolerable. They are shockingly fussy about the computer end of the business and much less so about the business end of the business. Human interpreters and human translators are still far, far better than computers at transferring meaning from one language to another. This is because meaning is not all verbal and computers are not good at nuance.

So we keep hearing how great computers are at something or other (lately it has been Go) or at diagnosing some illness or other. I have doubts about the last one, because often that’s a question of judgement rather than a straight binary. Anyway, we keep hearing how great computers are at some task, but when you drill down, it is because monumental assumptions and allowances have been made. You can read, for example, that Duolingo is better than college courses for learning languages and that this is scientifically proven. The relevant study had a 75% drop-out rate. That means 75% of the people who started learning a language in a programme to measure the effectiveness of Duolingo dropped out before the end of the study, so whatever it found describes, at most, the quarter who stuck with it.
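
To put rough numbers on why that drop-out rate matters, here is a minimal sketch in Python. Only the 75% figure comes from the study as quoted above. The enrolment of 200, the 10-point average gain, and the assumption that people who dropped out gained nothing are all made up for illustration and are not taken from the study.

```python
# Sketch of how a drop-out rate changes what an "average gain" means.
# Only DROPOUT_RATE comes from the post; everything else is hypothetical.

ENROLLED = 200          # hypothetical number of people who started
DROPOUT_RATE = 0.75     # the drop-out rate quoted in the post
COMPLETER_GAIN = 10.0   # hypothetical average score gain among completers

# How many people are actually behind the headline result.
completed = round(ENROLLED * (1 - DROPOUT_RATE))
dropped = ENROLLED - completed

# Average gain if you only count the people who finished.
gain_completers_only = COMPLETER_GAIN

# Average gain over everyone who started, under the (strong, illustrative)
# assumption that people who dropped out gained nothing at all.
gain_all_enrolled = COMPLETER_GAIN * completed / ENROLLED

print(f"started: {ENROLLED}, finished: {completed}, dropped out: {dropped}")
print(f"average gain, completers only: {gain_completers_only}")
print(f"average gain, everyone who started: {gain_all_enrolled}")
```

With these invented inputs, the result rests on 50 people out of 200, and the average gain falls from 10 to 2.5 once everyone who started is counted. The real numbers could look quite different. The point is only that the two averages answer different questions, and it is worth asking which one is being reported.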

This is not just true for languages. In general, if computer science or technology is getting involved in your industry, it is worth paying careful attention to what the technologists consider to be adequate performance. It may well be significantly less than what is considered adequate in your industry, and you will want to know the rationale for that.