There’s a lot of enviable technology in the world of “Star Trek.” Starships zip around faster than the speed of light, diseases are cured with just a few spritzes of hypospray and food replicators can materialize tasty meals out of energy in just a few seconds.
But one of the most practical tools the Starfleet organization has on its journeys to alien civilizations is the universal translator — a device that automatically translates speech into each person’s primary language. Essentially, an English speaker will hear everyone talking to them in English, no matter what language they are actually speaking.
What Is Machine Translation?
Machine translation is an application of natural language processing that trains computer models to translate texts between languages. Older methods relied on phrase-based techniques, but deep learning techniques now dominate the field of machine translation and produce much more accurate results.
But is it possible to build a universal translator in the real world? Natural language processing has made astonishing progress over the last few years, with projects like GPT-3 able to generate sentences and paragraphs that can fool even experts in the field. In a world where that’s possible, why does Netflix still face a shortage of translators for the subtitles on its TV shows?
Maite Taboada, professor of linguistics at Simon Fraser University in British Columbia, Canada, said the sticking point has to do with context. It’s in the subtleties of meaning where machines are falling behind in their ability to do translations. And no matter how much data we throw at the problem, we may never get to the point where it’s accurate enough.
Machine Translation Is Natural Language Processing’s Ultimate Problem
“Machine translation is the ultimate application,” Taboada said. “It’s like the Holy Grail.”
Taboada is referring to machine translation’s special status in the field of natural language processing, an area of computer science research that aims to have computers “understand” languages much as humans do. Machine translation builds on all the other knowledge in natural language processing’s domain, such as grammar, language understanding and language generation. All these underlying topics need to be mastered in order to build a good machine translation tool.
“So when we say, ‘What are the hurdles of machine translation?’ — well, it’s all the hurdles of natural language processing,” Taboada said.
The good news is that researchers have been studying natural language processing for over 50 years now. Many areas within this field are already well understood. For example, automated spell-checking works almost perfectly: spell-check programs rarely require human intervention, and for practical purposes humans can now depend on machines for the task, Taboada said.
Today, Machine Translation Means Deep Learning
An early technique researchers used for machine translation was phrase-based machine translation, which uses supervised learning to translate known phrases. Supervised learning relies on humans to label training data before it gets fed into the model, which creates a bottleneck in how much data the algorithms can learn from. The technique also struggled with “long-distance dependencies,” so translation accuracy faltered on longer pieces of content.
For instance, if a sentence mentions a car at the top of a paragraph, and the last sentence in the paragraph refers back to the car as “it,” the algorithm may get confused about what that “it” relates to.
Phrase-based machine translation techniques are not used much anymore, mostly because, in 2016, Google switched the algorithm powering its Google Translate tool from phrase-based translation to deep learning, a machine learning technique that relies on building large neural networks. Google said the new technique reduced errors by around 60 percent.
“And then the thing exploded, everybody wanted to use deep learning,” Taboada said. “I think there’s no going back.”
One of the main advantages of deep learning is that it can be trained using mostly unsupervised learning, so the training process doesn’t require as much human effort to label data.
Taboada said the lack of supervision is possible because deep learning can infer the meaning of words and expressions from context. The meanings of words are mapped as “vectors” in multi-dimensional space, and when two words are often observed together, the algorithm learns that their meanings are related. As a result, deep learning is able to use that vector-style understanding of the meaning of words to help with the translation process.
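To make the vector idea concrete, here is a minimal sketch in Python. It is not how Google Translate or any production system works: real deep learning models learn dense vectors inside a neural network, while this toy version simply counts which words share a sentence. The corpus and the function names are made up for illustration. It shows one common variant of the idea, in which words that keep similar company end up with similar vectors.

```python
from collections import Counter
from itertools import combinations
import math

# Toy corpus (hypothetical); each sentence acts as one context window.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the doctor examined the patient",
    "the nurse examined the patient",
    "the car needs fuel",
    "the truck needs fuel",
]

def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    vectors = {}
    for sentence in sentences:
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(corpus)
sim_doctor_nurse = cosine(vectors["doctor"], vectors["nurse"])
sim_doctor_car = cosine(vectors["doctor"], vectors["car"])

# "doctor" and "nurse" appear in the same contexts, so their vectors
# are closer to each other than "doctor" is to "car".
print(sim_doctor_nurse > sim_doctor_car)
```

No one labeled "doctor" and "nurse" as related here. The relationship falls out of the raw text, which is the sense in which the approach needs little supervision.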
Bias and Lack of Data for Many Languages Hinder Machine Translation
Even with the deep learning model, many obstacles remain for building a universal translator. One is the issue of bias in training data. Because deep learning uses unsupervised methods, it learns everything by pulling in data from the world and, as a result, inherits the same problems and biases that exist in the world.
Taboada illustrated the problem with the example of nouns that have genders. In some languages, like Spanish, a translation into the language needs to include genders even when the original text doesn’t specify genders. For example, if the word “doctor” is translated from English into Spanish, it has to have a gender, and what that gender is may be decided by the predominant gender associated with doctors in the model’s training data.
“So you go out in the world, what do you see? Maybe ‘doctor’ is 70 percent of the time translated as ‘el doctor’ in Spanish and ‘nurse’ is 85 percent of the time translated as ‘la enfermera,’ feminine,” Taboada said. “That’s actually not a fault of the data — data just reflects the way the world is, but that’s not necessarily the way we want the world to be, and it may not be appropriate.”
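A rough sketch of how that plays out, assuming a deliberately naive translator that always picks the rendering it has seen most often. The counts below are invented to match the percentages in Taboada's example, and the function name is hypothetical. Real neural systems are far more complex, but they can absorb the same skew from their training data.

```python
from collections import Counter

# Hypothetical training pairs: (English noun, Spanish rendering observed).
# Counts are invented to mirror the 70 percent and 85 percent figures above.
observed = (
    [("doctor", "el doctor")] * 7
    + [("doctor", "la doctora")] * 3
    + [("nurse", "la enfermera")] * 17
    + [("nurse", "el enfermero")] * 3
)

def most_frequent_translation(word, pairs):
    """Return the most common rendering of `word` and its share of the data."""
    counts = Counter(target for source, target in pairs if source == word)
    choice, n = counts.most_common(1)[0]
    return choice, n / sum(counts.values())

for noun in ("doctor", "nurse"):
    choice, share = most_frequent_translation(noun, observed)
    print(f"{noun} -> {choice} ({share:.0%} of examples)")
```

The minority renderings never surface in the output, even though they make up 30 percent and 15 percent of the examples. A skew in the data becomes a rule in the translation.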
There are other concerns as well. Some languages simply may not have enough data to build good training models. And algorithms may not be able to differentiate between nuances like dialects, effectively flattening translations.
For streaming services like Netflix, part of the difficulty of translating subtitles for shows and movies comes from the physical constraints of time and space on the screen: sometimes translations are too long to fit on screen or to be read quickly enough. In those cases, humans are needed to make the hard decisions about what to cut so the subtitles are still enjoyable.
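The constraint itself is easy to check automatically; deciding what to cut is the hard part. The sketch below flags a subtitle that would not fit. The limits (42 characters per line, two lines, 17 characters per second) are assumptions chosen for illustration, not figures from the article or from any particular streaming service, and the function name is hypothetical.

```python
import textwrap

# Assumed limits, for illustration only.
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 17.0

def fits_on_screen(text, duration_seconds):
    """Check a subtitle against the assumed space and reading-speed limits."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    too_many_lines = len(lines) > MAX_LINES
    too_fast = len(text) / duration_seconds > MAX_CHARS_PER_SECOND
    return not (too_many_lines or too_fast)

short = "Where are you going?"
long = ("I would really like to know where exactly you think "
        "you are going at this time of night")

print(fits_on_screen(short, 2.0))  # short line, comfortable reading speed
print(fits_on_screen(long, 2.0))   # too much text for two seconds
```

A check like this can say that the second subtitle is too long. It cannot say which words a viewer can do without, and that is the decision that still falls to a human.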
The Biggest Obstacle for Machine Translation Is Context
Another big hurdle for machine translations is the problem of context. While there is plenty of data for deep learning to train on, such as content on Wikipedia, books or academic articles, it can be hard for algorithms to learn from the language differences between those mediums.
“It’s just completely different the way I write an academic article from the way I write a tweet, and that all gets collapsed into one set of data,” Taboada said.
Untangling that still requires work on the part of humans. Taboada specializes in sentiment analysis, a field within natural language processing that analyzes the emotions behind sentences and phrases.
While machine translation has come a long way, it still struggles with detecting subtle positive and negative emotions. Deep learning algorithms are quite capable of translating texts like user manuals, which don’t usually contain emotional phrases or require a lot of cultural context to understand, Taboada said. They can also do a decent job at content moderation, letting companies scale it up and automatically find inappropriate comments.
But algorithms are still not very reliable at these tasks when they require a great deal of nuance. A 2016 study examined the words and phrases used by both racist and anti-racist online communities and found many linguistic similarities. Those similarities make it difficult to accurately detect hate speech because it’s easy to accidentally block anti-racist comments.
“Moderating hate speech is really difficult because the words overlap,” Taboada said. “So you need to know more about the context and the way in which they’re used to understand whether that’s something that needs to be deleted or not.”
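A toy example shows why overlapping vocabulary is a problem. The filter below is a deliberately crude sketch, not a description of any real moderation system: it flags any comment containing a word from a blocklist. The one-word blocklist, the sample comments and the function name are all invented for illustration.

```python
import re

# Hypothetical one-word blocklist, for illustration only.
BLOCKLIST = {"inferior"}

def flagged_by_keywords(comment):
    """Flag a comment if any of its words appears in the blocklist."""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return bool(words & BLOCKLIST)

hateful = "Those people are inferior."
anti_racist = "Nobody is inferior because of where they come from."

# Both comments are flagged, although only the first should be removed.
print(flagged_by_keywords(hateful))
print(flagged_by_keywords(anti_racist))
```

The filter sees the same word in both comments and cannot tell that the second one rejects the idea the first one asserts. Telling them apart takes the context Taboada describes.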
When it comes to sensitive issues and situations where nuance is important, like legal contracts or political matters, machine translation is not an appropriate application.
“I would never have a call between [President Joe] Biden and [Russian President Vladimir] Putin be translated automatically,” she said.
Would machine translation ever be accurate and dependable enough for translating sensitive conversations, if deep learning models had unlimited data for their translation models to train on?
“I don’t think so,” Taboada said. “With a complex problem like machine translation, right now, I don’t see how it’ll ever be good enough that I can click a button, walk away and assume that the translation is going to be great and I don’t need to do anything about it.”