Lost in Translation
Will Generative AI replace humans in the translation of books? On escapism, hallucinations and omissions in Generative AI, disturbing revelations about translated fiction, and an unexpected factor.
October 2023 was depressing. To the extent that even burying myself in my do-good Health Tech work - my usual form of escapism - did not help me cope with the deep sorrow caused by so many deaths in the region.
And so, in desperate need of some escapism from the heartbreaking reality, I was looking for a light read, preferably a nice love story, where nobody gets killed, with no hate speech and no violence. I chose Red, White & Royal Blue, a popular LGBTQ novel by Casey McQuiston.
Yes, I’m very much aware that this comes like two+ years after the hype around this book – well, at that time I was busy at the frontlines of the digital first-responders in a global healthcare crisis. Still catching up with the world, people.
A key point here is that English is not my native language, and when reading fiction for fun, I sometimes get lazy and go for the translated version. This was the case here.
Translated book: meh.
A few weeks later, I watched the film.
Film: lovely.
Something did not compile. That’s impossible, I thought, a meh book does not become a lovely film.
A few weeks later, as an experiment, I read the original version of the book in English on a long flight to a business trip in the US.
Original book in English: different, clever, nuanced, touching.
That completely threw me off. Like, what?
A few things came to mind:
That translation was fine, but it lost many of the subtle nuances that made the book so touching. Do translated books often get lost in translation? What other books have I missed in all these years of reading translated fiction?
And what does this mean in the context of translation technology? How can machine translation ever work, when even human translators are not quite on the mark?
Or perhaps it was something else? Was it because the film gave the characters a face and a voice, making them much more alive when I read the English version?
The film, recently nominated for the PGA Awards, is sweet and funny. The cast is great. Was it the film that made the book better? In other words – was the cast crawling into the book?
This required a follow-up experiment with more books. We’ll get back to that in a moment.
Literary Translation
Translation of literature is a complex thing. Translation technologies have been around for a while (Google Translate, for example, was released in 2006), yet novels are still translated by humans.
It is not just about literally translating the text - it's about translating culture, emotions, humor, and style. Literary translation has many challenges. The translator needs to maintain the original tone and keep the author's unique writing style and expression. The choice of words should trigger the same understanding and emotion in readers as the original text. Translating culture-specific expressions or references can be challenging in itself: translators need to decide whether to keep the original reference or find an equivalent reference in the target language's culture. Think about slang, sports idioms, or even profanities.
As we often see, even in human-translated content, some subtle nuances are lost in translation. This raises the question: will Generative AI do this right?
Generative AI and Translation
Translation technologies such as Google Translate are far from perfect. But is ChatGPT any better?
Generative AI sometimes struggles with sophisticated nuances. It does not do great with slang and culture-specific idioms either, at least right now, and results vary by target language. It is quite clear that for AI to handle the literary translation task properly, it needs to be trained on this kind of data.
Training an AI model means you show it examples. The model learns from these examples over time. The more examples it sees, the better it gets. To train a large language model (LLM), you need a very large data set, like all of Wikipedia.
Training an AI model for translation works by feeding the model examples of paired texts in two languages. The model learns the translation patterns and rules from these examples. After training, it can translate new sentences it never saw during the training process.
What this all means is that for training a translation model, you need many side-by-side examples of original vs. translated pairs. Millions for a start.
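To make the idea of side-by-side pairs concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the three English/Spanish pairs, the function names `train` and `translate`, and the word-matching score (a simple Dice co-occurrence score). Real translation models are neural networks trained on millions of pairs, not word counts, so treat this only as a picture of "learn from pairs, then translate something unseen".

```python
from collections import Counter

# Hypothetical parallel corpus: (original, translation) sentence pairs.
# A real training set would hold millions of these.
TRAINING_PAIRS = [
    ("the book", "el libro"),
    ("the dog", "el perro"),
    ("a dog", "un perro"),
]


def train(pairs):
    """Learn a word-to-word lexicon from sentence pairs.

    For each source word, pick the target word it co-occurs with most
    consistently (highest Dice score). This is a toy stand-in for the
    patterns a real model learns.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update((s, t) for s in src_words for t in tgt_words)

    lexicon = {}
    for s in src_count:
        def dice(t):
            # 1.0 means the two words always appear together.
            return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

        lexicon[s] = max(tgt_count, key=dice)
    return lexicon


def translate(sentence, lexicon):
    """Translate word by word; unknown words are left unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.split())


lexicon = train(TRAINING_PAIRS)
# "a book" never appears in TRAINING_PAIRS, yet the learned lexicon covers it.
print(translate("a book", lexicon))  # un libro
```

Note what the toy cannot do: it has no notion of word order, tone, humor, or idiom, which is exactly where literary translation gets hard.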
There are more challenges. Generative AI can sometimes omit details from the original text. When omissions involve important nuances, the translation could make the reader miss the point. Generative AI also sometimes needs to be kept honest so that it does not make things up: adding facts and made-up details that do not appear in the original text, a phenomenon we call hallucinations. Can you imagine what would happen if hallucinations or omissions occurred in a clinical setting, where the original data is a clinical note or a radiology report?
Practically, to cope with all of that, a human would still need to review the results.
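One small way software could support that human review is to flag translations that deserve a closer look. The sketch below is an assumption, not a description of any real product: it compares only the numbers in the source and in the translation, since digits usually survive translation unchanged. The function name `flag_for_review` and the sample sentences are made up for illustration, and a check like this catches only a narrow slice of omissions and hallucinations.

```python
import re
from collections import Counter

# Matches integers and simple decimals such as "8" or "2.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def flag_for_review(source, translation):
    """Compare the numbers in a source text and its translation.

    Numbers that appear only in the source hint at an omission;
    numbers that appear only in the translation hint at a
    hallucination. Both lists are hints for a human reviewer,
    not a verdict.
    """
    src_numbers = Counter(NUMBER.findall(source))
    out_numbers = Counter(NUMBER.findall(translation))
    return {
        "possibly_omitted": sorted((src_numbers - out_numbers).elements()),
        "possibly_added": sorted((out_numbers - src_numbers).elements()),
    }


# Made-up example: the translation drops "8" and introduces "20".
source = "Nodule of 8 mm in the right lung, follow-up in 6 months."
translation = "Nódulo en el pulmón derecho, seguimiento en 6 meses, dosis 20 mg."
print(flag_for_review(source, translation))
# {'possibly_omitted': ['8'], 'possibly_added': ['20']}
```

An empty result does not mean the translation is faithful. It only means this one narrow check found nothing, so the human reviewer stays in the loop.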
The conclusion here is similar to the one in my post about the impact of Generative AI on the film industry: translation of literature will likely continue to involve humans, at least for now, but their work will change. They will likely use Generative AI as a key supportive tool for generating draft translations, suggesting culture-adapted slang and idioms, and so on. We call those Generative AI-based supportive tools that assist humans in their work Copilots.
Google Translate vs. ChatGPT4
Generative AI seems to be doing better than Google Translate in some cases. ChatGPT keeps more context, which helps overcome some of the biases and create better results. We can see that in the example below that uses a short generated text based on my unusual White House story, to illustrate the importance of context.
Google Translate produced somewhat of a literal translation and even demonstrated some gender bias in various target languages - believe it or not, female scientists do exist, thank you very much.
The ChatGPT4 translation was done in a clean session, so it was not affected by the original text generation. Not great either, but it did not demonstrate gender bias.
More about biases in AI models in one of my upcoming blog posts.
The Books Experiment Follow-up
The devastating wartime continued for months. And so, in the name of escapism, my books experiment continued, with a few more books: some pairs of translated vs. original, different genres, different authors, different translators. Light reads and award-winning novels, medicine-related novels and other topics, classics and new ones, books that became a film and books that never did.
One of them was The Idea of You by Robinne Lee, which is soon becoming a film. The cast was announced a few months ago and partially overlaps with the cast of the film mentioned above. Started with the translated version.
That read was agonizing. And it wasn’t even about the translation. It was the story itself that felt like watching a train wreck in slow motion: what tf are you doing sister. The double standards, the toxic social media – so heartbreaking that I barely survived to the end of the book.
But something else also happened: even without watching the film (no film yet) - not even a trailer (no trailer yet either) - the cast was crawling into the book big time, making it all a bit too much.
No plans to read the English version. Experiment failed.
Will absolutely watch the film when it comes out.
But at that point there was enough data in the books experiment to get to some conclusions:
First, things do get lost in translation. Reading fiction in its original language is better: translations sometimes miss nuances of the story. Interestingly, this also seems to depend on the author’s style, for instance whether they tend to casually mention important facts.
Second, some things just sound better in English. Some languages lack certain words, and you often cringe at the word choices in translations, which can sound almost too clinical.
Third, the crawling factor is real, but I am starting to suspect it depends on the cast. More specifically, on how authentic and convincing the cast is.
And last, given the current state of tech, translation of books will likely continue to involve humans, at least for now.
But tech evolves fast. Faster than ever before.
About me: Real person. Opinions are my own. Not generated by AI. See more here.
About Verge of Singularity.
LinkedIn: https://www.linkedin.com/in/hadas-bitran/
X: @hadasbitran
Instagram: @hadasbitran