Very rarely do technology releases “break the internet” the way popular culture can (what color was that dress again?). But ChatGPT, released just four months ago, has thrust itself into the collective consciousness so quickly and deeply that it already feels like we’ve been using “generative” and “AI” together for a long time.
The achievements of the team at OpenAI, both in the technology itself, built on a Large Language Model (LLM), and in how quickly it took off, are truly remarkable.
It does a great job of providing fluent and generally accurate answers to questions on a wide range of topics, and its user-friendly interface makes it accessible to everyone, which has contributed to its popularity.
Now, everyone is trying to figure out how to put this new set of “superpowers” to work to boost productivity and efficiency across all industries and walks of life, and website translation is a particularly promising area.
That said, language models are nothing particularly new in the world of Natural Language Processing (NLP). ChatGPT is built on the transformer, a deep neural network architecture that relies on self-attention mechanisms. The original transformer used an encoder/decoder design; GPT models use a decoder-only variant of it.
Google switched from statistical machine translation to neural machine translation (NMT) in 2016. The change improved machine translation quality, and NMT, today typically also built on transformers, is still the approach used to translate websites.
Both NMT models and LLMs generate output in response to an input. However, they are trained differently, handle different types of prompts and responses, and use very different amounts of training data.
LLMs are versatile tools, like a Swiss Army knife or duct tape. But when it comes to hammering nails into a wall, a hammer is still the most suitable option. NMT is the hammer for translation: a task-specific tool built to do one thing really well.
Compared to LLMs, well-trained NMT models produce better translation quality across languages, especially when translations must reflect specific brand voice requirements.
That said, LLM technology is still very helpful for translation and localization: it can be used to generate training data that improves the quality of NMT output. Essentially, the “Swiss Army knife” can be used to build a better hammer.
Building Better Hammers
The process of building a high-quality NMT engine has two main stages:
- Establish a base “generic” model for the language pair, either by building and training one from scratch or by starting from a pre-trained model.
- Domain adaptation: fine-tune the generic model with more specific training data so that it performs better within a given domain (such as adopting the vernacular of an industry segment or the brand voice of a specific company).
Domain adaptation greatly improves translation quality, especially on factors beyond grammar and meaning, such as adherence to industry- or company-specific style guides and glossaries.
LLMs like ChatGPT can translate at a quality level comparable to generic NMT models (e.g. Google Translate), at least for high-resource languages. But the more relevant comparison for most organizations is with a well-trained, domain-adapted NMT model, which will produce much better translations, much faster and much more cheaply.
This is because an LLM’s model size (the number of parameters in its neural network) and the volume of training data that must be prepared and used are so much larger that the computational cost of both ongoing training and inference (producing a translation) is orders of magnitude higher than for NMT models. This is also why LLMs are much slower.
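To make “orders of magnitude” concrete, here is a rough back-of-the-envelope sketch in Python. It uses the common rule of thumb that a transformer's forward pass costs about 2 floating-point operations per parameter per token, and it assumes illustrative model sizes: 175 billion parameters for the LLM (the published size of GPT-3; ChatGPT's own size has not been disclosed) and a hypothetical 200 million parameters for a large NMT model. It ignores attention overhead, hardware efficiency, and training costs entirely.

```python
# Back-of-the-envelope inference cost comparison (a rough sketch, not a benchmark).

def inference_flops(num_params: float, num_tokens: int) -> float:
    """Approximate forward-pass FLOPs using the ~2 FLOPs per parameter per token rule of thumb."""
    return 2.0 * num_params * num_tokens

# Illustrative sizes (assumptions): GPT-3's published size vs. a hypothetical large NMT model.
LLM_PARAMS = 175e9
NMT_PARAMS = 200e6

tokens = 1_000  # e.g. translating a page of roughly 1,000 tokens
llm_cost = inference_flops(LLM_PARAMS, tokens)
nmt_cost = inference_flops(NMT_PARAMS, tokens)

print(f"LLM: {llm_cost:.2e} FLOPs, NMT: {nmt_cost:.2e} FLOPs, ratio: {llm_cost / nmt_cost:.0f}x")
```

Under these assumptions the LLM needs roughly 875 times the compute per token, close to three orders of magnitude; the exact ratio depends entirely on the model sizes you plug in.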
Where LLMs are helpful is data augmentation: generating synthetic training data to add to the real training data you have curated for training NMT models. This is most useful for medium- and low-resource languages, for which it is hard to source enough aligned sentence pairs to train an effective NMT model. The LLM may know the target language well enough to generate synthetic data that, added to your real data, produces an NMT model with higher-quality output than one trained on your real data alone.
Similarly, an LLM can generate synthetic data for domain-adaptation training when you cannot curate enough real data for that training to improve quality effectively. Given a well-written prompt with enough examples from your real data, an LLM can produce additional data that is different from, but similar to, those examples, which is useful for this purpose.
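As a rough illustration of the prompting described above, the Python sketch below assembles a few-shot prompt from curated sentence pairs and then filters the LLM's reply into candidate synthetic pairs. The function names, the prompt wording, and the `source ||| target` line format are all hypothetical, and the actual LLM call is deliberately left out because it depends on whichever provider you use.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def build_augmentation_prompt(examples: List[Pair], src_lang: str, tgt_lang: str, n_new: int) -> str:
    """Build a hypothetical few-shot prompt asking an LLM for new, similar sentence pairs."""
    lines = [f"Here are example {src_lang}-{tgt_lang} sentence pairs from our content:", ""]
    for src, tgt in examples:
        lines.append(f"{src} ||| {tgt}")
    lines.append("")
    lines.append(
        f"Write {n_new} new, different sentence pairs in the same style and domain, "
        f"one per line, formatted as: {src_lang} sentence ||| {tgt_lang} sentence"
    )
    return "\n".join(lines)


def parse_synthetic_pairs(llm_output: str, real_pairs: List[Pair]) -> List[Pair]:
    """Keep well-formed pairs whose source is not already in the real data (or repeated)."""
    seen = {src.strip().lower() for src, _ in real_pairs}
    result: List[Pair] = []
    for line in llm_output.splitlines():
        if "|||" not in line:
            continue  # skip commentary or malformed lines
        src, tgt = (part.strip() for part in line.split("|||", 1))
        if not src or not tgt or src.lower() in seen:
            continue  # drop empty halves and duplicates of real or earlier synthetic data
        seen.add(src.lower())
        result.append((src, tgt))
    return result
```

Synthetic pairs produced this way might also be screened (for example by sampling them for linguist review) before they are mixed into NMT training data.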
At MotionPoint, our NMT team has been training brand-adapted NMT models for our customers for some time now, and we are seeing very significant increases in output quality from our brand-adapted models compared with generic models.
- We have seen BLEU scores improve from the 30-40 range for generic models (intelligible but not great) to the 70-80 range (essentially human quality) in high-resource languages. Scores for both generic and domain-adapted NMT are lower in lower-resource languages, but domain adaptation still roughly doubles the BLEU score.
- Linguists assessed the domain-adapted models as producing significantly more consistent formality of tone and improved readability, and as significantly reducing the effort required for human post-editing. In many cases, the output in fact achieved human quality.
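For readers unfamiliar with BLEU, the scores above measure n-gram overlap between machine output and human reference translations, scaled here to 0-100. The Python sketch below is a simplified corpus-level BLEU (uniform weights over 1- to 4-grams, clipped counts, a brevity penalty, one reference per segment, and plain whitespace tokenization). It is for illustration only: published scores are normally computed with standardized tooling such as sacreBLEU, whose tokenization and settings will give different numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU on a 0-100 scale (single reference, whitespace tokens)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: each hypothesis n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no matches
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

An output identical to its reference scores 100; changing a single word in a six-word sentence already pulls the score well down, because every n-gram that spans the changed word stops matching.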
The progress in NMT, driven by GPT, is expected to make machine translation far more common, reducing demand for linguists and making translation significantly cheaper.
But it’s not that simple.
Releasing the Hounds
A good low-risk way to begin reducing translation costs is to move from Human Translation to Machine Translation with Human Post-Editing (MTPE). It produces good results at a lower price because the NMT starting point helps the linguist work faster.
Much more challenging is harnessing Machine Translation (MT) as-is, without post-editing. This is where much larger cost savings seem within reach, because well-trained domain-adapted NMT models have demonstrably produced translations that are perfectly good enough to use without human post-editing, in many cases and for many contexts.
But this potential remains largely untapped in large organizations because NMT quality is fundamentally variable. Even if NMT produces high-quality translations eight times out of ten, the other two matter and call for a smart approach to managing the cost-risk dynamic.
One way to benefit from machine-only workflows while containing risk is to send the most quality-sensitive content for MTPE, and the less quality-sensitive content for MT-only methods.
There are several problems with this, though. Large organizations, especially in finance and healthcare, cannot afford any translation mistakes on their websites because of business, legal, and regulatory concerns.
And for non-website content, making corrections once the content has left the stable is much harder than with website content. Further, even if you can tolerate some mistakes getting out into the wild, the split workflow approach still has two problems:
- For the quality-sensitive translations going through the MTPE workflow, some machine translations will be perfectly good enough to use as-is, and in those cases you are paying for a human post-edit you don't need. This is fundamentally wasteful.
- For the rest, going through the MT-only workflow, you carry some risk of publishing unacceptably low-quality translations, which can damage your reputation even if relatively few people see them. In today's world, one person sees it, memes it on Twitter, and your brand is a laughingstock.
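Both problems can be put into rough numbers. The Python sketch below takes the “eight times out of ten” figure from earlier as an assumed share of machine translations that are good enough as-is on both routes; the word and segment counts and the per-word post-editing rate are made-up placeholders, not real pricing.

```python
from dataclasses import dataclass


@dataclass
class SplitWorkflow:
    mtpe_words: int           # words routed to MT + human post-edit
    mt_only_segments: int     # segments published straight from MT
    post_edit_rate: float     # hypothetical cost per post-edited word
    good_enough_share: float  # assumed share of MT output usable as-is

    def wasted_post_edit_spend(self) -> float:
        """Spend on post-editing MT output that was already good enough (first problem)."""
        return self.mtpe_words * self.good_enough_share * self.post_edit_rate

    def expected_bad_segments_published(self) -> float:
        """Expected number of below-par MT segments published unreviewed (second problem)."""
        return self.mt_only_segments * (1.0 - self.good_enough_share)


wf = SplitWorkflow(mtpe_words=100_000, mt_only_segments=5_000,
                   post_edit_rate=0.05, good_enough_share=0.8)
print(f"Wasted post-edit spend: {wf.wasted_post_edit_spend():.2f}")
print(f"Expected bad segments published: {wf.expected_bad_segments_published():.0f}")
```

With these placeholder numbers, the split approach pays to post-edit 80,000 words that did not need it while still letting around a thousand weak segments through.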
There has to be a better way to eliminate waste, manage the risk, and fully release the potential of machine translation to materially reduce costs.
Dynamic Workflow
A better approach is to put all translations through a single dynamic workflow designed to deliver a translation of at least the required quality at the lowest possible cost. This would consist of:
- Determining the minimum quality required, based on the content itself and/or the context in which it is presented and accessed.
- Doing the best possible machine translation.
- Doing a rapid, inexpensive quality assessment of the machine translation.
- Sending the translation for human post-editing if, and only if, its assessed quality is below the required quality.
This approach maximizes the cost advantage of good machine translation by minimizing human post-editing costs and, provided the quality assessment is reliable, mitigates the risk of bad machine translation for all content types in all contexts.
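The four steps above can be sketched as a single routing function. This is a minimal Python sketch, not MotionPoint's implementation: the machine translation, quality estimation, and post-editing services are passed in as plain callables (hypothetical stand-ins for real services), and quality is assumed to be expressed on the same 0-100 scale as the requirement.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedTranslation:
    text: str
    estimated_quality: float
    required_quality: float
    post_edited: bool


def route_translation(
    source: str,
    required_quality: float,                        # step 1: minimum quality (0-100) for this content/context
    machine_translate: Callable[[str], str],        # step 2: hypothetical MT service
    estimate_quality: Callable[[str, str], float],  # step 3: hypothetical quality-estimation model (0-100)
    post_edit: Callable[[str, str], str],           # step 4: hypothetical human post-editing queue
) -> RoutedTranslation:
    mt_output = machine_translate(source)
    score = estimate_quality(source, mt_output)
    if score < required_quality:
        # Only translations below the threshold incur the cost of human post-editing.
        return RoutedTranslation(post_edit(source, mt_output), score, required_quality, True)
    return RoutedTranslation(mt_output, score, required_quality, False)


# Stand-in services for illustration only.
def fake_mt(source: str) -> str:
    return source.upper()


def fake_qe(source: str, translation: str) -> float:
    return 85.0


def fake_pe(source: str, translation: str) -> str:
    return translation + " (reviewed)"


print(route_translation("hello", 70.0, fake_mt, fake_qe, fake_pe))  # published as-is
print(route_translation("hello", 90.0, fake_mt, fake_qe, fake_pe))  # sent for post-editing
```

Passing the services in as functions keeps the routing logic independent of any particular MT engine or quality model, so either can be swapped without touching the decision rule.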
The first problem to solve is codifying quality for the purposes of the first, third, and fourth steps. Several industry standards exist for this (MQM, DQF, etc.), but the scoring needs to be adaptable, to some extent, to the specific sensitivities of the organization or industry vertical and of the market/language.
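To show what codifying quality can look like, here is a simplified Python sketch of an error-weighted score in the spirit of MQM: a reviewer (human or model) annotates errors by category and severity, penalties are summed, normalized per 100 words, and subtracted from 100. The severity weights (minor 1, major 5, critical 25) and the optional per-category multipliers are illustrative assumptions; they are exactly the knobs an organization would tune to its own sensitivities.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed severity weights for illustration; tune per organization.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 25.0}


@dataclass
class ErrorAnnotation:
    category: str  # e.g. "accuracy", "fluency", "terminology", "style"
    severity: str  # "minor", "major" or "critical"


def quality_score(errors: List[ErrorAnnotation], word_count: int,
                  category_multipliers: Optional[Dict[str, float]] = None) -> float:
    """Return 100 minus weighted penalty points per 100 words, floored at 0."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    multipliers = category_multipliers or {}
    penalty = sum(SEVERITY_WEIGHTS[e.severity] * multipliers.get(e.category, 1.0) for e in errors)
    return max(0.0, 100.0 - penalty * 100.0 / word_count)
```

Under these weights, a 200-word segment with one major accuracy error and two minor fluency errors scores 96.5; giving terminology a multiplier of 2.0 (say, for a bank with a strict glossary) makes a single major terminology error cost as much as two.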
Deciding what quality requirement to apply to each translation task can be relatively simple, as in the split workflow example above, which chose between just two classifications: MTPE or MT-only, depending on whether or not the content is in a high-traffic part of the site.
But in a dynamic workflow, any piece of content in a given context could be tagged with a minimum quality requirement on a sliding scale such as 0-100. Far more nuanced approaches then become possible, taking into account variables such as user behavioral analytics or other signals that indicate how business-critical the quality of a translation really is. The workflow could then be tuned to optimize translation cost against business outcomes, rather than just translation cost against translation quality.
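A sliding-scale requirement could be derived from context signals along these lines. The signals (monthly page views, a regulated-content flag, whether the page is on a conversion path) and every number in this Python sketch are hypothetical placeholders; in practice they would be tuned against observed business outcomes.

```python
import math


def required_quality(monthly_views: int, regulated: bool, on_conversion_path: bool,
                     floor: float = 50.0, traffic_ceiling: float = 90.0,
                     conversion_bonus: float = 5.0) -> float:
    """Map hypothetical context signals to a minimum quality requirement on a 0-100 scale."""
    if regulated:
        # Regulated content (e.g. finance or healthcare disclosures) always demands the maximum.
        return 100.0
    # Log-scaled traffic signal: 0 views -> 0.0, 1,000,000 or more views -> 1.0
    traffic = min(1.0, math.log10(1 + max(0, monthly_views)) / 6.0)
    score = floor + (traffic_ceiling - floor) * traffic
    if on_conversion_path:
        score += conversion_bonus
    return min(100.0, score)
```

Because the regulated case returns 100, such content would always be routed to post-editing unless the quality assessment scores it as perfect, matching the zero-tolerance stance described earlier for finance and healthcare.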
Quality evaluation and routing decisions could in principle be made by linguists, but doing this cost-effectively for a large volume of translations is another good use case for AI classification technology, both to evaluate quality and to assess whether it is “good enough”.
There are clearly several technical challenges here, and this is not a solved problem in the industry today. But it is a solvable one, which is why at MotionPoint we have invested heavily in R&D to develop Adaptive Translation™, which combines a superior NMT engine with AI capabilities for Translation Quality Evaluation and Dynamic Workflow Routing.
Our goal is to enable our customers to fully realize the cost-saving potential of AI for website translation, stretch their localization budgets to support additional markets, and deliver more business value than ever before.
Learn more about the future of website translation in light of AI from our recent webinar. Download it for free here.
Last updated on April 14, 2023