To Make Better Translation Algorithms, Look To The Bible

Medicine uses Latin because it is a 'dead' language - the meanings of the words will not change over time. But if you want to modernize translations to different languages, an ancient book may help: The Bible.

Tools to translate text between languages are widely available - and rather awful. While they can create literal translations, style is hard to bring across without human intervention. If you read a machine translation of China's Liu Cixin, you would miss much of what makes his writing distinctive, and with it a great example of the best science-fiction culture since America of the 1950s.

Big Data can help, but only with an enormous amount of data. That's where the Bible comes in. Each version of the Bible contains more than 31,000 verses, and it exists in many versions and languages, which means it offers what the researchers call "a large, previously untapped dataset of aligned parallel text."

Bible photo credit Chris Downer. Composite illustration courtesy of Keith Carlson. Provided by Dartmouth College

Using the Bible, researchers were able to produce over 1.5 million unique pairings of source and target verses from 34 versions of the English-language Bible for machine-learning training sets. The Bible is also thoroughly indexed by the consistent use of book, chapter and verse numbers. The predictable organization of the text across versions eliminates the risk of alignment errors that could be caused by automatic methods of matching different versions of the same text.
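To make the alignment idea concrete, here is a minimal sketch in Python of how verses indexed by book, chapter and verse could be paired across versions. It is an illustration under stated assumptions, not the researchers' actual pipeline: the version names, the toy verse texts and the `aligned_pairs` helper are all hypothetical, and skipping identical verse texts is just one plausible reading of "unique pairings."

```python
from itertools import permutations

# Hypothetical toy data: version name -> {(book, chapter, verse): text}.
# The names and texts are placeholders invented for this sketch.
versions = {
    "version_a": {
        ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
        ("Genesis", 1, 3): "And God said, Let there be light: and there was light.",
    },
    "version_b": {
        ("Genesis", 1, 1): "At the first God made the heaven and the earth.",
        ("Genesis", 1, 3): "And God said, Let there be light: and there was light.",
    },
    "version_c": {
        ("Genesis", 1, 1): "In the beginning, God created the heavens and the earth.",
    },
}


def aligned_pairs(versions):
    """Pair verses across versions using the shared (book, chapter, verse) key.

    Returns a set of (source_version, target_version, reference,
    source_text, target_text) tuples. Verses whose text is identical in
    both versions are skipped (an assumption of this sketch), since they
    carry no style difference to learn from.
    """
    pairs = set()
    # Ordered pairs of versions: each version serves as source and as target.
    for (src_name, src), (tgt_name, tgt) in permutations(versions.items(), 2):
        # Only references present in both versions can be aligned.
        for ref in src.keys() & tgt.keys():
            if src[ref] != tgt[ref]:
                pairs.add((src_name, tgt_name, ref, src[ref], tgt[ref]))
    return pairs


pairs = aligned_pairs(versions)
print(len(pairs))
```

Because the verse reference itself serves as the key, no fuzzy sentence matching is needed, which is the property the article credits with avoiding alignment errors.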

To define "style" for the study, the researchers reference sentence length, the use of passive or active voices, and word choice that could result in texts with varying degrees of simplicity or formality. According to the authors, "Different wording may convey different levels of politeness or familiarity with the reader, display different cultural information about the writer, be easier to understand for certain populations."
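As a rough illustration of how such style markers might be measured, the Python sketch below computes two crude proxies: average sentence length in words and average word length. The `style_features` function and its choice of features are assumptions made for this example, not the measures used in the study, and it does not attempt to detect passive voice.

```python
import re


def style_features(text):
    """Return crude style proxies for a passage of English text.

    avg_sentence_length: words per sentence (splitting on ., ! and ?)
    avg_word_length: letters per word, a rough stand-in for word choice
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        # Nothing measurable in empty or punctuation-only input.
        return {"avg_sentence_length": 0.0, "avg_word_length": 0.0}
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


features = style_features("The cat sat. The dog ran fast.")
print(features)
```

Comparing these numbers for the same verse in two versions gives a quick, if simplistic, sense of which rendering is plainer.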

The team used 34 stylistically distinct Bible versions ranging in linguistic complexity from the "King James Version" to the "Bible in Basic English." The texts were fed into two algorithms - a statistical machine translation system called "Moses" and a neural network framework commonly used in machine translation, "Seq2Seq."

While different versions of the Bible were used to train the computer code, systems could ultimately be developed that translate the style of any written text for different audiences. As an example, a style translator could take an English-language selection from "Moby Dick" and translate it into different versions suitable for young readers, non-native English speakers, or any one of a variety of audiences.
