By DAVID VERGUN, Army News Service
ADELPHI, Md. – Being able to converse with people who don’t speak English is essential for the Army, since every day, Soldiers are partnering with militaries in dozens of countries around the world.
A number of speech-translator devices are available commercially, and Soldiers have been using them. However, speech translators are seldom completely accurate, and problems can arise in places where the population converses in a dialect, a form of a language that is specific to a region, according to Dr. Steve LaRocca, a team leader at the Multilingual Computing and Analysis Branch at the Army Research Laboratory.
One especially large problem area identified by the Army is the continent of Africa, where dialects of French are spoken in 21 countries. These are countries that Soldiers from U.S. Army Africa often visit.
When a Soldier speaks English into a translator device, it comes out in standard French, spoken in Paris, which speakers of French dialects can usually comprehend, LaRocca said. The problem arises when the Soldier must translate the French dialect into English using the translator device, because the device has trouble understanding the unique accents, vocabulary, and grammar of the dialect.
Producing speech-translator devices that understand dialects wasn’t commercially feasible for the private sector, so LaRocca and his team were tasked by the Army with producing algorithms that can give translator devices the ability to comprehend dialects of French and translate them into English.
The vendor of a speech translator device allowed LaRocca’s team to unlock the device in order to add an algorithm overlay onto the standard French model that would process the dialect, LaRocca said.
How the algorithm works
First, LaRocca and some of his team traveled to Africa, where they collected male and female samples of speech from natives of Cameroon, Gabon, Chad and the Congo. They were especially intent on collecting speech involving the use of medical terms and tactical language useful to militaries.
Once the speech data was collected from 50 to 100 speakers of each dialect, “two bits of magic” were then applied, LaRocca said.
First, the “signal,” meaning the recorded speech, was parsed into frequency bands to find the energy peaks that characterize the vowels and consonants of the accent. Adapting for accented speech adds a ripple or two to the curves for better models, LaRocca said.
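The idea of splitting a recording into frequency bands and looking for energy peaks can be illustrated with a short Python sketch. This is not the team's code: the function names, the band edges and the two synthetic tones standing in for a vowel are assumptions made for the example, and a plain discrete Fourier transform is used only because it needs nothing beyond the standard library.

```python
import cmath
import math


def band_energies(frame, sample_rate, band_edges_hz):
    """Return the spectral energy of one short audio frame in each band."""
    n = len(frame)
    # Naive discrete Fourier transform; adequate for a single short frame.
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power.append(abs(coeff) ** 2)
    bin_hz = sample_rate / n  # frequency width of one DFT bin
    energies = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        energies.append(
            sum(p for k, p in enumerate(power) if low <= k * bin_hz < high)
        )
    return energies


def strongest_bands(energies, count):
    """Indices of the `count` bands with the most energy, strongest first."""
    order = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    return order[:count]


# Synthetic stand-in for a vowel: two tones at assumed frequencies.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 720 * t / SAMPLE_RATE)
    + 0.6 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(200)
]
edges = [0, 500, 1000, 1500, 2000, 4000]  # hypothetical band edges in Hz
energies = band_energies(frame, SAMPLE_RATE, edges)
print(strongest_bands(energies, 2))  # prints [1, 2]
```

The two strongest bands are the ones containing the 720 Hz and 1,200 Hz tones. Real speech systems work on many overlapping frames and finer bands, but the principle of locating where the energy sits is the same.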
The second bit of magic, he said, is producing a “fuzzy match” of those peaks and valleys of energy and applying that to a model that can predict what the person has just said based on those energy patterns.
It’s fuzzy, he said, because no two individuals within the same dialect pronounce words exactly the same, so a precise match of every word spoken by every individual is not feasible. The goal was to get as close as possible using statistics and probabilities.
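A toy Python sketch can show what a statistical "fuzzy match" looks like in principle. It is only an illustration, not the team's model, which the article says relies on deep neural learning. Here a simple bell-curve score is swapped in, and the sound templates, their peak positions in hertz and the assumed speaker-to-speaker spread are all invented for the example.

```python
import math

# Hypothetical templates: positions (Hz) of two energy peaks for three sounds.
# The numbers are illustrative only, not measurements from the article.
TEMPLATES = {
    "a": (750.0, 1300.0),
    "i": (300.0, 2300.0),
    "u": (320.0, 800.0),
}
SPREAD_HZ = 150.0  # assumed variation between speakers


def match_probabilities(observed_peaks, templates=TEMPLATES, spread=SPREAD_HZ):
    """Score each template by closeness to the observed peaks.

    Returns probabilities that sum to 1, so an imperfect pronunciation
    still yields a best guess instead of a failed exact match.
    """
    log_scores = {}
    for label, expected in templates.items():
        squared_error = sum(
            (obs - exp) ** 2 for obs, exp in zip(observed_peaks, expected)
        )
        log_scores[label] = -squared_error / (2 * spread ** 2)
    top = max(log_scores.values())  # subtract the maximum to avoid underflow
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# A speaker whose peaks are near, but not exactly on, the "a" template.
probabilities = match_probabilities((700.0, 1400.0))
best_guess = max(probabilities, key=probabilities.get)
print(best_guess)  # prints a
```

No template matches the observed peaks exactly, yet the closest one receives nearly all of the probability, which is the sense in which the match is fuzzy.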
Just five or six years ago, producing the algorithm wouldn’t have been possible, he said. It is only possible today thanks to recent breakthroughs in “deep neural learning,” a process that combines the fields of mathematics, computer science and natural language processing.
John Morgan, a mathematician on the team, does the natural language processing and builds the algorithms. He said most of the code he writes is borrowed from others and arranged in a way that makes the language recognition model work. He added a couple hundred lines of his own code to the mix to tease out the results they were hoping for.
The big rollout
According to LaRocca, the timeline for the enhanced speech recognition device is as follows: The commercial vendor will unlock the device before year’s end so the algorithm can be overlaid and the device tested. The fielding of the device is expected to be very rapid, starting as early as next spring. Once fielded, the team predicts, the device will have a word error rate 20 percent lower than that of the current standard French device.
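Word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the device's output into the correct transcript, divided by the number of words in the correct transcript. The Python sketch below shows that calculation. The sample sentences and error rates are made up, and because the article does not say whether the 20 percent figure is relative or absolute, the example assumes a relative reduction.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_reduction(baseline_rate, improved_rate):
    """Fraction by which the improved error rate undercuts the baseline."""
    return (baseline_rate - improved_rate) / baseline_rate


# Invented example: one dropped word and one wrong word out of five.
print(word_error_rate("where is the nearest clinic", "where is nearest clinics"))
# prints 0.4
```

Under the relative reading, a device that drops from a hypothetical 25 percent word error rate to 20 percent has achieved the predicted improvement, since `relative_reduction(0.25, 0.20)` is about 0.2.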
As for the distant future, LaRocca predicted that new algorithms will be written to detect stress in people’s voices and even determine whether or not a person is telling the truth. He believes the voice translator device will one day take the form of wearable technology, instead of the bulky cell phone-like device it is today.
Arabic will be next
Jamal Laoudi, a senior linguistics analyst on the team who speaks French, Arabic and English, said plans are already being made by his team to produce a similar device that can translate the many dialects of Arabic. In fact, voice data is already being collected from Tunisia.
Arabic is an even bigger challenge than French dialects, he said, because while French speakers can understand each other throughout Africa, the varieties of Arabic spoken across North Africa and the Middle East are so different that people from one country sometimes cannot understand the language spoken by someone from another.
And, while there is standard French, spoken in Paris, the so-called “modern standard Arabic” is spoken in no Arabic-speaking country, he said, comparing it to Latin, which is the root language of French, Spanish, Portuguese, Italian and other languages.
Why so many Arabic dialects? Besides the vast distances that separate Arabic-speaking peoples and the influence of non-Arabic-speaking indigenous peoples, the many dialects are also a legacy of the colonial era, when different colonial powers controlled different Arabic-speaking countries, he said. Foreign words and speech patterns crept in.