Without the Universal Translator (UT) we wouldn’t be celebrating the 50th anniversary of Star Trek next year. Who wants to watch a TV show where people can’t communicate with one another and can’t figure out what they have in common? You might as well watch family Thanksgiving dinner videos.
In the Original Series, the UT was a silver cylinder; both Kirk and the Gorn use one. By The Next Generation, they were incorporated into "com badges." In one episode, Riker and Counselor Troi had them as implants. The Ferengi wore them in their ears, and apparently Quark’s had to be adjusted with a Phillips screwdriver every once in a while, although that may have been to remove ear wax.
Humans speak about 6000 languages on Earth as of March 2015 (6001 if you want to include rap). We’re in quite a hurry to build translators that will help us understand one another. Anything to avoid years of high school classes that build stronger brains but also saddle us with bad foreign names and poor attempts at cooking.
In some ways, our translators have already surpassed those of Star Trek, but in other ways we’re far behind. Most of our problems come from not knowing exactly what all languages have in common and what is purely cultural, contextual, or completely without precedent. Let’s take a look at our efforts so far.
If you watched Star Trek, you may already realize how we have surpassed some of their technology. The UTs of Kirk and Picard handled spoken language only. The crew still needed a translator on hand to figure out what signs meant on another ship or how to interpret alien consoles. We already have that licked.
Google has one (Google Goggles, now Google Translate for Android), and there’s an app for that on the iPhone/iPad (Word Lens, from Quest Visual, which Google bought in 2014).
And of course we have translators for written words. You type in what you want to say, and the software gives you a reasonable (meh) translation. Try translating a phrase into and out of a language several times and see what you end up with. It’s like a multicultural game of telephone.
The latest amazements are the spoken-language translators, but they work only for languages that have been programmed in. Skype Translator was introduced in late 2014. You speak in Spanish or English during a video chat, and on the other end it comes out in English or Spanish. Why only those two? Because that’s the only language pair offered so far. How? It’s built on speech recognition software. It also gives you a written transcript of the conversation, so you can post all the hilarious errors on Twitter (as people already do with autocorrect fails).
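Speech translators of this kind are commonly built as a cascade. Speech recognition first turns audio into text, and that text is then machine-translated. The minimal Python sketch below assumes that design. Skype’s actual internals are not described here, and a speech-synthesis stage that would normally speak the result aloud is left out. `cascade_translate`, `toy_recognize`, `toy_translate`, and the three-word `TOY_ES_EN` dictionary are all hypothetical stand-ins, not any real product’s API. The sketch also keeps a transcript, which shows how a recognition mistake carries straight into the translation.

```python
def cascade_translate(audio_segments, recognize, translate):
    """Speech recognition followed by text translation, segment by segment.

    Returns a transcript of (heard, translated) pairs, so a recognition error
    can be seen carrying through into the translation.
    """
    transcript = []
    for segment in audio_segments:
        heard = recognize(segment)      # speech -> source-language text
        translated = translate(heard)   # source text -> target-language text
        transcript.append((heard, translated))
    return transcript


# Hypothetical toy stand-ins; real systems use trained statistical models.
TOY_ES_EN = {"hola": "hello", "amigo": "friend", "gracias": "thanks"}


def toy_recognize(segment):
    # Pretend each "audio segment" is already a string of spoken words.
    return segment.lower()


def toy_translate(text):
    # Word-by-word lookup; unknown words pass through untranslated.
    return " ".join(TOY_ES_EN.get(word, word) for word in text.split())


if __name__ == "__main__":
    for heard, said in cascade_translate(["Hola amigo", "Gracias"],
                                         toy_recognize, toy_translate):
        print(f"{heard!r} -> {said!r}")
```

Suppose the recognizer mishears "hola" as "ola." The translator then never sees the right word, and the transcript shows "ola friend." That is exactly the kind of error that ends up on Twitter.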
It’s in the vocal translation arena that the Star Trek UT excelled. It was so good that the TV series simply accepted that the translator was there, never broke down, and let us hear everything in English. They didn’t even bother making the aliens’ lips (if they had them) move out of sync with the English translation!
In principle, the Federation members would have their new alien acquaintances talk into the translator for a while. The device, using deciphering algorithms and the linguacode matrix (invented by an Enterprise linguist), would learn the language and then translate it. This seems hinky to me.
Every time it encountered a new word, the translator would have to either wait until it had heard the word enough times to decipher its meaning or extrapolate the meaning from context. Neither could happen in real time, so the “talk into it” phase would have to be very long.
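To see why the “talk into it” phase drags on, picture a toy learner that trusts its guess at a word only after hearing that word some minimum number of times. The threshold `min_count=3` and the helpers `learned_words` and `coverage` are hypothetical, and nothing here reflects how the fictional UT worked. This is only a minimal sketch of the counting argument.

```python
from collections import Counter


def learned_words(tokens, min_count=3):
    """Words heard at least `min_count` times.

    Hypothetical rule: the toy translator trusts its guess at a word's
    meaning only after hearing it `min_count` times.
    """
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count >= min_count}


def coverage(tokens, min_count=3):
    """Fraction of the distinct words heard so far that count as learned."""
    vocabulary = set(tokens)
    if not vocabulary:
        return 0.0
    return len(learned_words(tokens, min_count)) / len(vocabulary)


if __name__ == "__main__":
    # Placeholder syllables, not a real language.
    heard = "ka lo ka mi ka lo ze ka lo ru".split()
    print(sorted(learned_words(heard)))  # ['ka', 'lo']
    print(coverage(heard))               # 0.4
```

Frequent words cross the threshold quickly, but rare words do not. In real speech and text, a large share of distinct words turn up only once in a given sample, so the coverage number climbs slowly no matter how long the aliens keep talking.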
Basically, the hardware of a translator is easy; it’s the software we have to work on. A 2012 paper presented to the Association for Computational Linguistics (yep, just call ‘em the UT geeks) used statistical models to try to train language programs more effectively.
Until now, vocabulary has been the choke point in speeding up deciphering and translation. By relying on statistical commonalities across all languages (if they can be found and relied upon), a program would need far less vocabulary.
Any of these real-life software algorithms (or the fictional linguacode matrix) builds on ideas presented in the 1950s by American linguist, philosopher, and political activist Noam Chomsky and others.
If all human languages share an innate, biologically based grammar, as Chomsky proposed, then the more languages the UT encountered, the better it would work. On the other hand, maybe there’s no biological universality to language, but word order follows the same patterns in every language, so how we build a language is universal.
Either one of these scenarios would make it easier for a computer program to take a completely unknown language and put it through algorithms that might discern order and then meaning.
But a more recent study is inconsistent with these ideas. According to a 2011 paper in Nature, word order depends more on the history of each language family than on any universal constant or similarity. The authors found that the orderings of many word pairs, such as verb-object (or object-verb) and preposition-noun (or the reverse), are influenced by the orderings of other pairs within the sentence.
In some language families, one word preceding the other caused a reversal in other pairs, while in other families the opposite held true. Word order did not evolve along an inevitable course; languages aren’t that predictable. That’s bad news for any computer program counting on universal word order.
In a 2009 paper, researchers used a computer algorithm that calculates conditional entropy to investigate what may be a 5000-year-old dead language.
The Indus civilization was one of the largest and most advanced societies in the world around 3000 BCE. Its people, who lived in the border region of today’s India and Pakistan, may have had a written language; we can’t tell. They left pictograph carvings, but what those mean is up in the air. There is no Rosetta Stone like the one that unlocked ancient Egyptian, and no one speaks or reads the Indus language today.
Conditional entropy measures the randomness in a sequence of... well, anything. More precisely, it measures how hard it is to predict the next item given the one before it. Here the researchers wanted to see whether there was structure in the markings and drawings. The results suggested that the sequences were most like those in natural languages.
But, just to prove it’s never that simple, linguist Richard Sproat (who now works for Google) has argued that the symbols are non-linguistic. In 2014, he did his own, larger analysis that included several different kinds of non-linguistic symbol systems and concluded that the Indus pictographs fall into the non-linguistic category.
He rightly points out that computational analyses have a weakness: biases can creep in depending on which texts are selected and what those texts depict. I don’t think anyone could pick up English if all they had to study were shopping lists.
But with other old languages, more progress has been made. One paper used a computer program to decipher and translate the ancient language Ugaritic in just a few hours. The researchers made several assumptions, the biggest being that the language had a known close relative (Hebrew, in this case). That may not be possible when dealing with a brand-new alien language for the first time.
They also assumed that word order and letter frequencies would be very similar in the lost language and in Hebrew. They then played these assumptions off one another until they arrived at a translation. Ugaritic had been deciphered by brute human force a while back, but it took many people many years to do it. That’s how we know the computer algorithm got it right; it just took 1/1000 of the time.
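The letter-frequency assumption is the easiest one to picture. If two related scripts use their symbols with similar frequencies, then ranking each script’s symbols by frequency and pairing them rank by rank gives a first guess at the mapping. The minimal Python sketch below shows that classic frequency analysis. It is a much simpler technique than the one the researchers used, and `rank_symbols`, `frequency_rank_mapping`, and `transliterate` are hypothetical helper names.

```python
from collections import Counter


def rank_symbols(text):
    """Symbols ordered from most to least frequent, ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [symbol for symbol, _ in counts.most_common()]


def frequency_rank_mapping(lost_text, known_text):
    """Guess a symbol-to-letter mapping by pairing frequency ranks.

    Assumes the lost script and its known relative use their symbols with
    similar frequencies. Ties and short texts make the guess unreliable.
    """
    return dict(zip(rank_symbols(lost_text), rank_symbols(known_text)))


def transliterate(text, mapping):
    """Apply the guessed mapping; unmapped symbols become '?'."""
    return "".join(ch if ch.isspace() else mapping.get(ch, "?") for ch in text)


if __name__ == "__main__":
    # Placeholder strings standing in for the two scripts.
    guess = frequency_rank_mapping("xxx yy z", "aaa bb c")
    print(guess)                          # {'x': 'a', 'y': 'b', 'z': 'c'}
    print(transliterate("xy z q", guess))  # ab c ?
```

On real inscriptions, a single pass like this gets many symbols wrong. That is why the researchers, as described above, played several assumptions off one another instead of trusting any one of them.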
But even if we find universals in language, the computer won’t be enough. An example comes from Star Trek itself, in the ST:TNG episode “Darmok.” The universal translator told Picard exactly what the aliens were saying, but it didn’t make any sense.
Their language was based on their folklore and history; every phrase was a metaphor for an event in their past. So unless the UT knew this species’ particular history, it could translate only the words, not the meaning. Language is more than words in an order; language is the collective mind of a group, connecting its members to each other and to their world.
Next week, deflector shields.
Contributed by Mark E. Lasbury, MS, MSEd, PhD