Without the Universal
Translator (UT) we wouldn’t be celebrating the 50th anniversary
of Star Trek next year. Who wants to watch a TV show where people can’t
communicate with one another and can’t figure out what they have in common? You
might as well watch family Thanksgiving dinner videos.
In the Original Series, the UT was a silver cylinder; you
can see the Gorn and Kirk with them in the clip. By The Next Generation, they
were incorporated into "com badges." In one episode, Riker and Counselor Troi had
them as implants. The Ferengi had them in their ears, and apparently Quark’s had
to be adjusted with a Phillips screwdriver every once in a while – although that may
have been to remove ear wax.
Humans have about 6000 spoken languages on Earth as of March,
2015 – 6001 if you want to include rap. We're in quite the hurry to build
translators that would help us understand one another - anything to avoid years of high
school classes that lead to stronger brains but also bad foreign names and poor
attempts at cooking.
In some ways, our translators have already passed those of Star
Trek, but in other ways we're far behind. Most of our problems have to do
with understanding just what things all languages have in common and what things
are purely cultural, contextual, and completely without precedent. Let’s take a
look at our efforts so far.
If you watched Star Trek, you may already realize the way we
have surpassed some of their technology. The UTs of Kirk and Picard were for
spoken language only. They still had to keep a crew member as a translator to
figure out what signs meant on another ship or how to interpret alien consoles.
We already have that licked.
Google has a camera-based translator for signs and printed text (Google Goggles,
now Google Translate for Android), and there’s an app for that on the iPhone/iPad
(called Word Lens, from Quest Visual, which Google bought in 2014, see video below).
And of course we have translators for written words – you
type in what you want to say, and the software gives you a reasonable (meh)
translation. Try translating a phrase in and out of a language several times
and see what you end up with – it’s like a multicultural game of telephone
operator.
The latest amazements are the vocal translators, but only for
languages we have programmed in. Skype
Translator was introduced in late 2014. You speak in Spanish or English
while having a video chat. On the other end, it comes out in English or
Spanish. Why? Because that’s the only translation they offer as of now. How? It's based on speech recognition software. It also gives you a written transcript of
the conversation so you can post all the hilarious errors on Twitter (as people do with
autocorrect).
It’s in the vocal translation arena that the Star Trek UT
excelled. It was so good that the TV series just accepted that the translator was there and never broke down, and let us hear everything in English.
They didn’t even bother making the aliens’ lips (if they had them) move out of
synch with the English translation!
In principle, the Federation members would have their new alien
acquaintances talk into the translator for a while. The device, using deciphering
algorithms and the linguacode matrix (invented
by an Enterprise linguist), would learn it and then translate it. This seems hinky to me.
Every time a new word was encountered, it would seem to me
that the translator would have to either wait till it heard it enough times to
decipher its meaning or extrapolate its meaning from context. Neither of these things
could occur in real time. It seems to me that the “talk into it” phase would be
very long.
Basically, the hardware of a translator is easy. It’s the
software that we have to work on. A 2012 paper presented to the Association for Computational Linguistics
(yep, just call ‘em the UT geeks) used statistical models to try and train
language programs better.
Up to this point in time, vocabulary has been the choke point
in trying to speed deciphering and translation. By using the statistical
commonalities of all languages (if they can be found and relied upon), the need
for so much vocabulary would be eased.
Any of these real-life software algorithms (or the fictional
linguacode matrix) will be based on ideas presented in the 1950s by American
linguist, philosopher, and political activist Noam Chomsky and others, chiefly
that all human languages share an innate, biologically based structure.
Ostensibly, the more languages that were encountered, the
better the UT would work. On the other hand, maybe there’s not a biologic
universality to language, but word order
is mimicked in all languages – how we
build a language is universal.
Either one of these scenarios would make it easier for a
computer program to take a completely unknown language and put it through
algorithms that might discern order and then meaning.
But a recent study is inconsistent with these ideas.
According to a 2011 paper in Nature, word order is based more on historical
context within a language family than on some universal constant or similarity.
They found that many different sentence part combinations, like verb-object (or
object-verb) or preposition-noun (or the reverse) for example, are influenced
by other structure pairs within the sentence.
One word preceding the other in some languages caused a reversal
in other pairs, while the reverse might be true in other language families. The way that sentence structure via word
ordering evolved does not follow an inevitable course – languages aren’t that
predictable. Bad news for computer-based word order help.
In a 2009 paper, a computer algorithm to calculate conditional
entropy was used in an effort to investigate a 5,000-year-old dead language.
The Indus
civilization was the largest and most advanced group in the 3000 BCE world.
Located in the border region of today’s India and Pakistan, they may have had a
written language – we can’t tell. They had pictograph carvings, but what they
mean is up in the air. There is no Rosetta stone like we found for ancient Egyptian, and no one
speaks or reads the Indus now.
The algorithm for conditional entropy is used to calculate
the randomness in a sequence of…. well, anything. Here they wanted to see if
there was structure in the markings and drawings. The results suggested that
the sequences were most like those in natural languages.
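To make the idea concrete, here is a minimal sketch of a conditional entropy calculation. It is not the code from the study; it simply estimates, from adjacent pairs in a sequence, how uncertain the next symbol is once you know the previous one. The function name `conditional_entropy` and the toy strings are assumptions made up for illustration. A rigidly ordered sequence scores 0 bits, while one in which either of two symbols is equally likely to follow scores 1 bit.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequence):
    """Estimate H(next symbol | previous symbol) in bits from adjacent pairs.

    Hypothetical helper for illustration only; not the study's method.
    """
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0

    pair_counts = Counter(pairs)                      # how often each (prev, next) occurs
    prev_counts = Counter(prev for prev, _ in pairs)  # how often each symbol is a "prev"
    total = len(pairs)

    entropy = 0.0
    for (prev, _next), count in pair_counts.items():
        p_pair = count / total                        # P(prev, next)
        p_next_given_prev = count / prev_counts[prev]  # P(next | prev)
        entropy -= p_pair * log2(p_next_given_prev)
    return entropy


# A rigid pattern: the next symbol is always fully determined.
rigid = conditional_entropy("ABABABAB")

# A looser pattern: after any symbol, two continuations are equally likely.
loose = conditional_entropy("AABBAABBA")

print(rigid, loose)
```

Natural languages tend to fall between the two extremes: not rigidly fixed, not random. That in-between zone is what the results above were pointing to.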
But, just to prove it’s never that simple, linguist Richard
Sproat (works for Google now) has contended that the symbols are non-linguistic. In 2014, he
did his own larger analysis with several different kinds of non-linguistic
symbols, and showed that the Indus pictographs fall into the non-linguistic
category.
He rightly points out that computational analyses have a
weakness in that biases could enter based on what type of text is
selected and what that text depicts. I don’t think someone could pick up
English if all they had to study were shopping lists.
But in other old languages, more progress has been made. One paper used
a computer program to decipher and translate the ancient language of Ugaritic in
just a few hours. They made several assumptions, the biggest one being that it
was closely related to a known language (Hebrew in this case). This may not be possible
when dealing for the first time with some new alien language.
They also assumed that the word order and
alphabet usage frequencies would be very similar between the lost language and
Hebrew. They then played these assumptions off one another until they came upon
a translation. Ugaritic was deciphered by brute human force a while back, but
it took many people many years to do it. This is how we know that the computer
algorithm got it right – it just took 1/1000 of the time.
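The letter-frequency assumption above can be illustrated with a toy sketch. This is not the researchers' program, which played several assumptions off one another; it only shows the simplest version of one of them: rank the symbols of the unknown text by how often they occur, do the same for the known language, and pair them off rank by rank. The function names `rank_by_frequency` and `guess_mapping` and the sample strings are hypothetical.

```python
from collections import Counter


def rank_by_frequency(text):
    """Return the letters of `text`, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(ch for ch in text if ch.isalpha())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in ranked]


def guess_mapping(unknown_text, known_text):
    """Pair unknown symbols with known letters by frequency rank.

    A toy first guess only; a real decipherment would refine it with
    other evidence such as word structure and related vocabulary.
    """
    return dict(zip(rank_by_frequency(unknown_text),
                    rank_by_frequency(known_text)))


# Made-up "lost" text and "known" text with matching frequency profiles.
mapping = guess_mapping("xxxyyz", "aaabbc")
print(mapping)
```

On real texts a first guess like this is mostly wrong, which is why the other assumptions are needed to correct it.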
But, even if we find universalities in language, the
computer won’t be enough. An example comes from Star Trek itself, in an episode
of ST:TNG called Darmok. The
universal translator told Picard exactly what the aliens were saying, but it
didn’t make any sense.
Their language was based on their folklore and history. All their
phrases were metaphors of events in their past. So unless the UT knew this
species’ particular history, it could only translate the words, not the meaning. Language is more than words in an order; language
is the collective mind of a group connecting them to each other and to their world.
Next week, deflector shields.
Contributed by Mark E. Lasbury, MS, MSEd, PhD
Nice article. The possibility of such a UT is made even more improbable by the reasonable assumption that humans and ETs have fundamentally different brains (if you can really speak of brains at all), because they don't have any evolutionary history in common.
I hope that somebody is going to make a science fiction series where they consult scientists to create a more accurate prediction of the future. Especially in the area of life science, Star Trek fails.