A new translation technique being developed at Google does not rely on versions of the same document in different languages (parallel corpora), the approach that statistical translation has traditionally depended on. Instead, it learns a model of the structure of each language from text in that language alone, compares the structure of one language with that of another, and uses only a small bilingual dictionary to work out how one maps onto the other. The approach relies on the notion that every language must describe a similar set of ideas, so the words that express those ideas should relate to one another in similar ways. For example, most languages have words for common animals such as cat, dog and cow, and these words are probably used in the same way in sentences such as “a cat is an animal that is smaller than a dog.”
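The comparison step can be illustrated with a minimal sketch. In the cited paper, words in each language are first represented as vectors, and a small dictionary of known translation pairs is used to learn a linear transformation W that maps source-language vectors onto target-language vectors by minimizing the sum of ||W x_i − z_i||² over those pairs. A new word is then translated by mapping its vector with W and picking the target-language word whose vector is closest by cosine similarity. Everything concrete below is an assumption for illustration: the two-dimensional vectors are invented, the toy Spanish space is simply the English space rotated and scaled, and the function names `learn_mapping` and `translate` are hypothetical. The paper fits W with stochastic gradient descent; this sketch swaps in the closed-form least-squares solution.

```python
import math

# Hypothetical 2-D word vectors, invented for illustration (real embeddings
# have hundreds of dimensions and are learned from large monolingual corpora).
english = {
    "cat": (1.0, 0.2),
    "dog": (0.9, 0.6),
    "cow": (0.2, 1.0),
    "horse": (-0.5, 0.9),
    "pig": (-1.0, 0.1),
}
# Toy Spanish space: the English space rotated by 90 degrees and scaled by 2,
# standing in for "same structure, different coordinates".
spanish = {
    "gato": (-0.4, 2.0),
    "perro": (-1.2, 1.8),
    "vaca": (-2.0, 0.4),
    "caballo": (-1.8, -1.0),
    "cerdo": (-0.2, -2.0),
}
# Small seed dictionary; "horse" is deliberately left out.
seed = [("cat", "gato"), ("dog", "perro"), ("cow", "vaca"), ("pig", "cerdo")]


def learn_mapping(pairs):
    """Closed-form least squares for the 2x2 W minimizing sum ||W x - z||^2."""
    # Setting the gradient to zero gives W B = A, with
    # A = sum z x^T and B = sum x x^T, so W = A B^{-1}.
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [[0.0, 0.0], [0.0, 0.0]]
    for en, es in pairs:
        x, z = english[en], spanish[es]
        for i in range(2):
            for j in range(2):
                a[i][j] += z[i] * x[j]
                b[i][j] += x[i] * x[j]
    # Invert the 2x2 matrix B explicitly.
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    b_inv = [[b[1][1] / det, -b[0][1] / det],
             [-b[1][0] / det, b[0][0] / det]]
    return [[sum(a[i][k] * b_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def translate(word, w):
    """Map an English vector with W, then return the closest Spanish word."""
    x = english[word]
    projected = tuple(sum(w[i][j] * x[j] for j in range(2)) for i in range(2))
    return max(spanish, key=lambda es: cosine(projected, spanish[es]))


W = learn_mapping(seed)
print(translate("horse", W))  # prints "caballo"
```

Because the toy Spanish space is an exact rotation of the English one, four seed pairs are enough to recover the mapping and translate the held-out word. With real embeddings the relationship is only approximately linear, so the nearest neighbour is a best guess rather than a guaranteed translation.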
The same is true of numbers. The image above shows the vector representations of the numbers one to five in English and Spanish, and shows that the two sets of vectors are arranged in very similar ways. In this model each word is represented as a vector, so the relationship between two words can be thought of as the vector pointing from one to the other; the set of all these relationships is the so-called “language space”. In recent years, researchers have found that these vectors can be manipulated arithmetically. For example, the operation ‘king’ − ‘man’ + ‘woman’ results in a vector that is close to the vector for ‘queen’.
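The ‘king’ − ‘man’ + ‘woman’ example can be sketched in a few lines. This assumes hypothetical three-dimensional vectors chosen by hand so the arithmetic is easy to follow; real word vectors are learned from large corpora, have hundreds of dimensions, and have no hand-labelled axes. As is common practice when evaluating such analogies, closeness is measured by cosine similarity and the three input words are excluded from the candidate answers. The function name `analogy` is hypothetical.

```python
import math

# Hypothetical 3-D word vectors, invented for illustration; the axes loosely
# stand for "royalty", "maleness" and "femaleness".
vectors = {
    "king": (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man": (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "prince": (0.7, 0.8, 0.2),
    "dog": (0.0, 0.3, 0.3),
}


def cosine(u, v):
    """Cosine similarity: the cosine of the angle between two vectors."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))  # prints "queen"
```

Here the target vector is (0.9, 0.0, 0.9): subtracting ‘man’ and adding ‘woman’ moves ‘king’ from the "male" direction to the "female" one while keeping its "royalty" component, which lands it next to ‘queen’.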
Citation: Tomas Mikolov, Quoc V. Le, Ilya Sutskever, 'Exploiting Similarities among Languages for Machine Translation', arXiv:1309.4168
Link: How Google Converted Language Translation Into a Problem of Vector Space Mathematics - Technology Review