To the historian, English is a fascinating language.  Unlike most of the languages of Europe, it underwent an almost complete makeover following the Norman invasion (1066 and All That).  As a result, although our basic words and grammar are much like those of German and especially Dutch, the lion’s share of our vocabulary is from French and Latin.

However, if you read History in English Words by Owen Barfield, you will be taken back to a time before English, when a group of warrior tribes spread out over Europe and North India, spreading their languages even more effectively than their genes.  After all, a ‘native’ Englishman does not look that much like a Maratha from Mumbai.

In 1786, Sir William Jones published The Sanscrit Language, in which he suggested that Sanskrit, Greek and Latin had a common root, and that indeed they may all be further related, in turn, to Gothic and the Celtic languages, as well as to Persian.  You may be surprised to see the last of these in this list, as it is now written in an extended Arabic alphabet, but when you compare

  • pidar – father

  • madar – mother

  • baradar – brother

  • dukhtar – daughter

(not doctor, as some mistake it when I play the guessing game with them) the relationship becomes apparent.

One set of words that has been preserved well in this evolution is the numerals, especially 2 to 10.  Indeed, if one counts up to 10 in Welsh, it might look as if the numerals had almost been cobbled together from Greek and Persian.  But which other words have survived from these ancient times, and which are newcomers, replacements for earlier ones?  (I’m not referring to modern words like “television”, which is compounded from Greek and Latin.)

Enter Statistics, as practised by Mark Pagel, whose working area covers “Evolution, computational biology, language evolution, phylogeny, Markov chain Monte Carlo”.  Do I hear groans and sighs over the use of mathematics in biology?  Especially since the financial crisis that has come upon us, partly powered by mathematical monkeying with models originating in the Jet Propulsion Laboratory?  Fear not! To someone in a physics department, probability theory may owe its roots to gambling, but the very name of the science of statistics (things of the state) shows that it developed in relation to that most ornery of critters, self-styled Homo sapiens.  

In recent work [1], the group have compared words across 87 Indo-European languages, and classified them according to frequency of meaning-use in English, Spanish, Russian and Greek.  From that comparison they have derived rates of lexical replacement, and shown that frequent meaning-use is strongly correlated with stability.  Besides the numerals, the highest stability is found in words like “who, what, where, when” and “not”.  These may sound quite different in different languages, but that is phonetic evolution, not lexical.  As an example of a phonetic change, in England now only a small proportion of people, myself included, still pronounce the “h” in “what”, while to the majority this word is homophonous with “Watt”.  Lexical instability, though, is strongly correlated with word type, and it may surprise you to read that one word that we may soon lose from the English language is “dirty”!

Which brings us to the title.  Many of you may recognize the title of this article as a hark-back to the Shirelles.  At this point, I recommend that the more serious-minded among you go to the Reading University press release “Scientists discover oldest words in the English language and predict which ones are likely to disappear in the future”.  But for those who like a bit of craziness, here is my take on the “title” song:

Do androids suffer pain or sorrow?
Maybe from humans they must borrow;
Tonight’s the night the robots come to fight,
Lots of scrap me-e-tal tomorrow.


[1] Mark Pagel, Quentin D. Atkinson & Andrew Meade
Frequency of word-use predicts rates of lexical evolution throughout Indo-European history
Nature 449, 717-720 (11 October 2007)