Patterns of Latin in the Voynich Manuscript

The Voynich manuscript, more properly identified as Beinecke MS 408, has puzzled researchers for a few hundred years. The earliest evidence of attempts to understand it consists of letters between owners and decryption experts. Linguists and cryptographers have made many further attempts since the manuscript was brought to public attention in 1912.
It largely disappeared from public record until 1912 when Wilfrid Voynich, an antique book dealer, bought it amongst a number of second-hand publications in Italy.

Since then, scholars and cryptographers have studied the document but have failed to find meaning in the text.

It was investigated by a team of code breakers during WWII, but they also failed to find meaning in the words.

Academics across the world have been trying to decode the manuscript.

Nic Rigby, BBC, 18 February 2014

I suggest that the failure of efforts to decrypt, decipher or decode the manuscript is due to the simple fact that it was never encrypted in the first place. Rather, it was written almost entirely using a form of Latin shorthand, or breviographs. Breviographs were used for centuries as a means to save both time and parchment. As a form of shorthand, writing with breviographs can be used to keep a direct record of speech, e.g. in a court case. The system can also save many sheets of expensive parchment.

What else but Latin?

The Voynich manuscript was most likely written in Europe, according to many investigators. At the material time many literate people in Europe could speak, read and write Latin. If the language beneath the text in the Voynich manuscript is something other than Latin, then good reasons must be offered in support of that idea. Even stronger support is needed for any suggestion that the underlying language lies outside the group of languages then used in Europe, such as Bohemian, German, French, Italian and, of course, Latin.

Voynich water pipes

Patterns of Latin in the Symbols

There are many thousands of manuscripts in existence which demonstrate the use of breviographs.  A problem arises in reading these: different nations, regions and schools used many variants of script and breviographs. 
The abbreviations were not constant but changed from region to region. Scribal abbreviations increased in usage and reached their height in the Carolingian Renaissance (8th to 10th centuries). The most common abbreviations, called notae communes, were used across most of Europe, but others appeared in certain regions. In legal documents, legal abbreviations, called notae juris, appear but also capricious abbreviations, which scribes manufactured ad hoc to avoid repeating names and places in a given document.
Source: Wikipedia, Scribal abbreviation.
Fortunately for scholars there exists an excellent reference with numerous examples of the symbols used in breviography: The Elements of Abbreviation in Medieval Latin Paleography, by Adriano Cappelli.

In the preface to their translation of the first part of Cappelli, included in the pdf linked above, David Heimann and Richard Kay say this:
Take a foreign language, write it in an unfamiliar script, abbreviating every third word, and you have the compound puzzle that is the medieval Latin manuscript.
I would paraphrase their words to apply more specifically to the Voynich manuscript:
Take a foreign language, write it in an unfamiliar script, abbreviating every common sequence of letters, and you have the compound puzzle that is the medieval Voynich manuscript.

From the above-mentioned translation:
Take a foreign language, write it in an unfamiliar script, abbreviating every third word, and you have the compound puzzle that is the medieval Latin manuscript. For over two generations, paleographers have taken as their vade mecum in the decipherment of this abbreviated Latin the Lexicon abbreviaturarum compiled by Adriano Cappelli for the series "Manuali Hoepli" in 1899.

0.2 All medieval abbreviations, for both Latin and Italian words, can be divided into six categories, each of which will be treated in turn. Abbreviation can be indicated by:
1. Truncation,
2. Contraction,
3. Abbreviation marks significant in themselves,
4. Abbreviation marks significant in context,
5. Superscript letters,
6. Conventional signs.


6.1 This category includes all those signs, for the most part not recognizable as letters and almost always isolated, that stand for a frequently used word or phrase.

6.2 Among the abbreviation marks that are significant in themselves, we noted that the signs 9 and ) mean con or cum even when they stand alone (§3.2);
The images below, in this part, are taken from Cappelli, with Voynich graphics added to show similarities.

Voynich EVA y

Repurposed Breviographs

There is much evidence to suggest that the Voynich manuscript is based on Latin. Although it is frequently said to be written in an unknown script, many of the symbols are in fact well known to scholars of medieval Latin. That said, the set of Voynich breviographs does not conform to any known national, regional or scholarly sub-set of symbols. I suggest that the writer of the Voynich manuscript selected a set of symbols and applied rules of use so as to create a document which was sight-readable by the writer and by anyone who had learned the symbols and the simple rules.
... the book—called the Voynich manuscript after the rare-book dealer who stumbled upon it a century ago—is written in an unknown script, with an alphabet that appears nowhere other than in its pages.
New Yorker

Voynich EVA c h a

The use of breviographs declined and then virtually died out with the advent of relatively cheap paper and the use of the printing press. Recall that parchment was relatively expensive, so it was logical to use any means to cram more words into less parchment. This was a medieval equivalent of file compression. It appears highly likely that about a hundred years after the Voynich manuscript was written, and almost a hundred years into the common use of paper, nobody living was familiar enough with breviography to make sense of a document written almost entirely in breviographs.

Voynich EVA p and f

Medieval spelling variations.

For centuries the Latin language developed naturally as it was adapted to the needs of its speakers and writers. As the Middle Ages merged into the Renaissance and written Latin came to give way to local languages, there was a rise in the teaching of a formal and scholarly Latin. The spread of printing also brought a greater uniformity of spelling, as I noted in a previous article, cited below. In the case of Latin, the "wrong" medieval uses of Latin were mainly lost as the "correct" forms were promulgated in the schools and universities.

A generation of schoolboys being punished for using 'vulgarisms' was enough to establish Cicero as the source of all things Latin. If anybody wanted to study grammar, they studied Cicero. Rhetoric? Cicero. Examples of quality literary prose? Cicero. And so more and more Latin grammars came to contain only words from Cicero, phrases from Cicero, patterns from Cicero. A language confined into too small a space suffocated and died. In England, a land where Latin once flourished, it died out, coming to be found only in dead books written by long dead hands.

" ... all barbary, all corruption, all Latin adulterate, which ignorant blind fools brought into this world, and with the same hath distained and poisoned the old Latin speech, and the veray Roman tongue which in the time of Sallust and Virgil was used — I say that filthiness and all such abusion, which the later blind world brought in, which more rather may be called Bloterature than Literature, I utterly banish and exclude out of this school."
John Colet (January 1467 – 10 September 1519) - text modernised.

Excerpt from A Brief History of the English Language Part 4

The Voynich manuscript was written just before the rise of "correct" spelling, so it is quite likely that the writer used variant spellings throughout. It is well known to scholars that Shakespeare's spelling varied quite a bit, to say nothing of his bad grammar.

Spelling variation and the Voynich manuscript

In the past I have argued that we must take account of spelling variation when we study the Voynich manuscript. Many people fail to realise how common it was for mediaeval scribes to use a variety of different spellings even for the same words on the same line. Standardised spelling conventions are a modern obsession which we mustn’t apply to the Voynich.

Stephen Bax

Voynich EVA various

Patterns of Latin in Frequency Analyses

Many statistical analyses of the Voynich manuscript suggest that a real language underlies the symbols. In particular, the word and symbol frequency distributions calculated for the manuscript follow Zipf's law, as those of natural languages do.
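As an illustration of what a Zipf's law check involves, here is a minimal sketch. It is not the method of any of the analyses referred to above; the function names rank_frequency and zipf_slope are my own, and the least-squares fit on log-log values is an assumed, simple way to summarise the curve. A slope close to -1 is the classic Zipf pattern.

```python
from collections import Counter
import math

def rank_frequency(words):
    """Return (rank, count) pairs, most frequent word first."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def zipf_slope(words):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two distinct words, otherwise the fit is undefined.
    """
    pairs = rank_frequency(words)
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Invented toy sample whose counts are exactly 12 / rank.
toy_words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(zipf_slope(toy_words))
```

For a real text the list of words would come from the transcription or book being tested; the toy list is only there to show the mechanics.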

The method adopted for my own researches in this area compares word-final letters in natural languages with word-final symbols in the Voynich manuscript.  The graph below compares word-final letters in six European languages with the Voynich ms.

Word-final letter frequencies compared.

As can be seen, the pattern for each language is distinctive, but the patterns are broadly similar. This, I suggest, is a further demonstration that a natural language underlies the Voynich manuscript. The most frequent letters are, as may be expected, different for each language, which reflects the different word-ending rules of each language.

Most frequent word-final letters
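The counting behind a word-final letter comparison can be sketched as follows. This is a minimal sketch, not the exact procedure used for the charts: the function name final_letter_percentages is hypothetical, and I assume that a word is any unbroken run of letters and that all text is lower-cased first.

```python
from collections import Counter
import re

def final_letter_percentages(text):
    """Percentage of words ending in each letter, most frequent first."""
    # Lower-case, then treat every unbroken run of letters as one word.
    words = re.findall(r"[^\W\d_]+", text.lower())
    finals = Counter(word[-1] for word in words)
    total = sum(finals.values())
    return [(letter, 100.0 * count / total)
            for letter, count in finals.most_common()]

# Short Latin phrase used only to show the call.
for letter, pct in final_letter_percentages("Gallia est omnis divisa in partes tres"):
    print(letter, round(pct, 1))
```

Sorting by frequency rather than alphabetically gives the rank-ordered curve that lets texts in different languages, or in an unknown script, be laid side by side.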

The next chart compares the frequencies of occurrence of word-final letters in 12 different Latin texts, one sample made of three texts with footnotes and Roman numerals omitted, and a large text made of all 13 samples combined. All letters were converted to lower case. The book list at the foot of this article constitutes a key to the abbreviated book titles.

Word-final letter frequencies in Latin texts.

The next image shows the data used to produce the graph. 

Data for word-final letter frequencies in Latin texts.
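Pooling several samples, with Roman numerals left out, might be done along the following lines. This is a sketch under stated assumptions: the names ROMAN, final_counts and pooled_counts are hypothetical, and the strict Roman-numeral pattern is my own choice of filter, not necessarily the one used for the data above. Note that such a filter also discards genuine Latin words that happen to have the shape of a numeral, such as "vi", "di" and "mi".

```python
from collections import Counter
import re

# Assumed filter: a lower-case word shaped like a Roman numeral (i to mmmcmxcix).
ROMAN = re.compile(r"^m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$")

def final_counts(text, drop_roman=False):
    """Count word-final letters in one sample, after lower-casing."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if drop_roman:
        words = [w for w in words if not ROMAN.match(w)]
    return Counter(w[-1] for w in words)

def pooled_counts(samples, drop_roman=False):
    """Add up the word-final counts of every sample in a dict of name -> text."""
    total = Counter()
    for text in samples.values():
        total += final_counts(text, drop_roman)
    return total

# Invented snippets standing in for whole books.
samples = {"one": "Caput XII de bello", "two": "liber iv et vi"}
print(pooled_counts(samples, drop_roman=True))
```

Keeping the per-sample counts separate from the pooled total makes it easy to plot each book on its own as well as the combined text.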

It is a striking feature of the Voynich data that the 2nd, 3rd and 4th data entries vary little. The values 14.911, 14.576 and 13.949 plot as a dog-leg in the graph. I suggest that this demonstrates a shared final letter between two of these symbols. The way these symbols are used, as recorded in Cappelli, suggests that each symbol expands to two or more letters, with final letters as follows. The symbols are transcribed as EVA letters: y has final letter m (-um), l has final letter e (-que), and n and r both have final letter s.

A part of a graph is annotated with the Voynich symbols below.

Voynich symbols in a frequency plot.
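The proposed expansions can be written down as a small lookup table and used to re-key the counts of word-final EVA symbols. The mapping below comes from the text above; everything else is a sketch: the names EVA_FINAL_LETTER and expand_finals are hypothetical, the counts in the example are invented, and keeping unmapped symbols under an "EVA:" label is simply my way of stopping them from colliding with Latin letters.

```python
from collections import Counter

# Final letters proposed above for four EVA symbols.
EVA_FINAL_LETTER = {"y": "m", "l": "e", "n": "s", "r": "s"}

def expand_finals(eva_final_counts, mapping=EVA_FINAL_LETTER):
    """Re-key word-final EVA symbol counts by the Latin letter each is assumed to end in.

    Symbols with no proposed expansion are kept under an "EVA:" label.
    """
    merged = Counter()
    for symbol, count in eva_final_counts.items():
        merged[mapping.get(symbol, "EVA:" + symbol)] += count
    return merged

# Invented counts, for illustration only.
example = Counter({"y": 40, "n": 15, "r": 14, "l": 13, "o": 5})
print(expand_finals(example))
```

Because n and r share the key "s", their counts are added together, which is the effect of treating the two symbols as sharing one final letter. Merging further symbols only requires more entries in the mapping.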

The next graph shows the word-final letter frequencies for a sample of Latin, for the unmodified EVA Voynich transcription, for the EVA with the frequencies of EVA n and r combined, and for the EVA with n and r, and l and e, combined. The data for that chart are given beneath it.

Frequency distributions with symbol expansion final letters assumed and combined

Data for the above chart.

The assumption that the symbols are Latin breviographs explains the dog-leg graph.  The combination of presumed -s symbol frequencies, EVA n and r, produces a graph which more closely matches a sample of Latin word-final letter frequencies.
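One way to put a number on "more closely matches" is to compare the rank-ordered curves directly. The measure below, the sum of absolute differences between two curves sorted from highest to lowest, is an assumption of mine and is not the basis of the charts; the names rank_curve and curve_distance are hypothetical, and the values in the example are invented.

```python
def rank_curve(percentages):
    """Sort frequencies from highest to lowest, discarding the letter labels."""
    return sorted(percentages, reverse=True)

def curve_distance(a, b):
    """Sum of absolute differences between two rank-ordered curves.

    The shorter curve is padded with zeros so both have the same length.
    """
    ra, rb = rank_curve(a), rank_curve(b)
    n = max(len(ra), len(rb))
    ra += [0.0] * (n - len(ra))
    rb += [0.0] * (n - len(rb))
    return sum(abs(x - y) for x, y in zip(ra, rb))

# Invented percentages, for illustration only.
print(curve_distance([50, 30, 20], [40, 40, 20]))
```

A smaller distance means the two curves lie closer together. Comparing the distance to the Latin curve before and after combining symbols would show whether the combination moves the Voynich curve towards the Latin one.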

The most frequent word-final letter is m if EVA y is read as -um. The unusually high frequency of final -um is explained if the writer used the specific sub-set of Latin grammar indicated by final -um. For information about Latin word-final letters and suffixes, please see:
An Analytical Directory of the Latin Endings, July 2007, Thomas Nelson Winter, University of Nebraska-Lincoln


The Voynich manuscript was most likely produced somewhere in Europe. Many analyses of the features of its unusual script suggest that a natural language underlies the script. A document produced in medieval Europe is highly likely to be written in a European language. My researches into word-final letter frequencies show not only that the underlying language is natural, but also that it is most likely Latin.

The texts used in these analyses

Voynich EVA transcriptions by Takeshi Takahashi.

Texts from Project Gutenberg

English  -  Treasure Island, Robert Louis Stevenson.

French  -  Contes de la Montagne, Erckmann-Chatrian

German  - 2 texts combined

        Der Mann des Schicksals, Bernard Shaw (Uebersetzung von Siegfried Trebitsch)

        Die Ahnfrau, Franz Grillparzer

Italian  -  Lezioni e Racconti per i bambini, Ida Baccini

Spanish  -  La Mejor Cocinera pará Calleja

Latin  - key to abbreviated titles.

    medicae -  ENCOMIUM ARTIS MEDICÆ Desiderio Erasmo Roterodamo
    perseo  -  Latin part only, converted to unaccented text, RITCHIE'S FABULAE FACILES
                        JOHN COPELAND KIRTLAND
    flaccus -  Latin part only, The Satires of A. Persius Flaccus by Persius
    daniel  -  Latin Vulgate, Daniel: Prophetia Danielis
    esther  -  Latin Vulgate, Esther: Liber Esther

Nb: all texts downloaded from Project Gutenberg were stripped of Gutenberg notices and transcriber's notes.
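Stripping the Gutenberg notices can be automated roughly as follows. This is a sketch, assuming the "*** START OF ..." and "*** END OF ..." marker lines found in many Project Gutenberg plain-text files; the wording of these markers varies between files, so the patterns may need adjusting. The function name strip_gutenberg is hypothetical, and transcriber's notes are not handled here and would still need to be removed by hand.

```python
import re

# Marker lines as they appear in many Project Gutenberg plain-text files.
START = re.compile(r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)
END = re.compile(r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M)

def strip_gutenberg(text):
    """Return the text between the START and END marker lines.

    If a marker is missing, that end of the text is left untouched.
    """
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

# Invented miniature file, for illustration only.
raw = (
    "Licence notice\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Gallia est omnis divisa in partes tres\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More licence text\n"
)
print(strip_gutenberg(raw))
```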