    Word Salad And Rules Of Conformity
    By Patrick Lockerby | June 8th 2009 05:02 PM
    About Patrick

    Retired engineer, 60+ years young. Computer builder and programmer. Linguist specialising in language acquisition and computational linguistics....



    The conversation-capable computer has often been said to be just around the corner. All too often, what is found around the corner is an impenetrable wall.



    Word salad

    Word salad is a string of words which gives the impression of having been formed by selecting words mostly or entirely at random. Related psychiatric terms include 'verbigeration' and 'schizophasia'. The linguistic term is 'modern poetry'.


    Rules of conformity

    When people communicate, they use the communal protocols of their language. When a small selection of these communal protocols - the rules of conformity - is studied and written down, it is often called a grammar. In reality, nobody has yet captured all of the rules of conformity of any single language; hence no written grammar can claim to be a complete natural grammar.

    Any message which conforms to the natural grammar of a community will be accepted by that community as a valid message. The so-called 'truth value' of the idea conveyed by the message is entirely irrelevant to the validation process: "I always tell lies." is a perfectly valid English-language message, even though it cannot be true.

    The first rule of conformity for written English constrains messages to a specific character set. The character set most commonly found in computing is the ASCII character set. It is a relatively simple matter to program a computer to output a random stream of printable ASCII characters. However, as the following sample shows, that output bears no resemblance whatsoever to English:

    eOrE)9iyV[Fc"woRuB=L#|IwB8_U6JmNwCAGqaH+
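    A minimal Python sketch of such a generator, assuming 'printable' means the 94 visible ASCII characters (codes 33 to 126) and leaving out the space, since the sample above contains none. The names `PRINTABLE` and `random_ascii` are illustrative and are not taken from the author's program.

```python
import random
import string

# Printable, non-space ASCII (codes 33-126): letters, digits and punctuation, 94 characters in all.
PRINTABLE = string.ascii_letters + string.digits + string.punctuation


def random_ascii(length: int, rng: random.Random) -> str:
    """Return `length` characters, each drawn uniformly at random from PRINTABLE."""
    return "".join(rng.choice(PRINTABLE) for _ in range(length))


if __name__ == "__main__":
    # Unseeded generator, so each run prints a different 40-character string.
    print(random_ascii(40, random.Random()))
```

    Every character is equally likely, which is exactly why the result looks nothing like English: nothing in the choice reflects how English text is actually built.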

    Various measures may be adopted to make a stream of characters conform to some of the observed properties of written English. For example, the choice may be restricted to a smaller set, such as the following 27 characters: the lowercase letters plus the space character.

    "abcdefghijklmnopqrstuvwxyz "
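    Restricting the choice is a one-line change to the generator. This sketch assumes uniform selection, so each of the 27 symbols has probability 1/27; the space now breaks the output into word-like chunks, but 'z' is still exactly as likely as 'e'. `ALPHABET` and `random_restricted` are illustrative names.

```python
import random
import string

# The restricted alphabet: 26 lowercase letters plus the space character.
ALPHABET = string.ascii_lowercase + " "


def random_restricted(length: int, rng: random.Random) -> str:
    """Draw each character uniformly from ALPHABET, so every symbol has probability 1/27."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_restricted(40, random.Random()))
```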

    As a next stage, relying on statistics, one might skew the randomness of the choice mechanism to favour character selection according to frequency of occurrence in text corpora:

    a inn edifamdu ret repist nep
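    A sketch of the frequency-skewed version. The weights are counted from whatever corpus is supplied rather than taken from a published frequency table, so no frequency figures are assumed here; with a realistic English corpus, the space and common letters such as 'e' are chosen most often. The function names and the tiny sample corpus are illustrative only.

```python
import random
from collections import Counter
from typing import Dict

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def char_weights(corpus: str) -> Dict[str, int]:
    """Count each ALPHABET symbol in the corpus after lower-casing it and
    collapsing every run of whitespace (newlines, tabs) to a single space."""
    text = " ".join(corpus.lower().split())
    counts = Counter(c for c in text if c in ALPHABET)
    return {c: counts[c] for c in ALPHABET}


def random_weighted(length: int, weights: Dict[str, int], rng: random.Random) -> str:
    """Draw characters with probability proportional to their corpus counts.
    Symbols that never occurred in the corpus (weight 0) are never chosen."""
    symbols = list(weights)
    return "".join(rng.choices(symbols, weights=[weights[s] for s in symbols], k=length))


if __name__ == "__main__":
    sample = "Word salad is a string of words which gives the impression of having been formed at random."
    print(random_weighted(40, char_weights(sample), random.Random()))
```

    The result has roughly English letter proportions, yet each character is still chosen independently of its neighbours, which is why the output remains gibberish.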

    Markov chains

    By listing all the words in a written text, it is possible to record, under each word, every word that immediately follows it somewhere in that text. By repeatedly selecting a word at random from the list belonging to the word just output, it is possible to generate text - a Markov chain - that vaguely resembles English.
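    A minimal sketch of that procedure, a first-order word-level chain. It assumes words are simply whitespace-separated tokens, so punctuation stays attached to the word before it; the author does not say how his own program tokenises text, and `build_successors` and `generate` are illustrative names.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_successors(text: str) -> Dict[str, List[str]]:
    """Map each word to every word that immediately follows it in the text.
    Duplicates are kept, so a frequent successor is proportionally more likely to be picked."""
    words = text.split()
    successors: Dict[str, List[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        successors[current].append(following)
    return dict(successors)


def generate(successors: Dict[str, List[str]], start: str, max_words: int,
             rng: random.Random) -> str:
    """Random walk: repeatedly append a random successor of the last word emitted."""
    output = [start]
    while len(output) < max_words:
        choices = successors.get(output[-1])
        if not choices:  # no recorded successor, e.g. the word only ends the text
            break
        output.append(rng.choice(choices))
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the rug"
    print(generate(build_successors(corpus), "the", 12, random.Random()))
```

    Keeping duplicate entries in each list means that frequency of occurrence is built into the selection without any explicit counting.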

    Word selection by frequency of occurrence, however sophisticated the program, tends to produce weird results. Frequency of occurrence cannot be the principal player in the brain's word-concatenation mechanisms: although frequency reflects how a community uses words, it does not capture how that community associates them. A frequency-based model fails to deal with pragmatics.

    On reading through a Markov-chain text, a human will immediately notice exactly where the chain deviates from validity: there is a discontinuity as prominent to the average reader as is a fault-line to a geologist.  Here are some examples from my own computer:

    Cats always join in getting home without speaking to do the accusation.

    Echoed in poetry said a soothing lullabies.

    Greater fury that lovely sight in heaps at pinning and picking the judge.

    Punishments for refreshments.

    Beheading people's noble health.

    Near these Martians lying clutching tentacles it quite altered very poor savages by constructing another atmosphere and hobbled towards Wimbledon.


    The discontinuities in a Markov chain are, I suggest, a key to understanding how English works.  It seems that the word selection process favours continuity in sentence generation and favours discontinuity in the marking of sentence boundaries.

    If a Markov-chain-based program follows a few rules of syntax and selects from lists of nouns, verbs, etc., then its output, whilst often appearing strange, becomes a closer approximation to a valid English sentence. It might seem that one need only add some rules of semantics to this method in order to always generate valid sentences.
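    A sketch of that combination, assuming the syntax rules are reduced to one fixed part-of-speech pattern and that each word's category is known from a hand-made list (a real system would need a part-of-speech tagger). `LEXICON`, `TEMPLATE` and `generate_constrained` are hypothetical, and `successors` is a word-to-following-words table of the kind a Markov chain is built from.

```python
import random
from typing import Dict, List

# Hypothetical word lists and sentence pattern, for illustration only.
LEXICON: Dict[str, List[str]] = {
    "DET": ["the", "a"],
    "ADJ": ["lovely", "noble", "poor"],
    "NOUN": ["cat", "judge", "sight", "savage"],
    "VERB": ["greets", "follows", "watches"],
}
TEMPLATE = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]


def generate_constrained(successors: Dict[str, List[str]], template: List[str],
                         lexicon: Dict[str, List[str]], rng: random.Random) -> str:
    """Fill each slot of the template with a word of the required category.
    Where possible, pick a Markov successor of the previous word that fits the
    slot; otherwise fall back to any word from that category's list."""
    words: List[str] = []
    for category in template:
        allowed = lexicon[category]
        fitting = [w for w in successors.get(words[-1], []) if w in allowed] if words else []
        words.append(rng.choice(fitting) if fitting else rng.choice(allowed))
    return " ".join(words).capitalize() + "."


if __name__ == "__main__":
    successors = {"the": ["cat", "judge"], "lovely": ["sight"], "cat": ["watches"]}
    print(generate_constrained(successors, TEMPLATE, LEXICON, random.Random()))
```

    Every output fits the pattern, but the variety of sentences is bounded by the single template and the size of the word lists.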

    I suggest that this approach is too restrictive in scope and will never generate the diversity and style of valid output which a program needs in order to converse with humans in a natural-seeming way. 

    The problem lies in the supposition that, just because words may be categorised as verbs, nouns and such, ideas themselves may be so categorised. From the standpoint of mental models, that inference does not hold. I suggest that mental models do not conform to the norms of a syntax-based grammar, with only one possible exception: a formally-learned mental model of a syntax-based grammar!

    If you enjoyed this article, you may also enjoy some related articles in my blog:
    Random Reward Schedules and the Ambiguity of Language
    Intelligence Made Simple
    A Journey to the Centre of the Universe
    Digging Beneath the Surface of Grammar
    Thinking Machines and The Semantic Quagmire
    The Intelligence in the Chinese Room

    Comments

    antunes
    Fun article.  Markov chains are very handy for (after training) recognizing something you can't categorize (e.g. sound samples, outcomes of a complex system).  But I agree with you that creating has underlying rules that are known-- not just rules of semantics, but idea frameworks.

    So I used to think rule-based systems were better for generative text (or music) than purely statistical methods. But you got me interested in hunting down such things, and I found a link to Markov-generated Shakespeare and also Beatles lyrics: http://www.cogs.susx.ac.uk/courses/gc/lec08.html

    So, is there a semantics of meaning necessary, or can we just statistically recreate all of Shakespeare with just 1 million Markovian monkeys?
    logicman
    I'm glad you enjoyed it, Alex. Markov chains, and the hidden Markov models built on them, do have their uses. As for language, I think Markov chains are fun, but mostly irrelevant. For a computer to be able to converse in natural language it needs a blend of syntax, semantics and pragmatics. I'll be writing more on this, building up a set of articles to show how I think the parts work, and then showing how it all fits together.

    Meanwhile, here is some more fun with Markov chains.
    Fascinating article. Reading those sentences, it's almost as if your mind is fooled into actively making sense of each sentence until some implicit rule is broken. It seems to highlight the role of the listener of spoken language as well as that of the speaker.