To begin to grasp why using object-like visual symbols for words is a good strategy, consider two alternative strategies besides the objects-for-words one.
First, rather than drawing objects for words, we could be lazy and just draw a single contour for each spoken word. Writing “The rain in Spain stays mainly in the plain” would then look something like Figure 3a. Shorthand is somewhat akin to the lazy approach, with some words having single-stroke notations. Shorthand is great for taking dictation from fast-talking bosses, but it is notoriously hard to read and has never caught on for everyday writing. Kids don’t think it’s a good idea either: there’s not a single lone contour in my daughter’s drawing in Figure 1. One reason it’s not a good idea is that there are simply not enough distinguishable stroke types for all the words we speak.
Coming up with even 100 easily distinguishable stroke types would be tricky, and that would still be far below the tens of thousands that would be needed for writing.
There is also a more fundamental difficulty: the part of your brain that carries out visual computations is arranged as a hierarchy.
The earlier stages of the hierarchy deal with simpler parts like contours, higher areas deal with simple combinations of contours, and eventually at the highest regions of the hierarchy full objects are recognized and perceived. The problem with using single strokes to represent spoken words like in Figure 3a is that the visual system finishes processing the strokes far too early in the hierarchy.
Figure 3 Three different strategies for writing, here exemplified as writing the spoken sentence about the rain in Spain. (a) One could use single strokes for every word, leading to writing akin to that shown, short but with words that don’t look object-like. (b) Alternatively, one could use object-like symbols for the meaningful words, as shown (and still use single strokes for “function” words like ‘the,’ ‘in,’ and ‘mainly’). This is the strategy cultures have come to use because it best harnesses our visual system’s object-recognition abilities for reading. (c) A third strategy would be to let spoken words be written with drawings more complex than objects, which is not useful because then the objects our visual system finds are not indicative of anything.
The visual system is not accustomed to giving word-like (i.e., object-like) interpretations to single strokes. Single strokes are typically not perceived at all, at least not in the sense of making the list of things we see out there. For example, when you look at Figure 4 you perceive a cube in front of a pyramid. That’s what you consciously notice and make judgments about.
You don’t see the dozen or so contours in quite the same sense. Nor do you see the many object corners and junctions (intersections of contours). You don’t say, “Hey, look at all those contours and corners in the scene.” Our brains evolved to perceive objects, not object parts, because objects are the clumps of matter that stay connected over time and are crucial to parsing and making sense of the world.
Our brains naturally look for objects and want to interpret stimuli out there as objects, so using a single stroke (or a junction) for a word is not something our brains are happy about. Instead, when seeing the stroke-word sentence in Figure 3a, the brain will desperately try to find objects in the jumble of strokes, and if it finds one, it will interpret that jumble in an object-like fashion. But in doing so, it would be interpreting a phrase or a whole sentence as an object, which does not help in understanding the sentence: the meaning of a sentence is “true” or “false,” not any single word meaning.
Figure 4 When you see this figure, you see a box partially occluding a pyramid. That is, you see objects. You don’t see 14 contours. You don’t see 12 junctions (places where contours intersect). Writing has culturally evolved to take advantage of what our visual system has evolved over hundreds of millions of years to do: see objects. Writing has evolved so that spoken words tend to look object-like.
Using single strokes as words is, then, a bad idea because the brain is not designed to treat single contours as meaningful. Nor is it designed to treat object junctions as meaningful. That’s why spoken words tend to be written with symbols having a complexity no smaller than visual objects.
How about, instead, letting spoken words be visually symbolized by whole scenes, i.e., by multiple objects rather than just a single one? Figure 3c shows what “The rain in Spain…” might look like with this “scene-ogram” strategy. Quite an eyeful. These are akin to the drawings found in some furniture assembly manuals. The problems now are the opposite of those before.
First, the natural meaning of scene-ogram images is more like that of a sentence, like “Take the nail that looks like this, and pound it into the wooden frame that looks like that.” Second, the very presence of objects within these complex symbols is itself a problem: the brain wants to assign meanings to them, yet these objects are now merely the building blocks of a written word and have no meaning at all.
In sum, the visual system possesses innate mechanisms for interpreting object-like visual stimuli as objects. Because spoken words are the smallest meaningful entities in spoken language, and often have meanings at the object level (denoting objects, properties of objects, or actions of objects), it is only natural to represent them with visual symbols that the visual system was designed to interpret, and to interpret as objects. By drawing objects for spoken words (not smaller-than-object visual structures like contours or junctions, and not larger-than-object visual structures like scenes), we can best harness the visual system for a task it never evolved to do. (See Figure 5.)
Figure 5 The part of the human visual cortex responsible for recognizing objects is organized into a hierarchy, where the lower levels are responsible for recognizing simpler visual features of objects such as edges or strokes, middle levels handle simple contour combinations, and the highest levels are responsible for whole objects. Here there are just three levels, but in reality there are about a dozen. Ovals on each drawing show the level of detail that each brain area deals with. Whether real-world objects, semi-symbolic cartoons, visual signs outside of language proper, or symbols in logographic writing like Chinese, the symbols tend to be roughly object-like, just what our visual system evolved for.
Object-like symbols might, then, be a good idea for representing words, but are the object-like symbols we find in culture a result of cultural evolution having selected for this, or are they merely a leftover from the first symbols having been object-like? After all, the first symbols tended to be object-like pictograms, even more object-like than the symbols in Figures 2b and 2c. Perhaps our symbols are still object-like merely because of inheritance, and not because culture has designed them to be easy on the eye. The problem with this argument is that writing tended to change quickly over time, especially as cultures split. If there were no cultural selection pressure to keep symbols looking object-like, then the symbol shapes would have changed randomly over the centuries, and their object-likeness would have tended to be obliterated.
But that’s not what we find. Culture has seen to it that our symbols retain their object-likeness, because that’s what makes us such good readers. It is interesting, though, that even the first symbols were on the right track, before cultural evolution had time to do any shaping of its own. Then again, given that even small children cotton on to this, it’s perhaps not too surprising that the first scribes appreciated the benefits of object-like drawings for words.
The Trouble with Speech Writers
The brain prefers to see objects as the symbols for words, and kids and much of the world have complied. Such writing is “logographic” (symbols for words), and doesn’t give the reader information on how to pronounce it. This is itself a great benefit, because people who speak different languages can then use the same writing system and communicate through it. That is, logographic writing systems can serve as universal writing systems bringing together a variety of spoken languages into harmony and friendship, Tower-of-Babel style. Japanese speakers, for example, have no idea what a Chinese speaker is saying, but can understand written Chinese fairly well because Japanese speakers also use Chinese characters (which are of the objects-for-words kind).
Brotherhood and peace may be nice, but there er jus some thangs ya cayant do when writin’ with objects. For one thing, you can’t communicate how to say the words, let alone put a person’s accent down on the page. A Japanese person may be glad to be able to read Chinese content, but he will be totally unprepared to actually speak to anyone in China. The kind of writing you’re reading at the moment is entirely different. Rather than symbols for spoken words, the basic symbols are letters indicating how to speak the words.
You’re reading “speech-writing.”
Speech-writing allows us to put Tom Sawyer’s accent on paper, and it allows non-speakers of our language to obtain a significant amount of knowledge about how to speak among us by reading at home. Such a learner would have an atrocious accent, of course, but would nevertheless have a great start. A second important advantage to speech-writing is that one can get away with many fewer symbols for writing. Rather than one object-like symbol for each of the tens of thousands of spoken words, one only needs a symbol for each of the dozens of speech sounds, or phonemes, we make. That’s a thousand-fold reduction in the number of written symbols we have to learn.
I have no idea whether the merits of speech-writing outweigh the benefits of logographic (symbols-for-words) writing, but there have been hundreds of speech-writing systems over history, many in use today by about half the world’s population. And when culture decided to go the speech-writing route rather than the logographic route, it created for itself a big dilemma. As we’ve discussed, the best way to harness the natural object-recognition powers of the visual system is to have spoken words look object-like on paper. But in speech-writing the symbols are for speech sounds, and written words will consist of multiple speech sound symbols. How can our written words look like objects if written words no longer have fundamental symbols associated with them?
If symbols are for fundamental speech sounds, then the look of a written word will depend upon the letters in it. That is, the word’s look will be due to the vagaries of how the word sounds when spoken. Had it been spoken differently, the written word would look different. If the look of a word depends on how speakers say the word, it would seem that all hope is lost in trying to make written words look object-like in speech-writing.
There is a way out of the dilemma, however, and although no individual may have conceived of the idea, culture nevertheless eventually evolved to use this solution. The solution is this: if written words must be built out of multiple symbols, then to make words look object-like, make the symbols look like object parts. That’s what culture did. Culture dealt with the speech-writer dilemma by designing letters that look like the object parts found in nature, in particular object junctions. That way written words will typically be object-like, so that again our visual system can be best harnessed for reading.
Because the geometrical shapes of letters vary considerably across fonts (and across individuals), but their topology typically changes little (see Figure 6a), a topological notion of shape is the apt one for studying letter shape. It is also apt because the geometrical shape of a conglomeration of contours in a scene changes with the observer’s viewpoint, whereas its topological shape is highly robust to changes in viewpoint. Figure 6b shows three simple kinds of topological shape, or configuration: L, T and X. Each stands for an infinite class of geometrical shapes having the same topology. Two smoothly curved contours make an L if they meet at their tips, a T if one’s tip meets the other anywhere except at its tips, and an X if the two contours cross each other.
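To make the L/T/X definitions concrete, here is a minimal Python sketch. It simplifies “smoothly curved contours” to straight line segments in the plane, treats parallel segments as never forming a junction, and uses a small tolerance `eps`; the function name `classify_junction` and these choices are hypothetical illustrations, not the method used in the original measurements.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _cross(ax: float, ay: float, bx: float, by: float) -> float:
    """2D cross product (z-component) of vectors (ax, ay) and (bx, by)."""
    return ax * by - ay * bx


def classify_junction(a: Segment, b: Segment, eps: float = 1e-9) -> Optional[str]:
    """Hypothetical helper: label how two straight segments meet.

    Returns "L" if they meet tip to tip, "T" if exactly one tip lands on the
    interior of the other segment, "X" if they cross in both interiors, and
    None if they do not meet (parallel segments are simply treated as None).
    """
    (px, py), (p2x, p2y) = a
    (qx, qy), (q2x, q2y) = b
    rx, ry = p2x - px, p2y - py  # direction of segment a
    sx, sy = q2x - qx, q2y - qy  # direction of segment b

    denom = _cross(rx, ry, sx, sy)
    if abs(denom) < eps:
        return None  # parallel or collinear: not handled in this sketch

    # Solve p + t*r = q + u*s for the parameters t (along a) and u (along b).
    qpx, qpy = qx - px, qy - py
    t = _cross(qpx, qpy, sx, sy) / denom
    u = _cross(qpx, qpy, rx, ry) / denom

    # The meeting point must lie on both segments.
    if not (-eps <= t <= 1 + eps and -eps <= u <= 1 + eps):
        return None

    a_at_tip = t <= eps or t >= 1 - eps  # meeting point is an endpoint of a
    b_at_tip = u <= eps or u >= 1 - eps  # meeting point is an endpoint of b

    if a_at_tip and b_at_tip:
        return "L"
    if a_at_tip or b_at_tip:
        return "T"
    return "X"


if __name__ == "__main__":
    print(classify_junction(((0, 0), (1, 0)), ((0, 0), (0, 1))))  # -> L
    print(classify_junction(((0, 0), (2, 0)), ((1, 0), (1, 1))))  # -> T
    print(classify_junction(((0, 0), (2, 2)), ((0, 2), (2, 0))))  # -> X
```

The label depends only on whether the meeting point is a tip or an interior point of each contour, not on the angle between them, which mirrors the topological notion of shape described above: rotating or skewing the segments changes the geometry but not the L, T or X category.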
Whereas Ls and Ts commonly occur in the world (as corners and at partial-occlusion boundaries, as displayed in Figure 6b), Xs do not. And, indeed, across the history of human visual signs and in nearly a hundred writing systems, Ls and Ts are common but Xs are rare (see the red squares in Figure 6e). Figure 6c shows four configuration types that are similar in that they each have three contours and two T-junctions. Despite these similarities, they are not equally common in nature. While three of them can be caused by partial occlusions and are thus fairly common, one of them cannot, and is thus rare in nature. Their commonness over the history of writing shows the same asymmetry, with the rare-in-nature configuration also rarely occurring among human visual signs (see the green diamonds in Figure 6e).
Finally, Figure 6d shows five configurations having three strokes that all meet at a single point, or junction. Some of these require more coincidental alignments in the world to occur, and are accordingly expected to be rarer in nature. And measurements show that writing over history mimics this relative frequency distribution (see the blue circles in Figure 6e).
Commonness in the world drives commonness in writing. Culture appears to have, over centuries, selected for written words that look object-like, thereby harnessing the natural powers of our visual system, allowing us to read with remarkable efficiency.
This is a modified excerpt from my book, The Vision Revolution.
Changizi MA (2009) The Vision Revolution: How the Latest Research Overturns Everything We Thought We Knew About Human Vision. BenBella Books, Dallas, TX.
Changizi MA & Shimojo S (2005) Character complexity and redundancy in writing systems over human history. Proc Roy Soc Lond B 272: 267–275.