    Why My Computer Didn't Do (All Of) My Homework
    By Samuel Kenyon | September 11th 2012 03:29 AM | 16 comments
    About Samuel

    Software engineer, AI researcher, user experience (UX) designer, actor, writer, atheist transhumanist. My blog will attempt to synthesize concepts...

    In my innocent early teenage years of computer programming I started dabbling in Artificial Intelligence (AI).

    I wanted my computer to write stuff in English. I'm not sure why, but I think the main triggers were:

    1. I found a BASIC version of ELIZA, "THE PSYCHOANALYTIC CONVERSATIONALIST". ELIZA is a chatbot originally created in the 1960s (possibly the first chatbot ever).

    2. I happened to flip through a book titled Language Development [1] which described various grammars of natural language. In the new light of programming, the sight of symbolic grammar constituents in tree diagrams immediately made me think--hey, I could program that!

    Your Mom


    The first thing I did was make a new version of ELIZA called Ernie.


    Ernie would swear, insult your mother, and generally piss you off.





    In the interest of education, I took the liberty of installing it on some of my high school's computers. Later I augmented it with a program I wrote called Expletive Generator.

    I also tried to make programs to auto-generate prose, perhaps in the hope that this would somehow do my homework for me. Homework in middle school and high school was generally annoying to me, as it was usually a rude distraction from my main interests.

    I also experimented with computer-generated poetry. The uncompromising creativity of random combinations occasionally produced mildly provocative drivel. Here is an example:
    Flow procreate
    choose Devil late
    sun red peaches mate
    waiting would late
    race who hate
    right song have cheat
    saw trust form great
    behind damn high.

    I combined a few of them together, printed the resulting poem out, and submitted it in a high school creative writing class. The teacher's response:
    I can't say I "get" it but it sounds pretty cool.

    This is AI?


    At first, I didn't realize I was coding AI. My notion of "AI" was from science fiction.


    image from Short Circuit (1986)

    But I happened upon two AI textbooks at the town dump, which I brought home. One was Winston [2], and the other was Charniak and McDermott [3]. And no, they did not convert me to LISP. In fact, to my teenage mind they were a bit dry--the most exciting part was the term "artificial intelligence." The main influence those books had on me was to make me recognize that AI was a real field of endeavor and that natural language parsing and generation was part of it.

    Where is the Meaning?

    Now, why would it surprise me that natural language generation was part of the field of AI?

    I think the reason is that at first, language understanding and generation seems like it can be done easily in one's mind, but then when you start writing programs you realize it's not that easy.

    The other reason I might have been surprised is that the generation of text seems like a peripheral activity. It's not even the real "intelligence," it's merely an interface to the world. Of course, nowadays, I realize how important interfaces and the environment are to mental phenomena. And of course, there never is a "center" to the mind--eventually the homunculus must be slain.

    But why would computer code to deal with natural language seem like an unwelcome cousin at the AI party? Well, the main reason is the lack of meaning.

    One might say a dictionary has meanings, but it doesn't really. It's just a tool that allows an organism such as a human to understand one word in terms of others. Imagine if you kept following definitions forever, never reaching the final definition that made sense to you. That's kind of how most computer programs are--they never reach the final destination of meaning.
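    The dictionary thought experiment can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the toy DICTIONARY, the function names, and the `grounded` set are all hypothetical, and the `grounded` set is merely a placeholder for whatever connection to the world a real system would need. It shows that following definitions only ever yields more symbols unless something outside the dictionary is supplied.

```python
# Hypothetical toy dictionary: every word is defined only by other words.
DICTIONARY = {
    "cat": ["feline", "animal"],
    "feline": ["cat"],
    "animal": ["organism"],
    "organism": ["living", "thing"],
    "living": ["organism"],
    "thing": ["object"],
    "object": ["thing"],
}


def reachable_symbols(word, dictionary):
    """Return every symbol reachable by following definitions from `word`."""
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited: the definitions have looped back
        seen.add(current)
        stack.extend(dictionary.get(current, []))
    return seen


def is_grounded(word, dictionary, grounded):
    """True if some chain of definitions reaches a symbol in `grounded`.

    `grounded` is an assumption of this sketch: a stand-in for symbols
    that are tied to something outside the dictionary.
    """
    return bool(reachable_symbols(word, dictionary) & grounded)


if __name__ == "__main__":
    print(sorted(reachable_symbols("cat", DICTIONARY)))
    print(is_grounded("cat", DICTIONARY, set()))      # nothing outside the dictionary
    print(is_grounded("cat", DICTIONARY, {"thing"}))  # pretend "thing" is tied to the world
```

    With an empty `grounded` set, the search simply walks in circles among the seven words and reports that "cat" never bottoms out in anything but other words.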

    Symbols seduce you into thinking anything is possible, and just around the corner. But symbols are a double-edged sword. Or, perhaps a better metaphor is that of an inbred family. Whatever metaphor you like, the bottom line is that amateur symbol juggling won't magically create meaning.

    Words, symbols, definitions, concepts...they lead to more symbols, but eventually they have to hit "ground". We humans are "grounded" through our biological evolution as animals with bodies interacting in an environment. But most, if not all, computer programs do not have any kind of human-like grounding.

    The Bastard Children

    My programs were strange bastard children of context free grammars and Mad Libs. After about two years of writing these programs during high school, I stopped working on them. Something was missing, and that was meaning.

    Note that creativity was not missing. Pseudo-random combinations proved to be quite creative.
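    For readers who have never seen this kind of program, here is a minimal sketch of a context-free grammar crossed with Mad Libs. It is not a reconstruction of the original programs: it is written in Python rather than BASIC, and the grammar rules, word lists, and function names are all invented for illustration. Each nonterminal expands into a randomly chosen production, and each terminal slot is filled with a randomly chosen word.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its alternative expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Mad Libs-style word lists that fill the terminal slots (also invented).
WORDS = {
    "Det": ["the", "a"],
    "Adj": ["red", "late", "great"],
    "N": ["sun", "song", "devil", "peach"],
    "V": ["waits", "cheats", "flows"],
}


def expand(symbol, rng):
    """Recursively expand a grammar symbol into a list of words."""
    if symbol in WORDS:
        # Terminal slot: fill it with a random word of that category.
        return [rng.choice(WORDS[symbol])]
    # Nonterminal: pick one production at random and expand each child.
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for child in production:
        words.extend(expand(child, rng))
    return words


def generate_sentence(seed=None):
    """Generate one sentence; the same seed always gives the same sentence."""
    rng = random.Random(seed)
    return " ".join(expand("S", rng))


if __name__ == "__main__":
    for i in range(3):
        print(generate_sentence(seed=i))
```

    Every output is grammatical by construction, yet nothing in the program knows what a sun or a song is; the words are interchangeable tokens.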

    But there was no architecture of understanding anything. Also missing were large stores of knowledge that linked concepts together. There was no memory of experience like a human might have. To make a computer have and use humanlike knowledge requires an architecture that connects code to the world that humans occupy.

    Segue

    And so, as a segue to two different future blog posts, the main problems were:

    1. A lack of symbol grounding.

    2. A lack of common sense.



    References

    [1] P.S. Dale, Language Development: Structure and Function, Dryden Press, 1972.
    [2] P.H. Winston, Artificial Intelligence, 2nd ed., Addison Wesley, 1984.
    [3] E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison Wesley, 1985.

    Comments

    Thor Russell
    Interesting article. I think you are right about symbol grounding. Mathematics for example has to start with something that cannot be defined in terms of other existing concepts. You have to assume something to get started. So you need some real world experience to start. 
    However, I am not sure that making symbols and trying to ground them is the right way; having the system automatically discover them seems better. Just recently there were 30,000 processors linked together that automatically learnt concepts from looking at images from the internet. "Cat", for example, it learnt from pictures without being told the concept existed. That is also the way intelligence has evolved, with symbols coming after such learning/classification of the organism's environment.

    Have you looked at Jeff Hawkins's work? He has written a very good book called "On Intelligence" and has a company, Numenta, with some impressive learning systems. Briefly, intelligence of the level we are talking about requires automatic identification of patterns. Intelligence involves a compressed, abstracted representation of the data the system is exposed to. That representation continuously predicts what will come in as input to the senses. If you think about it, all meaningful behavior MUST involve prediction. It's called the memory-prediction framework. The higher level abstractions are automatically formed and correspond to the symbols we talk about.
    SynapticNulship
    Learning is extremely important, but your examples don't actually solve or get around the symbol grounding problem.
    SynapticNulship
    I'm very interested in learning from both phylogenetic and ontogenetic points of view and how those dimensions work together...there will be more of this in future blog posts.
    Thor Russell
    The idea is that the symbols are not programmed in at all, so there can be no grounding problem. If the concepts/symbols are discovered by the system, then they are automatically grounded.
    SynapticNulship
    It's a start but does not necessarily ground the program.
    Thor Russell
    I guess I don't know what you mean by "ground" then. How can a concept discovered automatically from data be anything other than grounded? I think you also need to specify more what you mean by "understand" or "meaning" if possible. To me a system that discovers a concept or pattern and its links to other patterns understands it as much as anything else. How can you objectively distinguish the understanding that you feel and the potential understanding a system (AI, or other creature me included) appears to demonstrate? 
    I think you can definitely do this if you have access to the internal structure of the system. If it starts off with no prior representation of the data, then after being exposed to it automatically builds up an abstracted, compressed representation that allows prediction and further successful classification, then to me the system has some genuine (grounded) understanding and has exhibited intelligence. After all, where else could the classification algorithm have come from? It wasn't programmed in; the intelligence of the system created it.

    Going back to the example of the computer automatically identifying the cat without even being told that cats exist: sure, it doesn't have any understanding of cats being "felines" and furry predatory animals etc., but it still has some elementary understanding of the concept.
    vongehr
    "The idea is that the symbols are not programmed in at all, so there can be no grounding problem. If the concepts/symbols are discovered by the system, then they are automatically grounded."
    What is the simplest example where one can see this difference clearly? [Or are all those systems without obvious/explicit grounding as obscure as that cat example below? (I suspect that they have just hidden the (artificial) grounding from view. After all, what motivates the system to look for cats at all if it does not "want to eat" cats? I fear that most reports of "self-learning" systems are simply hiding the part where the programmers determine what the systems want to learn in the first place. Like those painful-wall-avoiding or light-seeking ones: who told them that the wall is painful but light is good?)]
    Thor Russell
    Here is the paper: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/unsupervised_icml2012.pdf

    I haven't studied it in detail, but if the paper does what it claims in the abstract:
    "Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not." Then this is surely a simple example? Where did they tell it what to do?


    I don't see how they have hidden the grounding from view. Sure they have to seek and pull out the part of the network that finds faces, (and then attach a now grounded symbol to it) but if no self-learning was going on it wouldn't be there just by chance. Perhaps this is still cheating (reverse grounding or something) to you?


    The point to me of this network is that there doesn't need to be motivation for unsupervised learning. The system just does it. The part that decides what to eat, if it is there, probably takes as inputs the invariant features (faces, cats etc.) from the lower levels, and then decides what to do. Bottom up rather than top down. I don't see why you would want motivation to eat having anything to do with edge detection/face detection etc. if it happens automatically and "bubbles up" anyway.
    vongehr
    "they have to seek and pull out the part of the network that finds faces, (and then attach a now grounded symbol to it) but if no self-learning was going on it wouldn't be there just by chance. Perhaps this is still cheating (reverse grounding or something)"
    That is precisely the part I am suspicious about (and you stated that "symbols are not programmed in at all, so there can be no grounding problem"). They start with "unlabeled" images, which is great, but in the end, they need to add the meaning. No big criticism; I can see how evolution supplies the other modules, say the one that evolved to connect the tasty parts to the hungry signals and the module that adds the utterance "cat" and all that. Add the Myth of Jones, and consciousness has arisen, talking about cute cat qualia. I just do not see where the profound difference is from a more explicit grounding of symbols.
    Thor Russell
    Fair enough. Some process where those important neuron/s were automatically identified (rather than searched for by hand) for higher levels to act on also seems to be needed. Neurons or groups of neurons would need a way to self-identify when they correspond to a face etc and are hence useful. Only then should they activate a higher level in the hierarchy.
    Consider a change of the architecture incorporating automatic prediction for input presented in a time-based fashion. According to Hawkins, I think, in the brain there are cortical columns (spatially separate) that attempt to train themselves to individual features and also continuously try to predict what sensory input is going to come next. Such prediction is attempted/assessed on multiple levels of the hierarchy. Basic "pixel" prediction is attempted, as well as higher level edge placement, geometric shape etc. The idea is that you can beat a generic prediction algorithm when the input matches the pattern you are trained to recognize. Now possibly some kind of stochastic algorithm may be able to beat you on pixel prediction for the face part of the picture even if you know it's a face and it doesn't, but for edge/shape prediction perhaps not so much. All modules would use the same output from the edge detectors etc. to make their predictions.

    The columns "know" that their target is presented if they are able to predict what is coming next. After a period of above average prediction they increase their firing rate. Unlike in the paper, input needs to be presented in a time based fashion. (As you know, your vision is like this, you can't see a whole picture at once, but move your eyes continuously over parts of it.)

    For pictures you could implement something like this by giving the system parts of pictures. It would then try to predict the other part. For those with a large enough part of a face, the face detector would "know" it is useful because it would be able to predict the other half of the picture, with the rest of the face in it, with greater accuracy than the other columns (trained to cats, the color blue, round objects, generic stats or just nonsense). The face detector would then be the most accurate compared to the other detectors. The module that compared prediction accuracy would then see that the face module was significantly more accurate, say, 5% of the time (whenever there was a face), and it would then be attached to something higher up. Sure, not fully associated to a concept in the way we know, but perhaps an improvement because no human had to manually search through all the "neurons" to find a useful one.

    Continuous time-based input prediction, where it can be applied, could also keep algorithms/programmers "honest" and help avoid overfitting etc. It's so common and frustrating to fit a model to data and have it fail completely for new data.
    vongehr
    "Some process where those important neuron/s were automatically identified (rather than searched for by hand) for higher levels to act on also seems to be needed."
    That is what I meant by "the one that evolved to connect the tasty parts to the hungry signals", because "Neurons or groups of neurons would need a way to self-identify when they correspond to a face" is impossible. The whole point is that nothing in the brain has any idea that a face is even anything special (in the paper it is something quite often present in the training set, too, not only afterwards selected by hand).
    "the face detector would "know" it is useful because it would be able to predict with greater accuracy than the other columns"
    How does this get around not knowing that a face is even interesting in the beginning? How does it know whether a prediction is more accurate (rather than less accurate for something else much more important to survival)?
    If I am missing something, perhaps you can explain it real clear with pictures and all in a blog post? Because this to me is a crucial point. A long evolution motivates the important aspects by natural selection, evolving many modules that do all kinds of different selecting, all very integrated so that it connects all the way to being hungry and all that. Without that grounding in millions of years of evolution (with gazillions of systems having failed because they predicted the unimportant more successfully than the important), how do you just not ground symbols at some point when making any artificial system?

    I think that is what Samuel meant by "It's a start but does not necessarily ground the program."
    Thor Russell
    Yes, I think I need to take more time to explain it in detail. I intended to write a blog post but haven't got round to it yet. I will see if I can make a little more sense in the meantime.


    OK the process where "important"  neurons (actually modules now) are automatically selected.
    Firstly I need to clarify "important" to mean modules that find any patterns, not those patterns that are necessarily useful for survival. My assumption is that the system related to survival is higher on the hierarchy than that which selects patterns. Some patterns are important for survival, others aren't. The point of the lower levels is to greatly reduce the bandwidth of information coming into higher levels such as the survival one. 

    To give a crude example: rather than a raw sensory stream of say 1MB/second the survival module gets the timewise pattern:
    P1 P1 P1 P3 P3 P10
    in the same second. The survival module has to work out/has already worked out that P1 is food, and P10 is a predator.
    At the first time step P1 goes to the higher levels because at that time that module (M1) got the highest prediction accuracy. E.g. the organism sees half an apple, and the module that is trained to apple-like things predicts sensory input best as the eyes fix on the other half.

    Secondly I have assumed that a module that gets the highest prediction accuracy (also above a certain threshold and for a reasonable amount of time) for a particular time/input pattern will always have latched onto something permanent/"important". I.e. something we would identify with like a solid object or similar concept more abstract than just random input. 

    The best prediction accuracy (and hence P1-P10) for a certain point in time could be calculated by a module that has the prediction accuracies from all other modules input into it. Modules could self report their own accuracy. It would then signal to that winning module that it was the most accurate, and signal to the higher levels in the hierarchy the label of that module (M1 corresponding to P1 for example).
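    The comparison step described in this paragraph can be sketched roughly as follows. This is a loose illustration, not an implementation of Hawkins's framework or of the Google paper: the function names, the accuracy numbers, and the 0.6 threshold are all made up, and the sketch ignores the requirement that a module stay accurate "for a reasonable amount of time". Each module self-reports an accuracy per time step, and a comparator passes the winner's label upward only if it clears the threshold.

```python
def select_winner(accuracies, threshold=0.6):
    """Return the label of the most accurate module, or None.

    `accuracies` maps a module label (e.g. "M1") to its self-reported
    prediction accuracy for one time step. The threshold value is an
    arbitrary assumption of this sketch.
    """
    if not accuracies:
        return None
    label, best = max(accuracies.items(), key=lambda item: item[1])
    # A blank wall predicts nothing well, so no module should win.
    return label if best >= threshold else None


def pattern_stream(timesteps, threshold=0.6):
    """Reduce per-step accuracy reports to the labels sent up the hierarchy."""
    return [select_winner(step, threshold) for step in timesteps]


if __name__ == "__main__":
    # Invented self-reported accuracies for three modules over four time steps.
    timesteps = [
        {"M1": 0.9, "M3": 0.4, "M10": 0.2},   # apple-like input: M1 wins
        {"M1": 0.3, "M3": 0.8, "M10": 0.1},   # M3 wins
        {"M1": 0.2, "M3": 0.3, "M10": 0.1},   # blank wall: nobody clears the threshold
        {"M1": 0.1, "M3": 0.2, "M10": 0.95},  # M10 wins
    ]
    print(pattern_stream(timesteps))  # ['M1', 'M3', None, 'M10']
```

    Note that the sketch says nothing about which winning labels matter for survival; it only compresses the input into a sparse stream of labels for some higher level to interpret.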

    Now that permanent thing P1 best predicted by M1 may be an unimportant leaf or a predator, but a blank wall will trigger no modules to win, and part of an animal/predator/prey will always trigger some module to fire because it is a consistent and repeated pattern. (That pattern could then be learned from a parent to wire up to danger, for example, and so is grounded.) So anything predictable is assumed to be potentially special, and the lower level will find almost every pattern. (Our brains overdo this and trigger when there isn't in fact a pattern.) The rest of the brain can then decide it's in fact unimportant. A module is only judged by the signal it gets most accurate; it's a sparse representation, with each module selecting for just one thing and staying silent/inaccurate for all others. If it fails completely to respond to a predator but recognizes dirt well, it will still stay around because something else will respond to that predator, and sometimes dirt is interesting. If it never receives feedback that it is most accurate, then it will rewire or something.

    So I haven't described how P1 translates to food or "apple", but do you see my motivation for including prediction accuracy and how it is defined/calculated? Have I still missed something crucial?
    Thor Russell
    Samuel, I absolutely loved this blog post. I found it interesting how you picked up the basis of your programming through some concepts in the book you read. I have always been curious about a system/computer that could learn from interactive experiences with humans. I think the whole idea is really cool. Anyway, thanks for sharing the post! It was a good read.

    SynapticNulship
    Alexandra, I'm glad you enjoyed it. But this was merely an introduction--so stay tuned!
    Have you seen NARS, Non-axiomatic reasoning system?

    This is a grounded symbol system. Claimed to learn from experience and develop its own theories/models of the world.

    SynapticNulship
    Thanks for the comment.
    "This is a grounded symbol system. Claimed to learn from experience and develop its own theories/models of the world."
    The problem with this statement and with Thor Russell's first couple comments is that you're suggesting two assumptions which are not sound as far as I can tell:
    1. A grounded system (or that which "solves" the symbol grounding problem) must have learned from experience.

    2. Any arbitrary method for an information system to learn from experience grounds that system.


    I have no reason to believe #1 or #2.