There's a discussion over on the Wikipedia boards about the topic of "popular misconceptions" that has all sorts of interesting linguistic and informational aspects to it.  The conundrum is that the article is intended to address "popular misconceptions," and at one point someone (identified only as "a user") steps in with a set of very cogent and thoughtful points that get looked at but never addressed.  They're particularly interesting because, taken together, they describe several of the great issues with information science as it is practiced over the Internet.  In part, our problem with the Internet is that it is an information source unlike any other we've ever had access to.  We know how to deal with information from books and other sources... but no one has a good grasp on how to deal with the ever-increasing volume of information on the Internet.

Since I'm inviting friends from Livejournal and Facebook to read these blogs, I should stop and say that "information" to an information scientist like myself isn't limited to meaningful statements like "My cat is washing his right paw."  Information to us is any bit of data, including noise.  It is the presence of a somethingness in the middle of nothingness.  Information can be true, partly true, irrelevant, meaningless, full of noise, incomplete, complete -- or anything else.  When I'm dealing with information I generally like to use Fuzzy Logic Algebra.  It's lots of fun, but I'm going to spare everyone the details here.

So... back to the original question that interested me:  You have all the data on the Internet.  According to our knowledge of the world, there are lots of misconceptions out there ("the earth is actually hollow" would be a good example), with more being created daily (thank you, YouTube).  If you want to describe a set of "popular misconceptions" you have to (as the comment in Wikipedia said) answer a number of questions, including:

* what does "popular" mean?  If 6 sites out of 24 on sauropod vertebrae describe the bones of Alamosaurus sanjuanensis as being solid with no internal chambers, then 1/4th of the sites are wrong.  That's a lot of sites relating to this dinosaur that are wrong.  So, we can say it's a misconception (the scientists didn't read papers about the unusual bones) or it's an error (they misremembered the facts about the bones).  Is this a "popular" misconception, since this imaginary error is on a full 1/4th of the sites about this dinosaur?  I'd bet money that you couldn't find one person in 100 who could tell you what Alamosaurus sanjuanensis is unless they read my Livejournal or belong to the Society of Vertebrate Paleontology.  How many websites does it take for a "popular misconception"?

* where is the source?  In almost all nonfiction writing, you can trace the lineage of each "factoid" to some source.  It has a pedigree and a timestamp.  In the case of the Alamosaurus example, we could find a person and a time when the misstatement was made and could probably track its history.

But when it comes to "folklore" information, the source may be almost impossible to locate or it may appear to have multiple similar sources.  One good example is "Egyptologists don't really know what the pyramids were built for.  No mummies were ever found in any of them."  We can identify the assumptions that feed this claim -- that Egypt only has 3 pyramids, that nobody can study them because of the Evil Dr. Zahi Hawass, that Egyptologists don't know how to properly read hieroglyphics, that the only thing on the Giza plateau is the pyramids, that a New Age is coming, that aliens left the advanced technology for us encoded in the pyramids, that the pyramids are so sophisticated we can't duplicate them with today's technology ... and many more.

It's a mess.  Which do you start with, and how do you address the misconception in such a way that it hopefully dies and does not come stumbling back like some evil information zombie?  You could look at it as a problem in logistics -- which is THE most critical piece of information, the one that causes all the other pieces to fall apart more quickly... but given the different riffs of pyramid lore, it's hard to pick just one.

A second problem is that things which are now identified as misconceptions may later turn out to be true.  One current example that could go either way is the "Clovis Comet theory" -- or the wildly popular "String theory."  There are both folk knowledge and scientific papers based on these two concepts... if they are proven wrong, how do you deal with the chain of literature that is now wrong, identifying it so that armchair readers who want to discuss science understand that these theories were rejected, that the rejection wasn't part of a great conspiracy, and that the concepts are interesting but shouldn't be used as a foundation for further thinking?

Looking at it from a programmer's view, the data we label as "right" and "wrong" is in a constant state of flux in some areas.  Some of it is indeterminate (we don't know if it's right or wrong).  If we are doing a search through all possible sources on the Internet, how do we retrieve known good data when the questions we are asking are not based on scientific papers?  For instance, if I wanted to find out why my Siamese cat paws at a surface (almost like he's scratching in the litterbox) before he lies down on it, there is no scientific paper that explains the quirks of one very quirky cat.  I might be able to find "folk knowledge" (such as "They do it because they feel insecure and are looking for their original nest box with their mother"), which may be correct or may be off base.
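For the programmers in the audience, here is one way the "constant state of flux" idea might be sketched in Python.  This is only an illustration under my own assumptions: the `Status` labels, the `Claim` class, the placeholder claim text and the years are all made up for the example, not taken from any real system or any real theory's history.  The point is simply that a label is something you look up "as of" a date, and that "we don't know yet" is a legitimate answer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUPPORTED = "supported"
    REJECTED = "rejected"
    INDETERMINATE = "indeterminate"


@dataclass
class Claim:
    text: str
    # (year, status) pairs, assumed to be appended in chronological order.
    history: list = field(default_factory=list)

    def relabel(self, year: int, status: Status) -> None:
        """Record that the claim was given a new label in the given year."""
        self.history.append((year, status))

    def status_in(self, year: int) -> Status:
        """Return the latest label at or before `year`.

        A claim with no label yet is treated as indeterminate.
        """
        current = Status.INDETERMINATE
        for labelled_year, status in self.history:
            if labelled_year <= year:
                current = status
        return current


# Hypothetical claim and hypothetical years, for illustration only.
claim = Claim("placeholder claim")
claim.relabel(2005, Status.SUPPORTED)
claim.relabel(2015, Status.REJECTED)

for year in (2000, 2010, 2020):
    print(year, claim.status_in(year).value)
```

Keeping the whole history, rather than overwriting one "right/wrong" flag, is the design choice that matters here: it lets a reader see that an idea was once taken seriously and later rejected, instead of having it silently vanish.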

When people google, they want good information.  How do we dump the "probable misconceptions" to the dark corners so the better information rises to the top?
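One naive way to picture "dumping to the dark corners" is to discount each result's relevance by how likely it is to repeat a known misconception.  The sketch below assumes we already have both numbers for every page, which is of course the hard part; the `Result` fields, the `penalty` knob, the scoring formula and the example URLs and scores are all hypothetical, and this is not how any real search engine is known to work.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float      # 0.0 to 1.0: how well the page matches the query
    misconception: float  # 0.0 to 1.0: estimated chance it repeats a known misconception


def rank(results, penalty=1.0):
    """Sort results so that likely misconceptions sink toward the bottom.

    penalty=0.0 ignores the misconception estimate entirely;
    penalty=1.0 discounts relevance by the full estimate.
    """
    def score(r):
        return r.relevance * (1.0 - penalty * r.misconception)

    return sorted(results, key=score, reverse=True)


# Made-up pages and made-up scores, for illustration only.
results = [
    Result("a.example", relevance=0.9, misconception=0.8),
    Result("b.example", relevance=0.7, misconception=0.1),
    Result("c.example", relevance=0.5, misconception=0.5),
]

for r in rank(results):
    print(r.url)
```

With these made-up numbers the most "relevant" page, a.example, drops to last place because it is also the one most likely to be wrong.  Note that nothing is deleted, only demoted, which matters if today's misconception turns out to be tomorrow's accepted fact.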

Food for thought on a rainy Friday.