    Why Crowdsourcing Cannot Be Fixed
    By Michael Martinez | October 11th 2012 11:24 AM

    Researchers from the University of Southampton have published a study arguing that crowdsourced information gathering can be improved through the use of incentives.  In "Making crowdsourcing more reliable", Dr Victor Naroditskiy, Professor Nick Jennings, and their colleagues propose attaching incentives to crowdsourcing tasks so that credible contributors can be verified more reliably.
    In other words, if I join a crowdsourcing community and then recruit you to make contributions, a reward for bringing you in gives me an incentive to find the best match between your skills and the community's needs.  The proposal assumes that the usual crowdsourcing rules for reviewing contributions will remain in place: an obvious shill should still be found out, and I won't earn my reward if I bring in shills.
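
    To make the incentive idea concrete, here is a toy sketch of a recursive recruitment reward in the spirit of the DARPA Network Challenge incentives this line of research builds on.  The halving rule, bounty amount, and names are my own illustrative assumptions, not necessarily the mechanism the Southampton authors propose.

```python
# Toy recursive recruitment reward: when a recruit makes a verified, useful
# contribution, a bounty is paid to that contributor and then split up the
# chain of people who recruited them. The halving rule is an illustrative
# assumption, not necessarily the scheme proposed in the Southampton paper.

def pay_recruitment_chain(recruiter_of, contributor, bounty):
    """recruiter_of: dict mapping each member to whoever recruited them (or None)."""
    payouts = {}
    share = bounty / 2.0
    payouts[contributor] = share          # the verified contributor gets half
    person = recruiter_of.get(contributor)
    while person is not None:
        share /= 2.0                      # each recruiter up the chain gets half again
        payouts[person] = share
        person = recruiter_of.get(person)
    return payouts

# Alice recruited Bob, Bob recruited Carol; Carol's contribution is verified.
recruiter_of = {"carol": "bob", "bob": "alice", "alice": None}
print(pay_recruitment_chain(recruiter_of, "carol", bounty=8.0))
# {'carol': 4.0, 'bob': 2.0, 'alice': 1.0}
```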

    I am already a heavy critic of leaving vital information collection and editing to the mob mentality of the Wikipedias of the World Wide Web.  These ad hoc groups are woefully inadequate, and despite all the citations of "studies" that purport to show just how reliable Wikipedia information may be (compared to the Encyclopedia Britannica), the reality of crowdsourced information management is that good information does not float to the top naturally.  It cannot float to the top, and in many situations (particularly with Wikipedia) it is blatantly prevented from floating to the top by the very systemic processes that are intended to ensure it does.

    Yahoo! and Stanford University researchers illustrated why this happens in a 2004 paper titled "Combating Web Spam with TrustRank", in which they proposed using a "good seed set" of Websites from which to begin crawling the Web.  The good seed set was selected by an "expert" (they do not specify the criteria for determining who the expert is).  The paper proposes that "Good" Websites are more likely to link to other "Good" Websites and less likely to link to "Bad" Websites; hence, the inevitable inclusion of Web spam in a trusted crawl should happen much later than in a random crawl.  It is now widely assumed that search engines follow this principle in some way today.
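
    To illustrate the principle, here is a minimal sketch of TrustRank-style trust propagation: trust starts on an expert-chosen seed set and flows outward along links, so pages far from any trusted seed accumulate little trust.  The graph, damping factor, and iteration count below are illustrative assumptions, not the parameters of the original paper.

```python
# Minimal TrustRank-style sketch: trust flows out from an expert-chosen
# seed set along outgoing links, damped at each hop. The graph and
# parameters are illustrative, not those of the original paper.

def trust_rank(graph, seed_set, damping=0.85, iterations=20):
    """graph: dict mapping each page to the list of pages it links to."""
    pages = list(graph)
    # Initial trust is concentrated entirely on the expert-selected seeds.
    trust = {p: (1.0 / len(seed_set) if p in seed_set else 0.0) for p in pages}
    seed_bias = dict(trust)

    for _ in range(iterations):
        new_trust = {}
        for page in pages:
            # Trust received from pages that link to this one.
            incoming = sum(
                trust[q] / len(graph[q])
                for q in pages
                if page in graph[q]
            )
            # Leaked trust is redistributed toward the seed set, so pages
            # far from any seed accumulate little trust.
            new_trust[page] = (1 - damping) * seed_bias[page] + damping * incoming
        trust = new_trust
    return trust


if __name__ == "__main__":
    web = {
        "good-hub": ["good-page", "another-good-page"],
        "good-page": ["another-good-page"],
        "another-good-page": ["good-hub"],
        "spam-page": ["good-hub", "spam-farm"],
        "spam-farm": ["spam-page"],
    }
    scores = trust_rank(web, seed_set={"good-hub"})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page:20s} {score:.3f}")
```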

    It didn't take long for analysts to realize that if a search engine begins with a polluted or toxic seed set -- one that contains secret spam sites -- the entire "trusted crawl" process is derailed from the start.  Unscrupulous Web marketers immediately set themselves to the task of getting their listings included in every conceivable "trusted seed set" they could identify, starting with the Yahoo! and Open Directories.  I can say without any doubt that their efforts were successful, as search crawling technologies have had to add many layers of filtration and evaluation to the process since 2004.

    In fact, Google has re-engineered its technologies from top to bottom several times since then, as has Microsoft, and Yahoo! has stopped developing general Web search technology.  Today Google appears to be relying more on machine-learning algorithms than ever before (perhaps influenced by Yandex's own reliance on learning algorithms, perhaps for independently chosen reasons).  The machine-learning approach is complicated, and it runs into the same problem facing crowdsourcing communities: someone has to be the "good seed", the initial expert who recruits all the other experts into the process.

    The quality of the initial expert determines the success of the chain of recruitment regardless of whether you are dealing with contributors to an information project like Wikipedia or building a pool of good (or bad) Websites for a learning algorithm to study.  In March 2011 Google engineers Amit Singhal (head of search) and Matt Cutts (head of Web spam) told Wired magazine that they launched their revolutionary Panda algorithm by having "outside testers" classify an undisclosed (but presumed large) number of Websites as "high quality" or "low quality".

    Conceding the difficulty of the process, the Google engineers did not claim that they asked the testers to figure out which sites were "right" or "wrong"; rather, they asked which sites the testers would be more likely to trust (in fact, the testers were asked a large number of questions).  The "gut instinct" approach bypasses the "who is the right expert?" question and presupposes that if enough people think a certain Website is more trustworthy in its presentation of information, then it must be pretty good.  This is one of the cornerstones of Google's ranking technology: the so-called "PageRank strategy", which is based on citation analysis.
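
    As a rough illustration of how such judgments might be turned into labels, the sketch below averages yes/no "trust" answers from multiple raters into a per-site "high quality" or "low quality" verdict.  The questions, raters, and threshold are hypothetical; Google has not published its exact procedure.

```python
# Illustrative aggregation of rater "trust" judgments into a per-site label.
# The questions, raters, and threshold are hypothetical; this only shows the
# general "ask many people, take the consensus" idea described above.

from collections import defaultdict

# Each rater answers yes/no trust questions about each site.
ratings = [
    ("example-news-site.com", "rater_1", {"would_trust_credit_card": True,
                                          "looks_authoritative": True}),
    ("example-news-site.com", "rater_2", {"would_trust_credit_card": True,
                                          "looks_authoritative": False}),
    ("thin-content-farm.net", "rater_1", {"would_trust_credit_card": False,
                                          "looks_authoritative": False}),
    ("thin-content-farm.net", "rater_2", {"would_trust_credit_card": False,
                                          "looks_authoritative": True}),
]

def label_sites(ratings, threshold=0.5):
    """Average all yes/no answers per site; above the threshold = 'high quality'."""
    totals = defaultdict(lambda: [0, 0])  # site -> [yes_answers, all_answers]
    for site, _rater, answers in ratings:
        for answer in answers.values():
            totals[site][0] += int(answer)
            totals[site][1] += 1
    return {site: ("high quality" if yes / total > threshold else "low quality")
            for site, (yes, total) in totals.items()}

print(label_sites(ratings))
# {'example-news-site.com': 'high quality', 'thin-content-farm.net': 'low quality'}
```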

    In 2008 the International Mathematical Union published a report calling into question the widespread use of citation analysis in evaluating scientific literature, noting that "Relying on statistics is not more accurate when the statistics are improperly used"; that "While numbers appear to be 'objective', their objectivity can be illusory"; and that "The sole reliance on citation data provides at best an incomplete and often shallow understanding of research—an understanding that is valid only when reinforced by other judgments. Numbers are not inherently superior to sound judgments."  (The emphasis is in the original.)

    Of course, my citing the Math Union report may be flawed for the very reasons the report cites -- am I expert enough in these matters to provide sound judgment?  And that is the crux of the issue for Wikipedia and the Google Panda algorithm.  Google recognized that Panda was a first step toward a new level of search filtration technology.  By creating a large pool of human-reviewed Websites divided into "high quality" and "low quality", they were able to give their algorithm a crash course in human judgment.  Most observers seemed to feel that the Panda algorithm did a pretty good job of downgrading poorly designed Websites, but it produced false-positive results, too.
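
    In machine-learning terms, that "crash course" is supervised classification: the human labels become training data and an algorithm learns which measurable page features predict them.  The sketch below uses invented features and an off-the-shelf classifier purely for illustration; Google's actual signals and models are not public.

```python
# Supervised-learning sketch of the "crash course in human judgment":
# human raters supply high/low quality labels, and a classifier learns which
# measurable features predict them. Features and data are invented;
# Google's actual signals and models are not public.

from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-site features: [words_per_page, ad_density, spelling_errors_per_1000_words]
training_features = [
    [1200, 0.05, 1],   # long articles, few ads, few errors
    [950,  0.10, 2],
    [150,  0.60, 14],  # thin pages, ad-heavy, sloppy
    [200,  0.55, 9],
]
# Labels supplied by human reviewers: 1 = "high quality", 0 = "low quality"
human_labels = [1, 1, 0, 0]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(training_features, human_labels)

# The trained model can now score sites the raters never saw.
unseen_site = [[300, 0.45, 7]]
print("predicted label:", model.predict(unseen_site)[0])
```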

    In fact, Google has been releasing new iterations of Panda data as it recrawls the Web, to ensure that changes made to downgraded sites (as well as newly published sites) are evaluated.  I believe that Google also expanded the learning sets several times throughout 2011, and it may have a process in place to continue expanding the learning sets as new types of Web content and structures are identified.

    The standard for Web quality is arbitrary and subjective, and with respect to its own analysis Google controls the standard.  Hence, being in a position to reject the findings of their testers, the Google engineers vetted the vetters according to their own precise criteria.  That is, we cannot say there is an objective measurement that would demonstrate that Google's learning-set divisions are inexpert or unsound.  Other groups might produce different criteria for selecting "high quality" and "low quality" Websites, but in the Panda scenario the seed set of experts is self-defined and therefore reliable.

    It appears that Google has integrated the Panda algorithm into other filtration processes (including their Penguin filter and, perhaps more recently, their "Exact-match Domain" filter).  Unlike Google, however, groups such as Wikipedia (and several thousand imitators) do not begin their inclusion and recruitment processes with a set of precise criteria for quality of information; rather, Wiki sites rely upon precise criteria for presentation and citation.  Any source can be cited (including random Websites with no academic or scientific credentials -- and millions of such citations are found throughout the Wikisphere of the Web).

    Lacking precise criteria for quality, a group or organization relying on crowdsourcing -- even incentivized crowdsourcing -- remains essentially inexpert in the process of distinguishing between high- and low-quality information.  In practice Wikipedia has been singled out for ridicule and criticism because the Wikipedia editors -- adept at following and enforcing the rules -- have driven out true experts in numerous topics who failed to comply with the rules.  In other words, the rules for a crowdsourcing project determine what chances of success the project has, and where emphasis is placed on quality of compliance with rules rather than quality of information, the project fails to take advantage of the "trusted seed set" principle.

    It is this reliance upon "trusted seed sets" and initial experts that makes crowdsourcing efforts chaotic and unreliable.  In the past I have drawn upon Swarm Theory and Percolation Theory to explain a model for Viral Propaganda Theory, which shows that as competing messages spread through a previously unconverted population of nodes, the most well-connected nodes speed propagation of their message throughout the population.  In other words, the rabbit wins the propaganda race and the turtle loses, because the rate at which information ("the message") spreads determines how much of the population is likely to be converted to a particular message.

    I'm summarizing a lot of theoretical work in that one paragraph, but what I am saying is that the trusted seed set produces success or failure in a crowdsourced project for the same reason that a poorly connected node fails to convey its message to a majority of other nodes in a population: the chain of communication most likely to win -- whether it is a crawl for a search engine or recruitment for a crowd gathering information -- is the one with the most connections, not the one with the most credibility.
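
    A toy simulation makes the point: start two competing messages from a well-connected "hub" node and a poorly connected "expert" node, let each converted node convert its neighbours with some fixed probability, and the hub's message usually reaches far more of the population, regardless of which message is correct.  The graph, spread probability, and parameters below are my own illustrative assumptions, not a formal model drawn from the literature.

```python
# Toy spread simulation: two competing messages start from one node each,
# and at every step each converted node converts its unconverted neighbours
# with a fixed probability. The well-connected "hub" seed usually reaches far
# more of the population than the sparsely connected "expert" seed, regardless
# of which message is actually correct. All parameters are illustrative.

import random

random.seed(1)

def random_graph(n_nodes, hub, hub_degree, base_degree):
    """Undirected graph: one hub with many links, everyone else with few."""
    edges = {node: set() for node in range(n_nodes)}
    for node in range(n_nodes):
        degree = hub_degree if node == hub else base_degree
        for neighbour in random.sample([m for m in range(n_nodes) if m != node], degree):
            edges[node].add(neighbour)
            edges[neighbour].add(node)
    return edges

def spread(edges, seeds, p_convert=0.3, steps=10):
    """seeds: {node: message}. The first message to reach a node sticks."""
    converted = dict(seeds)
    for _ in range(steps):
        new = {}
        for node, message in converted.items():
            for neighbour in edges[node]:
                if neighbour not in converted and neighbour not in new:
                    if random.random() < p_convert:
                        new[neighbour] = message
        converted.update(new)
    return converted

edges = random_graph(n_nodes=500, hub=0, hub_degree=60, base_degree=2)
# Node 0 is the well-connected inexpert hub; node 1 is the poorly connected expert.
result = spread(edges, seeds={0: "popular claim", 1: "expert correction"})
counts = {}
for message in result.values():
    counts[message] = counts.get(message, 0) + 1
print(counts)  # the hub's message typically converts far more nodes
```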

    There is no mechanism for ensuring that true experts can influence large populations of inexperts sufficiently to prevent misinformation from being widely propagated.  That is why the Encyclopedia Britannica was a better resource than Wikipedia.  While a small group of professional editors (trained in vetting and selecting reliable sources of information) takes longer to include information (and includes less information), the information it does include is more likely to be correct -- a determination that cannot be undermined by "independent" evaluation from inexpert groups.

    In other words, because no one can be expert enough in all topics to ensure that only correct information is included in a crowdsourced resource, the more inexpert contributors a project relies upon, the more misinformation will be accepted as reliable.  It's simply a matter of numbers.  The expert nodes in the population will always be outnumbered by the inexpert nodes; the expert nodes will always have fewer connections than the inexpert nodes; the expert message will always be propagated more slowly than the inexpert message; and so the inexpert message will always win out over the expert message.

    The system is designed to drown out the minority, and the most qualified experts in any topic will always be a minority in any random collection of contributors.  If the project managers fail to take this into consideration from the start, a crowdsourced application or process can only produce unreliable information.  That doesn't mean all the information is inaccurate; it means that the body of information, taken together, is incapable of being refined to a level of accuracy that matches the quality of information provided by qualified minorities.

    To put it another way, to fix the flaw in Wikis we would have to ensure that every Wiki on the Web has a qualified board of experts capable of selecting appropriate experts to oversee each topic.  That will never happen.  This isn't just a problem for Wikipedia -- it's a problem for every random Web forum, question-and-answer site, and social media service that relies upon the crowd to surface good, reliable information.  It's mathematically impossible unless you stack the odds in favor of the right information by placing qualified editorial boards over the information-vetting process.

    Google sidestepped the issue of "being expert in all things" by simply defining a standard of general Website quality; this is a better approach than defining a set of rules for inclusion and presentation (the basic Wiki process).  But as history has shown us, even Google's approach came up short.  Low-quality Websites persisted in some of the billions of queries Google handles, and Google has had to modify its criteria in subsequent filters.  Still, it has probably demonstrated that machine learning is a better solution than mere crowdsourcing.

    One way to improve the quality of the Wikisphere might be to use a Panda-like algorithm to assess the quality of articles on the thousands of Wiki sites that have been published.  But that would require selecting a team of experts for every major topic and many sub-topics, who would in turn have to vet a substantial number of articles to be divided into "high quality" and "low quality".

    And even then the Wiki format would require that the articles be re-evaluated continually, because the other fundamental flaw in the Wiki process is that the articles are constantly changing.  A Wiki article that is good today may, in two years, be even better or much worse simply because it has accrued so many changes that it no longer resembles today's article.  The crowd cannot be made expert (and apolitical) in all things.  Hence, crowdsourcing cannot be fixed.

    Comments

    Wow. You crammed a lot into this one article, Michael. My biases agree with most of the things you say.

    Your citation of the problems in the search sphere, which I have some small experience with, as they relate to the more general problem of crowdsourcing was a rather smart approach.

    I would change your second from last sentence to read: The crowd cannot be made expert (and apolitical) in any thing.

    Gerhard Adam
    Excellent article.
    Mundus vult decipi
    vongehr
    Great article, but there is naivety here, a romantic concept of the lonely "expert" disregarded by the mob's linking, mingled with that of endorsed "experts" (they have an evolved social function, and it is most certainly not to reveal truth) and absolute truth; all this needs to consider construction of truths as opinions with power behind. What you discuss here belongs to the evolution of the rationalizing irrationality of social systems' perception/cognition. They are as bad or worse than human belief, if they do not allow different selective processes (endorsed expert heavy, human crowd, AI interpolation, ???) to survive in parallel, much like a relatively rational person is not one in denial (claiming to be unbiased) but instead aware of her own emotions/intuitions/biases. There is nothing more wrong and dangerous than a cognitive system convinced to be right and having already sufficiently taken into account all reasonable doubt.
    Hank
    I think there are two different crowdsourcing ideas here. I agree Wikipedia is useless and I basically stop reading any comment that uses Wikipedia as a source.  I am sure it gets some things right - even a blind squirrel finds a nut once in a while - but if it can't even get the Science 2.0 entry right, that is a pretty large data point.

    But if you look at the crowdsourcing done on Galaxy Zoo or Foldit, they are different animals because they are not compiling answers from people who know nothing and tabulating the average - or, in the case of Wikipedia, letting the most rabid goofball who survives the editor war win; instead, someone in the crowd is more likely to do something meaningful.  It is not going to produce an Einstein but it can do protein folding better than a computer because it promotes creativity.
    Michael Martinez
    If the "crowd" is a crowd of experts (at least credibly knowledgeable people who are not divided by personal biases) then the "crowd" is not the general crowd of the Web, so yes I agree there are two crowds -- and it is the exclusive crowd that has the best chance to succeed (as in Google's use of crowdsourcing, where they set the rules and the crowd was not self-governing).
    The idea that the crowdsourced Web can be moderated through incentivization is what led me to write this.  I've looked at quality from many sides now and the self-governing anonymous crowd is incapable of engineering a collective solution that prevents manipulation from setting in.

    In other words, if the crowd is small enough that everyone can know everyone else and check each other's credentials, then "majority rule" has a chance to succeed (although as soon as division of opinion sets in, that chance declines dramatically -- witness what happened to Wikileaks); however, when the crowd is large enough that everyone can NOT know everyone and check credentials, they have to fall back on their rule set to ensure that manipulation doesn't set in.

    Manipulation can be naive (firm belief in the wrong information) or hostile (intentional disruption of the information pool) -- it doesn't matter what the motive is.  As soon as wrong information is introduced into a system where legitimate expertise is unable to vet it, the viral process sets in and misinformation begins to spread from node to node.

    What I should have included, now that you bring this up, is that there is a sliding scale for the probability of success or failure in maintaining the integrity of the information pool, one that correlates with the ratio of experts to inexperts in any pool of contributors.  In any successful and popular Web community that ratio quickly shrinks until the experts' influence is insignificant.

    It's a numbers game, and a simple technique such as incentivization cannot prevent the inevitable manipulation from setting in and taking hold.  There has to be a cap on the number of participants who control the final decision-making process, or else the community falls back on favoring process instead of accuracy.