Why Crowdsourcing Cannot Be Fixed

Researchers from the University of Southampton have published a study in which they argue that crowdsourced information can be improved through the use of incentives. In "Making crowdsourcing more reliable", Dr Victor Naroditskiy, Professor Nick Jennings, and others propose incentivizing recruitment for crowdsourcing tasks to improve the verification of credible contributors.
In other words, if I join a crowdsourcing community and then I recruit you to make contributions, if I earn a reward for bringing you in I'll be incentivized to find the best match between your skills and the community's needs. The proposal assumes that the usual crowdsourcing rules for reviewing contributions will remain in place. In other words, an obvious shill should still be found out and therefore I won't earn my rewards if I bring in shills.

I am already a heavy critic of leaving vital information collection and editing to the mob mentality of the Wikipedias of the World Wide Web. These ad hoc groups are woefully inadequate, and despite all the citations of "studies" that purport to show just how reliable Wikipedia information may be (compared to the Encyclopedia Britannica), the reality of crowdsourced information management is that the good information does not float to the top naturally. It cannot float to the top, and in many situations (particularly with Wikipedia) it is blatantly prevented from doing so by the very systemic processes that are intended to ensure that it does.

Yahoo! and Stanford University researchers illustrated why this happens in a 2004 paper titled "Combating Web Spam with TrustRank" in which they proposed using a "good seed set" of Websites from which to begin crawling the Web. The good seed set was selected by an "expert" (they do not specify the criteria for determining who the expert is). The paper proposes that "Good" Websites are more likely to link to other "Good" Websites and less likely to link to "Bad" Websites; hence, the inevitable inclusion of Web spam in a trusted crawl should happen much later than in a random crawl. It is now widely assumed that search engines follow this principle in some way today.

It didn't take long for analysts to realize that if a search engine begins with a polluted or toxic seed set -- one that contains secret spam sites -- the entire "trusted crawl" process is derailed from the start. Unscrupulous Web marketers immediately set themselves to the task of getting their listings included in every conceivable "trusted seed set" they could identify, starting with the Yahoo! and Open Directories. I can say without any doubt that their efforts were successful, as search crawling technologies have had to add many layers of filtration and evaluation to the process since 2004.
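The seed-set idea, and the way a polluted seed set undermines it, can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method from the paper or anything a search engine actually runs: the function name trust_rank, the damping value, the iteration count, and the five-page link graph are all hypothetical. Trust starts on the seed pages and flows along outgoing links, in the manner of a seed-biased PageRank.

```python
def trust_rank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from a seed set along outgoing links (toy model)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    # All initial trust sits on the hand-picked seed pages.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        # Each round, a fixed fraction of trust is re-injected at the seeds.
        nxt = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                continue  # a page with no outgoing links passes nothing on
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy Web: good pages link only to good pages, spam only to spam.
LINKS = {
    "good_a": ["good_b", "good_c"],
    "good_b": ["good_c"],
    "good_c": ["good_a"],
    "spam_x": ["spam_y"],
    "spam_y": ["spam_x"],
}

clean = trust_rank(LINKS, seeds={"good_a"})
polluted = trust_rank(LINKS, seeds={"good_a", "spam_x"})

print("clean seed set:   ", round(clean["spam_x"] + clean["spam_y"], 3))
print("polluted seed set:", round(polluted["spam_x"] + polluted["spam_y"], 3))
```

With the clean seed set the spam pages receive no trust at all, because no trusted page links to them. Once a single spam page is slipped into the seed set, the spam pages hold a large share of the total trust in this toy graph, and nothing in the propagation step can tell the difference.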

In fact, Google has re-engineered its technologies from top to bottom several times since then, as has Microsoft, and Yahoo! has stopped developing general Web search technology. Today Google appears to be relying more on machine-learning algorithms than ever before (perhaps influenced by Yandex's own reliance on learning algorithms, perhaps only for independently chosen reasons). The machine-learning approach is complicated and heavily invested in the same problem facing crowdsourcing communities: someone has to be the "Good seed", the initial expert who recruits all the other experts into the process.

The quality of the initial expert determines the success of the chain of recruitment regardless of whether you are dealing with contributors to an information project like Wikipedia or building a pool of good (or bad) Websites for a learning algorithm to study. In March 2011 Google engineers Amit Singhal (head of search) and Matt Cutts (head of Web spam) told Wired magazine that they launched their revolutionary Panda algorithm by having "outside testers" classify an undisclosed (but presumed large) number of Websites as "high quality" or "low quality".

Conceding the difficulty of the process, the Google engineers did not claim that they asked the testers to figure out which sites were "right" or "wrong", but rather asked them which sites they would be more likely to trust (in fact, the testers were asked a large number of questions). The "gut instinct" approach bypasses the "who is the right expert?" question and presupposes that if enough people think a certain Website is more trustworthy in its presentation of information, it must be pretty good. This is one of the cornerstones of Google's dynamic technologies, the so-called "PageRank strategy", which is based on citation analysis.

In 2008 the International Mathematical Union published a study calling into question the widespread use of citation analysis in scientific literature, noting "Relying on statistics is not more accurate when the statistics are improperly used"; "While numbers appear to be 'objective', their objectivity can be illusory"; "The sole reliance on citation data provides at best an incomplete and often shallow understanding of research -- an understanding that is valid only when reinforced by other judgments. Numbers are not inherently superior to sound judgments." (Note: That is original emphasis.)

Of course, my citing the Math Union study may be flawed for the very reasons that the study cites -- am I expert enough in these matters to provide sound judgment? And that is the crux of the issue for Wikipedia and the Google Panda algorithm. Google recognized that Panda was a first step toward a new level of search filtration technology. By creating a large pool of human-reviewed Websites that were divided into "high quality" and "low quality" they were able to give their algorithm a crash course in human judgment. Most observers seemed to feel that the Panda algorithm did a pretty good job of downgrading poorly designed Websites but it produced false-positive results, too.

In fact, Google has been releasing new iterations of Panda data as it recrawls the Web to ensure that changes made to downgraded sites (as well as newly published sites) are evaluated. I believe that Google also expanded the learning sets several times throughout 2011, and perhaps may have a process in place to continue expanding the learning sets as new types of Web content and structures are identified.

The standard for Web quality is arbitrary and subjective, and with respect to its own analysis Google controls the standard. Hence, being in a position to reject the findings of their testers the Google engineers vetted the vetters according to their own precise criteria. That is, we cannot say there is an objective measurement that would demonstrate that Google's learning set divisions are inexpert or unsound. Other groups might produce different criteria for selecting "high quality" and "low quality" Websites but in the Panda scenario the seed set of experts is self-defined and therefore reliable.

It appears that Google has integrated the Panda algorithm into other filtration processes (including their Penguin filter and perhaps more recently their "Exact-match Domain" filter). Unlike Google, however, groups such as Wikipedia (and several thousand imitators) do not begin their inclusion and recruitment processes with a set of precise criteria for quality of information; rather, Wiki sets rely upon precise criteria for presentation and citation. Any source can be cited (including random Websites with no academic or scientific credentials -- and millions of such citations are found throughout the Wikisphere of the Web).

Lacking precise criteria for quality a group or organization relying on crowdsourcing -- even incentivized crowdsourcing -- remains essentially inexpert in the process of distinguishing between high and low quality information. In practice Wikipedia has been singled out for ridicule and criticism because the Wikipedia editors -- adept at following and enforcing the rules -- have driven out true experts in numerous topics who failed to comply with the rules. In other words, the rules for a crowdsourcing project determine what chances of success the project has, and where emphasis is placed on quality of compliance with rules rather than quality of information the project fails to take advantage of the "trusted seed set" principle.

It is this reliance upon "trusted seed sets" and initial experts that makes crowdsourcing efforts chaotic and unreliable. In the past I have drawn upon Swarm Theory and Percolation Theory to explain a model for Viral Propaganda Theory, which shows that as competing messages spread through a previously unconverted population of nodes, the most well-connected nodes will speed propagation of their message throughout the population. In other words, the rabbit wins the propaganda race and the turtle loses, because the rate at which information ("the message") spreads determines how much of the population is likely to be converted to a particular message.

I'm summarizing a lot of theoretical work in that one paragraph but what I am saying is that a trusted seed set produces success or failure in the crowdsourced project for the same reason that a poorly connected node fails to convey its message to a majority of other nodes in a population: the chain of communication -- whether crawl for a search engine or recruitment for a crowd gathering information -- most likely to win is the one with the most connections, not the one with the most credibility.
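The race between a well-connected node and a poorly connected one can be illustrated with a small simulation. It is a deliberately crude sketch, not the Viral Propaganda model itself: the names build_graph and spread, the eleven-node network, and the rule that a node adopts the first message it hears and never changes its mind are all assumptions made for the example.

```python
from collections import Counter


def build_graph(edges):
    """Turn a list of undirected edges into a node -> set-of-neighbors map."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def spread(neighbors, seeds):
    """Spread messages in rounds until nobody new can be converted.

    In each round an unconverted node adopts the message carried by most of
    its already-converted neighbors. Converted nodes never switch.
    """
    belief = dict(seeds)
    while True:
        updates = {}
        for node, adjacent in neighbors.items():
            if node in belief:
                continue
            heard = Counter(belief[n] for n in adjacent if n in belief)
            if heard:
                # Ties are broken by message name, an arbitrary toy choice.
                updates[node] = min(heard, key=lambda m: (-heard[m], m))
        if not updates:
            return belief
        belief.update(updates)


# Hypothetical network: "hub" knows six people, "expert" knows only one.
EDGES = [("hub", name) for name in "abcdef"] + [
    ("f", "g"), ("g", "h"), ("h", "i"), ("expert", "i"),
]

result = spread(
    build_graph(EDGES),
    {"hub": "popular message", "expert": "expert message"},
)
tally = Counter(result.values())
print(tally)
```

In this network the hub's message reaches eight of the eleven nodes (counting the hub itself) and the expert's message reaches three. Credibility never enters the calculation; only the number of connections and the speed of the spread do.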

There is no mechanism for ensuring that true experts can influence large populations of inexperts sufficiently to prevent misinformation from being widely propagated. That is why the Encyclopedia Britannica was the better resource than Wikipedia. While a small group of professional editors (trained in vetting and selecting reliable sources of information) takes longer to include information (and includes less information), the information is more likely to be correct -- a determination that cannot be undermined by "independent" evaluation from inexpert groups.

In other words, because no one can be expert enough in all topics to ensure that only correct information is included in a crowdsourced resource, the more inexpert contributors a project relies upon, the more misinformation will be accepted as reliable. It's simply a matter of numbers. The expert nodes in the population will always be outnumbered by the inexpert nodes; the expert nodes will always have fewer connections than the inexpert nodes; the expert message will always be propagated more slowly than the inexpert message; and the inexpert message will always win out over the expert message.
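The "matter of numbers" can be made concrete with a simple voting calculation. This is a simplification of the argument above rather than a restatement of it: it assumes contributors decide a contested point by strict majority, that they judge independently, that each expert is right 90% of the time, and that each inexpert is right 45% of the time. The function name majority_correct and all of those figures are hypothetical.

```python
def majority_correct(accuracies):
    """Exact probability that a strict majority of independent voters is right.

    accuracies holds each voter's chance of being right on the question.
    """
    dist = [1.0]  # dist[k] = probability that exactly k voters so far are right
    for p in accuracies:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this voter is wrong
            nxt[k + 1] += mass * p       # this voter is right
        dist = nxt
    needed = len(accuracies) // 2 + 1
    return sum(dist[needed:])


EXPERTS = [0.9] * 5  # assumed: five experts, each right 90% of the time

experts_only = majority_correct(EXPERTS)
with_20_inexperts = majority_correct(EXPERTS + [0.45] * 20)
with_200_inexperts = majority_correct(EXPERTS + [0.45] * 200)

for label, value in [
    ("experts only", experts_only),
    ("plus 20 inexperts", with_20_inexperts),
    ("plus 200 inexperts", with_200_inexperts),
]:
    print(f"{label}: {value:.3f}")
```

Under these assumptions the five experts alone reach the right answer about 99% of the time, and every batch of inexperts added to the vote drags that figure down. The result depends entirely on the assumed accuracies; if the inexperts were right more often than not, a large enough crowd would help rather than hurt.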

The system is designed to drown out the minority and the most qualified experts in any topic will always be a minority in any random collection of contributors. If the project managers fail to take this into consideration from the start, a crowdsourced application or process can only produce unreliable information. That doesn't mean all the information is inaccurate; it means that the body of information, taken together, is incapable of being refined to a level of accuracy that matches the quality of information provided by qualified minorities.

To put it another way, to fix the flaw in Wikis we would have to ensure that every Wiki on the Web has a qualified board of experts who are capable of selecting appropriate experts to oversee each topic. That will never happen. And this isn't just a problem for Wikipedia -- it's a problem for every random Web forum, question-and-answer site, and all social media that rely upon the crowd to surface good, reliable information. Surfacing reliable information is mathematically impossible unless you stack the odds in favor of the right information by placing qualified editorial boards over the information vetting process.

Google sidestepped the issue of "being expert in all things" by simply defining a standard of general Website quality; this is a better approach than defining a set of rules for inclusion and presentation (the basic Wiki process). But as history has shown us, even Google's approach came up short in the challenge. Low quality Websites continued to appear in the results for some of the billions of queries Google handles, and the company has had to modify its criteria in subsequent filters. Still, Google has probably demonstrated that machine learning is a better solution than mere crowdsourcing.

One way to improve the quality of the Wikisphere might be to use a Panda-like algorithm to assess the quality of articles on the thousands of Wiki sites that have been published. But that would require the selection of a team of experts for every major topic and many sub-topics who would in turn have to vet a substantial number of articles that could be divided into "high quality" and "low quality".

And even then the Wiki format would require that the articles be re-evaluated continually, because the other fundamental flaw in the Wiki process is that the articles are constantly changing. A Wiki article that is good today may, in two years, be even better or much worse simply because it has accrued so many changes that it no longer resembles today's article. The crowd cannot be made expert (and apolitical) in all things. Hence, crowdsourcing cannot be fixed.
