Essentially you take any search result for a query (up to 1,000 listings) and categorize each listing as "Natural (not intentionally targeted at that query)", "Transparent (clearly intentionally targeted at that query)", or "Opaque (targeted at the query but in such a way as not to be obvious to a random, casual searcher)". The share of listings that falls into each category forms a probability distribution for that query.
Through the years, as I have measured Websites, Website traffic, links to Websites, links from Websites, and other things, I have seen that this probability distribution is a useful tool for distinguishing between "heavily exploited topics" and "not-so-exploited topics", in the sense of where intentional manipulation of the user experience and/or search results occurs. There is a LOT of manipulation going on through malware, commercial marketing, political propaganda, illegal use of trademarks, and more.
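To make the categorization concrete, here is a minimal sketch of how the three shares might be tallied once each listing has been labeled by hand. The function name, the lowercase label strings, and the sample data are all assumptions made for illustration; nothing here is a tool I actually use, and the labeling itself remains a human judgment call.

```python
from collections import Counter

# Hypothetical label names for the three Naturality categories.
CATEGORIES = ("natural", "transparent", "opaque")


def naturality_distribution(labels):
    """Return the share of listings that falls into each category."""
    if not labels:
        raise ValueError("at least one labeled listing is required")
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(labels)
    # Categories with no listings get a share of 0.0.
    return {category: counts[category] / total for category in CATEGORIES}


# Made-up labels for a ten-listing result page (illustration only).
sample_labels = ["natural"] * 6 + ["transparent"] * 3 + ["opaque"] * 1
sample_distribution = naturality_distribution(sample_labels)
```

A query whose listings are mostly "transparent" or "opaque" would be a candidate for a "heavily exploited topic"; where to draw that line is not something this sketch decides.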
But while exploring Naturality on the Web I have run into Power Law distributions consistently in virtually all the data I evaluate. I have long since ceased being totally amazed by this phenomenon. Now I am merely amazed that nature (as viewed in the collective behaviors of humans populating the Internet with a fractal-like explosion of expressions of ideas) manages to distribute just about everything in some sort of power law scatter.
When I look at the backlinks for a Website there is almost always a small fraction of sites that provide most of the inbound links. No two sites ever provide the same number of links; instead the counts fall away in a descending tree of orders of magnitude.
When I look at the number of visitors to a busy Website for any timeframe (1 day, 1 month, 1 year) I see a similar pattern. Only a very small percentage of pages receive large numbers of visitors compared to the rest of the pages on the site.
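One rough way to put a number on that concentration is to ask what share of all visits goes to the top slice of pages. The sketch below assumes you already have a list of per-page visit counts; the function name, the 10 percent default, and the sample figures are invented for illustration.

```python
def top_share(visits, fraction=0.1):
    """Return the share of total visits captured by the top `fraction` of pages."""
    if not visits:
        raise ValueError("at least one page is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    total = sum(visits)
    if total <= 0:
        raise ValueError("total visits must be positive")
    ordered = sorted(visits, reverse=True)
    # Always count at least one page, even on very small sites.
    top_count = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_count]) / total


# Made-up visit counts for a ten-page site (illustration only).
page_visits = [5000, 1200, 400, 150, 90, 60, 40, 30, 20, 10]
share_of_top_pages = top_share(page_visits, fraction=0.1)
```

A high share by itself does not prove a power law is at work; it only shows that the traffic is concentrated in a few pages.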
When I look at search referral traffic, non-search referral traffic, page views, advertising revenue, sales revenue -- regardless of whether I slice the data by page, by keyword, by month -- I almost always see a lead data point that is far larger than all the others, followed by a distinctly smaller second point, and so on down a long tail.
At first I wondered if there was a single probability or power law distribution that mapped to all these patterns but I never found anything consistent. I can always find some named power law that closely matches some set of data but there is a natural variation in the distributions. One thing I have never had the opportunity to do is document the power law distributions I have found to see if there is -- in a sort of fractal way -- a power law distribution of power law distributions in Internet data.
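For readers who want to try documenting these distributions themselves, here is a minimal sketch of one common way to estimate a power law exponent: sort the values, then fit a straight line to log(value) against log(rank) by ordinary least squares. This is a swapped-in textbook technique, not a method I have used on my own data; the function name and the synthetic series are assumptions. Least squares on a log-log plot is a crude estimator (maximum-likelihood fitting is generally considered more reliable), so treat the result as a first look only.

```python
import math


def fit_rank_size_exponent(values):
    """Fit value ~ scale * rank ** (-alpha) by least squares in log-log space.

    Returns the pair (alpha, scale). Non-positive values are dropped
    because their logarithm is undefined.
    """
    ordered = sorted((v for v in values if v > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("at least two positive values are required")
    log_ranks = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    log_values = [math.log(value) for value in ordered]
    mean_x = sum(log_ranks) / len(log_ranks)
    mean_y = sum(log_values) / len(log_values)
    sxx = sum((x - mean_x) ** 2 for x in log_ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_ranks, log_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope is negative for a decaying series, so alpha is its negation.
    return -slope, math.exp(intercept)


# Synthetic series that follows value = 1000 / rank exactly (illustration only).
synthetic_counts = [1000 / rank for rank in range(1, 51)]
alpha, scale = fit_rank_size_exponent(synthetic_counts)
```

Collecting the fitted exponents from many data sets (backlinks, visits, revenue) would be one way to begin asking whether the exponents themselves follow a pattern.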
I suspect that would be a Prime Power Law property of data super structures but I'm ill-equipped by knowledge, experience, and resources to look for that kind of transformation in data properties. There are relatively few if any people actually investigating the properties of data super structures (at least by that name). We easily ask how concrete changes when we use it to build a highway but we don't really ask how concrete contributes to the properties of cities and civilizations in the same way. Concrete is separated from a city super structure by at least one order of magnitude (road systems or buildings or sidewalks -- take your pick) in super structure complexity.
What leads me to meander through these terms and concepts today is an article that Vint Cerf published on the Google blog. Citing a Verisign study of domain name distributions, Vint says that "nearly 50 percent of the Websites we visit are found in the .com top-level domain". The study doesn't present the numbers in sufficient detail to show that there is a power law at work in the data, but I think there is.
Vint closes his brief article with: "By opening up more choices for Internet domain names, we hope people will find options for more diverse—and perhaps shorter—signposts in cyberspace."
I know there is a tremendous bias toward use of the .com TLD among Internet marketers because many people believe (regardless of what several marketing studies have shown) that search engines "favor" .com TLDs above other TLDs. It can certainly be shown that the Web has a bias toward the .com TLD in terms of its linking and citation practices as well as in its domain registrations. Is this, however, a self-fulfilling prophecy or merely a layman's interpretation of the world around him (a myth)?
At one time I was convinced that Internet marketing (as an industry) had a profound impact on these kinds of perceptions, but relatively few if any Internet marketers could be said to wield sufficient influence to have created such a tremendous bias in so few years (just under 20) among hundreds of millions of people. Now I am more certain that there are natural forces at work which I, at least, don't understand well. I suspect there isn't enough social science involved to really explain why we favored .com over, say, .org or .net.
In fact, when I registered my first domain name in 1997 I chose a .org domain rather than a .com because the convention was that personal Websites should fall into the ".org" TLD. I was one of relatively few people aware of that convention and apparently I fell into an even smaller group who actually abided by it. The rest of the world went on to register millions of personal domains in the .com TLD. I have since done the same myself.
So is there much reasonable hope of enticing people to adopt these new domain name spaces at a rate equivalent to that at which the previously existing domain name spaces were developed? I submit that a power law will govern this distribution of experimental applications. Some new TLDs (relatively few) will eventually emerge as TLDs of choice, whereas most will languish with relatively small user bases. It's only natural that this should be so.
We can theorize about why people tend to congregate near water even when they cannot drink it or use it for irrigation; we can explain how human society expands its capabilities as our numbers grow; but we have yet to come up with a Universal Theory for Power Law Distributions in Human Activity. The Internet is a nearly perfect microcosm of such activity that lends itself to this kind of study and analysis: we collect and share data about virtually everything we do on the Internet.
I am, perhaps, blinded by my theory of Naturality to the point where I can only see the world in terms of what happens because of our intentions combined with what happens in spite of them. For example, a recent study on climate change has revealed how the ancient Indus civilization rose and fell with the fortunes of a river system that changed as monsoon patterns changed thousands of years ago. Today we argue about how much impact mankind is having on global warming but we are still coming to grips with how much impact global warming has had on mankind. Our civilization would not exist without global warming, and yet we fear that we may have doomed ourselves to extinction because of our civilization's contributions to global warming.
Some call this pollution of the environment. Some call it other things. To me it seems perfectly natural. The Earth's ecosystem has found ways to enhance its environment even as its environment has modified the ecosystem. If we humans are special then I think that specialness arises from our sense of awareness of what is happening and our collective ability to make a change -- but that assumes we have the intellectual superstructure to choose to make such a change, and to change that choice should we realize it's the wrong choice.
It would require that kind of intellectual superstructure to change how we use the Internet. We would have to fight the power laws. I'm not sure we can do this. I don't know of any benefit to trying to fight the power laws. As the current war between intellectual property rights owners and free use advocates shows, even bringing the power of governments to bear on a crisis seems to provide relatively little return on investment. IPR owners have benefited more, in my opinion, from advancing new technologies that make it easier, more equitable, and more economical to share their properties while obtaining some compensation for that use.
We stand at the edge of a vast amount of data, data which might provide us with tremendous insight into not only how natural systems behave but also into how to rise above the naturality -- both transparently and opaquely -- such that we may come to exercise more control over our future than ever before. For the time being the Power Laws rule the world. We may choose to be content with that rule but I think we're almost ready to make an informed choice.