Imagine you are living within a bubble. Your bubble can stretch in any direction, but you are always contained within it. Your bubble is adrift in a sea that has no bottom, only a surface. That surface surrounds your bubble, so the sea itself is only a larger bubble. Beyond the surface lies darkness.
This is the virtual universe we live in every day. We call it the Internet. The bubble around you is the collection of Websites you read and comment on. The sea through which your bubble drifts is the Searchable Web. The darkness beyond the Searchable Web is everything else: everything that you cannot get to, or that has no meaning to you.
Millions of people view the Searchable Web through the lens of Google. Millions of other people use the lens of Yandex (the leading Russian-language search engine). Millions more use the lens of Baidu (the leading Chinese-language search engine).
Through any one of these lenses you may see a virtual universe of Websites, but you won't see the same universe through all three. Yandex and Baidu both index Websites outside their core languages. Yandex made a very public entrance into the English Web, for example; Baidu has made a less public entrance into that Web.
How large is the Internet? How large is the World Wide Web? How large is the Searchable Web? Several studies have attempted to answer these questions, but their methods and estimates are now obsolete. The Web is such a dynamic environment that it cannot be measured with any lasting precision.
Imagine that the speed of light is not a limit to our ability to observe the universe. Imagine that we can see, in real time, as much of the universe as we think we have mapped. Now imagine that every day whole galaxies vanish and whole galaxies appear in different parts of the universe. Some of these galaxies allow us to look into them; other galaxies obscure their stars from outside observation.
In this chaotic real-time universe, where things pop into and out of existence, other things are morphing. What was a spiral galaxy yesterday may morph into a protogalaxy (a mass of unshaped gas) today. A protogalaxy may suddenly morph into a minor young galaxy. And the filaments of gravity and dark matter that connect the galaxies and protogalaxies are constantly shifting and changing.
That is the World Wide Web. It operates according to its own set of laws. There are real principles of cause and effect, true limits to what can be accomplished. The Web is a virtual universe that exists and functions and lives like a universe within the boundaries of the machines that we connect together through the Internet Protocol.
I have participated in attempts to estimate the size of the Web, or portions of the Web. It's impossible to know at any one time how many servers and clients are connected to the Internet, but there is a finite limit. We know the limit is real, but we cannot find it. Nor can we know at any one time how many Websites are hosted on those connected servers; again there is a real limit to that number, and again we cannot find it.
Using a search engine to estimate the size of the Web is equivalent to using a pie plate to hold all the food in a restaurant. You can do it. You just cannot place all the food there at once. You can see all of the Web through a search engine; you just cannot see it all at once.
Not only do Websites morph, vanish, and burst into existence continually, but search engines are also continually filtering their data, changing their definitions of what constitutes discrete Web content, and limiting the information they share with searchers.
To date, search engines have publicly claimed to have crawled about 1 trillion (1,000,000,000,000) URLs. Many of those URLs were really ghost images: duplicate URLs served with session IDs, presented through alternate taxonomies, or otherwise generated dynamically. The 1 trillion figure probably represents only a minority of the whole Web, and perhaps as much as 10-15% of those claimed URLs have already vanished.
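To make the "ghost" URL problem concrete, here is a minimal Python sketch of how a crawler might collapse duplicate URLs before counting them. The set of session parameters (`sid`, `sessionid`, `phpsessid`) and the function names `canonicalize` and `count_distinct` are hypothetical choices for illustration; real search engines use far more elaborate, unpublished canonicalization rules.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameters treated as session identifiers.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def canonicalize(url: str) -> str:
    """Return a normalized form of url with session-ID parameters removed."""
    parts = urlsplit(url)
    # Drop session parameters, then sort the rest so that parameter
    # order alone does not create a new "ghost" URL.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    )
    # Lowercase scheme and host; drop the fragment, which never reaches the server.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(query), ""))

def count_distinct(urls):
    """Count URLs that remain distinct after canonicalization."""
    return len({canonicalize(u) for u in urls})

crawled = [
    "http://example.com/page?id=7&sessionid=abc123",
    "http://example.com/page?sessionid=zz99&id=7",
    "HTTP://Example.com/page?id=7",
    "http://example.com/other",
]
print(len(crawled), count_distinct(crawled))  # 4 2
```

Four crawled URLs collapse to two distinct pages, which is one way a raw URL count can overstate how many real documents a search engine has seen.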
We cannot find all these URLs, but we can almost instantly connect to any URL we find. Our browsers are even permitted to see URLs that the search engines are not permitted to see. And on some Websites we see content that the search engines are not permitted to see. This "cloaked" content looks like one thing to a search engine and another to us, but it is served over the same URL.
The Web is a living mechanism, evolving, growing, processing and converting new material. We have not yet developed a science capable of documenting this living mechanism beyond the crude measurements that Webometrics has offered. We simplify our view of the Web by thinking in terms of documents, links, and hosts. We limit our interpretation of the available data by borrowing concepts from network theory. We are still struggling to find the right metaphors to help us understand the abstractions we need to interpret what we see.
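The cloaking described above can be sketched as server-side logic that inspects the request's `User-Agent` header and returns a different body for the same URL. This is a minimal illustration, assuming the server identifies crawlers by a few substrings in that header (`googlebot`, `yandexbot`, `baiduspider`); the function names are hypothetical, and real cloaking schemes may also key on crawler IP address ranges.

```python
# Hypothetical crawler signatures matched against the User-Agent header.
CRAWLER_TOKENS = ("googlebot", "yandexbot", "baiduspider")

def is_crawler(user_agent: str) -> bool:
    """Guess whether a request comes from a search engine crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def respond(path: str, user_agent: str) -> str:
    """Return a page body for path; the URL is the same for every visitor."""
    if is_crawler(user_agent):
        # Version shown only to search engines.
        return f"<html><body>Keyword-rich text for {path}</body></html>"
    # Version shown to human visitors' browsers.
    return f"<html><body>Interactive page for {path}</body></html>"

bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
human_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
print(respond("/news", bot_ua) == respond("/news", human_ua))  # False
```

Both visitors request `/news`, yet each receives a different page, which is why a search engine's picture of a Website can diverge from what we see in our browsers.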
There are philosophical questions that have received little attention. For example, does a Website exist? If so, where does it exist? Does it exist only on the server that stores its logical patterns? What if that pattern is scattered across multiple servers? Does it exist in the client machines that connect to the Website (mobile phones, PCs, search engine crawl servers, etc.)? Is the cache image your browser creates part of the Web or just an echo of it? There are links in that cache that your browser follows to connect to other Web content.
We can argue that the Web exists only in the cache files our browsers create and maintain, except that there are live streams of data (audio and video) being served that are not cached. Is a streamed file part of the Web, or is it just a filament of information transported by, through, or upon the Web that is separate from it?
If we do not address these philosophical questions, then our attempts to study the Web, to identify the laws that govern its existence and its operations, are flawed. We need the philosophy to tell us what we are studying: to define what is the Web, what is part of the Web, and what lies beyond it.
This is my universe -- the universe of questions about the Web, what it is, what it does. It is immense. It is personal. It is ever changing. It is fascinating.