Imagine you are living within a bubble. Your bubble stretches in any direction but you are always contained within it. Your bubble is adrift in a sea that has no bottom, only a surface. The surface surrounds your bubble, so the sea itself is only a bubble. Beyond the surface lies darkness.
This is the virtual universe we live in every day. We call it the Internet. The bubble around you is the collection of Websites you read and comment on. The sea through which your bubble drifts is the Searchable Web. The darkness beyond the Searchable Web is everything else -- everything that you cannot get to, or which has no meaning to you.
Millions of people view the Searchable Web through the lens of Google. Millions of other people use the lens of Yandex (the leading Russian-language search engine). Millions more use the lens of Baidu (the leading Chinese-language search engine).
Through any one of these lenses you may see a virtual universe of Websites but you won't see the same universe from all three. Yandex and Baidu both index Websites written in languages other than their core languages. Yandex made a very public entrance into the English Web, for example; Baidu has made a less public entrance into that Web.
How large is the Internet? How large is the World Wide Web? How large is the Searchable Web? Several studies have attempted to answer these questions but their methods and estimates are obsolete. The Web is such a dynamic environment that it cannot be measured.
Imagine that the speed of light is not a limit to our ability to observe the universe. Imagine that we can see, in real time, as much of the universe as we think we have mapped. Now imagine that every day whole galaxies vanish and whole galaxies appear in different parts of the universe. Some of these galaxies allow us to look into them; other galaxies obscure their stars from outside observation.
In this chaotic real-time universe, where things pop into and out of existence, other things are morphing. What was a spiral galaxy yesterday may morph into a protogalaxy -- a mass of unshaped gas -- today. A protogalaxy may suddenly morph into a minor young galaxy. And the filaments of gravity and dark matter that connect the galaxies and protogalaxies are constantly shifting, changing.
That is the World Wide Web. It operates according to its own set of laws. There are real principles of cause and effect, true limits to what can be accomplished. The Web is a virtual universe that exists and functions and lives like a universe within the boundaries of the machines that we connect together through the Internet protocol.
I have participated in attempts to estimate the size of the Web, or portions of the Web. It's impossible to know at any one time how many servers and clients are connected to the Internet but there is a finite limit. We know the limit is real but we cannot find it. It is impossible to know at any one time how many Websites are hosted on the connected servers even though there is a real limit to that number, and we cannot find that limit, either.
Using a search engine to estimate the size of the Web is equivalent to using a pie plate to hold all the food in a restaurant. You can do it. You just cannot place all the food there at once. You can see all of the Web through a search engine; you just cannot see it all at once.
Not only do Websites morph, vanish, and burst into existence continually, but search engines are also continually filtering their data, changing their definitions of what constitutes discrete Web content, and limiting the information they share with searchers.
To date, search engines have publicly claimed to have crawled about 1 trillion (1,000,000,000,000) URLs. Many of those URLs were really ghost images -- duplicate URLs served with session IDs, or presented through alternate taxonomies, or otherwise generated dynamically. The 1 trillion figure probably represents only a small fraction of the whole Web, and perhaps as much as 10-15% of those claimed URLs have already vanished.
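To make the "ghost image" problem concrete, here is a minimal sketch of how duplicate URLs that differ only by a session ID might be collapsed before counting. It is an illustration, not a description of how any search engine actually works: the parameter names in `SESSION_PARAMS`, the helper names `canonicalize` and `count_distinct`, and the example URLs are all assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of query parameters that carry session IDs.
# Real sites use many other names; this set is an assumption.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def canonicalize(url):
    """Return a simplified form of the URL with session parameters removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the ones that look like session IDs.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sort so that parameter order does not create false differences.
    kept.sort()
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # drop the fragment; it is never sent to the server
    ))

def count_distinct(urls):
    """Count URLs after collapsing session-ID duplicates."""
    return len({canonicalize(u) for u in urls})

crawled = [
    "http://Example.com/page?id=7&sid=abc",
    "http://example.com/page?sid=xyz&id=7",
    "http://example.com/page?id=7",
    "http://example.com/other",
]
print(len(crawled), "crawled URLs,", count_distinct(crawled), "distinct pages")
```

Even this toy version shows why a raw count of crawled URLs overstates the number of distinct documents, and it does nothing about the harder cases of alternate taxonomies and dynamically generated pages.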
We cannot find all these URLs but we can almost instantly connect to any URLs we find. Our browsers are even permitted to see URLs that the search engines are not permitted to see. And on some Websites we see content that the search engines are not permitted to see. This "cloaked" content looks like one thing to a search engine and another to us, but it is served over the same URL.
The Web is a living mechanism, evolving, growing, processing and converting new material. We have not yet developed a science capable of documenting this living mechanism beyond the crude measurements that Webometrics has offered. We simplify our view of the Web by thinking in terms of documents, links, and hosts. We limit our interpretation of the available data by borrowing concepts from network theory. We are still struggling to find the right metaphors to help us understand the abstractions we need to interpret what we see.
There are philosophical questions that have received little attention. For example, does a Website exist? If so, where does it exist? Does it exist only on the server that stores its logical patterns? What if that pattern is scattered across multiple servers? Does it exist in the client machines that connect to the Website (mobile phones, PCs, search engine crawl servers, etc.)? Is the cache image your browser creates part of the Web or just an echo of it? There are links in that cache that your browser follows to connect to other Web content.
We can argue that the Web exists only in the cache files our browsers create and maintain -- except there are live streams of data (audio and video files) being served that are not cached. Is a streamed file part of the Web or is it just a filament of information transported by, through, or upon the Web that is separate from it?
If we do not address these philosophical questions then our attempts to study the Web, to identify the laws that govern its existence and its operations, are flawed. We need the philosophy to tell us what we are studying, to define what is and is not the Web, part of the Web, beyond the Web.
This is my universe -- the universe of questions about the Web, what it is, what it does. It is immense. It is personal. It is ever changing. It is fascinating.