How Most Search Engines Work
Search engines for the web do not really search the World Wide Web
directly. Each one searches a database of the full text of web pages selected from the billions of web pages
residing on servers all over the world. When you search the web using a search engine, even one of
the top engines, you are always searching a somewhat
stale copy of the real web pages. When you click on a link in a search engine's results, you
retrieve the current version of the page from its server.
Search engine databases are built by computer robot programs called spiders, which select the pages to include. Although it is said
they "crawl" the web in their hunt for pages, in truth they stay in one place and request pages from there. They find the pages for
potential inclusion by following the links in the pages they already have in their database (i.e., already "know
about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on
the web about it. Computers are getting more sophisticated all the time, but they are still clueless. Computers
and servers are only as smart as we make them and can only do what we tell them. I want to tell one to clean my
pool :)
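The link-following described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine is implemented: the `TOY_WEB` dictionary, its example URLs, and the `crawl` function are all made up for this sketch, and a real spider would fetch pages over the network instead of reading them from a dictionary.

```python
from collections import deque

# Hypothetical miniature "web": each URL maps to the URLs its page links to.
TOY_WEB = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["c.example/post"],
    "c.example/post": [],
    "d.example/orphan": ["a.example/home"],  # links out, but nothing links to it
}

def crawl(seed_urls, web):
    """Return the URLs reachable by following links from seed_urls, in visit order."""
    known = set(seed_urls)      # pages the spider already "knows about"
    queue = deque(seed_urls)    # pages waiting to be read
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        # Follow every link on this page that has not been seen before.
        for link in web.get(url, []):
            if link not in known:
                known.add(link)
                queue.append(link)
    return visited

found = crawl(["a.example/home"], TOY_WEB)
print(found)
```

Starting from `a.example/home`, the sketch reaches the four pages connected by links. It never reaches `d.example/orphan`, because no known page links to it.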
If a web page is never linked to from any other page, search engine spiders cannot find it. The only way a brand
new page - one that no other page has ever linked to - can get into a search engine is for some human to send its
URL to the search engine as a request to be included. All search engine companies offer ways to do this -
manually or electronically. We offer an electronic submission service for $35.00. You may purchase it here.
After spiders find pages, they pass them on to another computer program for "indexing." This program identifies
the text, links, and other content in each page and stores it in the search engine database's files. The
database can then be searched by keyword, and by whatever more advanced approaches the engine offers, and a page
will be found if your search matches its content.
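One common way to make stored text searchable by keyword is an inverted index, which maps each word to the pages that contain it. The sketch below is a minimal illustration under that assumption. The page texts, URLs, and the `tokenize`, `build_index`, and `search` functions are hypothetical, and real engines store far more than bare words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages handed over by the spider (URL -> page text).
PAGES = {
    "a.example/pools": "How to clean a swimming pool",
    "b.example/spiders": "Spiders follow links between web pages",
    "c.example/robots": "Robots that clean pages? Not yet.",
}

index = build_index(PAGES)
print(sorted(search(index, "clean pool")))  # ['a.example/pools']
print(sorted(search(index, "pages")))       # ['b.example/spiders', 'c.example/robots']
```

A search only matches words that were stored at indexing time, which is why results reflect the copy in the database and not the live page.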
Some types of pages and links are excluded from most search engines by policy. Others are excluded because search
engine spiders cannot access them. Pages that are excluded are referred to as the "Invisible Web" -- what you
don't see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than
the visible web.