How Spider Search Engines Work
You have probably used a search engine such as Google or Yahoo to find information on the Web. Have you ever wondered how search engines find their results?
Understanding this will help you structure your site so that it gets the most out of search engines.
It's a common misconception that when a user enters a query into a search engine, the search engine searches the Internet to find pages that match the query.
This is not quite correct. The search engine actually searches its own copy of the Web.
Every search engine creates its own copy of the Web. This copy is called an index. The size of a search engine's index varies from search engine to search engine, but it is always smaller than the World Wide Web as a whole.
A search engine builds a list of pages to add to its index using a special piece of software known as a spider. The spider crawls across the Web, adding certain pages it visits to the list of pages to index. The spider is capable of reading a Web page and finding links to other pages to visit. In this way, the spider can travel across the Web finding pages to add to the search engine index.
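The crawl described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine's spider is built: the `LinkExtractor` class, the `crawl` function, and the `TOY_WEB` dictionary are all hypothetical names, and the "Web" here is a small in-memory dictionary rather than real pages fetched over HTTP.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch, start_url, max_pages=100):
    """Breadth-first crawl; returns the URLs of pages found, in visit order."""
    to_visit = deque([start_url])
    seen = {start_url}
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        html = fetch(url)
        if html is None:  # the link points at a page that does not exist
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                to_visit.append(link)
    return visited


# Hypothetical site held in memory (an assumption made for this sketch).
TOY_WEB = {
    "/home": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/home">Home</a>',
    "/products": '<a href="/missing">Old page</a>',
    "/orphan": "No page links here, so the spider never finds it.",
}

pages_found = crawl(TOY_WEB.get, "/home")
print(pages_found)
```

Note that `/orphan` never appears in the result. Because the spider discovers pages only by following links, a page that nothing links to stays invisible to it.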
Some time after a page has been "spidered" (visited by a spider), the search engine adds a copy of the page to its index. When a user enters a query into a search engine, the search engine software searches the index to find pages that match the query. It then sorts those pages into ranking order.
Each search engine uses its own "secret recipe" to find and rank pages, but many base the results on the frequency and location of the search term on the page and on the number of related websites that link to the page.
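As a rough illustration of searching an index and ranking the matches, the sketch below builds a word-to-page index from stored page copies and scores each match by how often the query words occur plus how many sites link to the page. The scoring formula, the function names, and the sample data are all assumptions made for this example; real search engines keep their recipes secret, and this sketch ignores where on the page a term appears.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: number of times the word occurs on that page}."""
    index = {}
    for url, text in pages.items():
        for word, count in Counter(tokenize(text)).items():
            index.setdefault(word, {})[url] = count
    return index


def search(index, inbound_links, query):
    """Return URLs whose stored copy contains every query word, best first."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], {}))
    for word in words[1:]:
        matches &= set(index.get(word, {}))

    def score(url):
        # Assumed toy formula: term frequency plus inbound link count.
        frequency = sum(index[word][url] for word in words)
        return frequency + inbound_links.get(url, 0)

    # Highest score first; ties are broken alphabetically by URL.
    return sorted(matches, key=lambda url: (-score(url), url))


# Hypothetical stored page copies and link counts.
STORED_PAGES = {
    "/spiders": "Spiders crawl the web. Spiders follow links.",
    "/garden": "Garden spiders spin a web.",
    "/recipes": "Secret recipe for soup.",
}
INBOUND_LINKS = {"/spiders": 3, "/garden": 1}

index = build_index(STORED_PAGES)
results = search(index, INBOUND_LINKS, "spiders web")
print(results)
```

The search never touches the live pages. It reads only the index built from the stored copies, which is why a query word that is missing from the stored copy can never produce a match.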
That, as briefly as possible, is how search engines work. With that minimal knowledge and a little thought, you can reach the following conclusions:
- A search engine may not have a copy of every page on your site
- If a search engine does have a copy of a page on your site, that copy may not be up to date
- A search engine can have a copy of a page that no longer exists on your site
- In order for a page on your site to be listed in response to a user query, words in that query must match words in the search engine's copy of the page
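The first three conclusions can be made concrete by comparing a site as it exists today with the copy a search engine holds. The sketch below is a hypothetical illustration only: `compare_with_index` and both sample dictionaries are invented for this example, and in practice a site owner cannot read a search engine's index directly like this.

```python
def compare_with_index(live_site, indexed_copy):
    """Describe how the index's copy of each page relates to the live site."""
    report = {}
    for url in sorted(set(live_site) | set(indexed_copy)):
        if url not in indexed_copy:
            report[url] = "not indexed"
        elif url not in live_site:
            report[url] = "indexed but deleted"
        elif live_site[url] != indexed_copy[url]:
            report[url] = "indexed but out of date"
        else:
            report[url] = "indexed and current"
    return report


# Hypothetical data: the site today versus the engine's older copy of it.
LIVE_SITE = {
    "/home": "Welcome to our new shop",
    "/contact": "Email us",
    "/new": "Just launched",
}
INDEXED_COPY = {
    "/home": "Welcome to our shop",
    "/contact": "Email us",
    "/old": "Discontinued product",
}

report = compare_with_index(LIVE_SITE, INDEXED_COPY)
for url, status in report.items():
    print(url, "->", status)
```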