Let us look more closely at how search engines accomplish the major tasks assigned to them, and how they arrange the pieces of the Internet to produce the final result.

If you observe the web closely, you will find Google, Overture, Inktomi, LookSmart and FindWhat among the top five search engines, with Google on top, handling the highest number of searches per day (250 million).

In earlier days (the late 1980s and early 1990s), however, search tools such as Gopher, Archie and Veronica were the main way to find valuable information on the net. Today those search programs have been replaced by web-based search engines.

The ‘Head Start’:

A search engine needs to find information about the file or document you are searching for. Since there are hundreds of millions of web pages that might hold the required information, a search engine employs special software robots, called spiders. These robots build lists of the words found on websites, a process known as web crawling. Building and maintaining a useful list of words means a spider has to look at a great many pages.

The spider starts its journey over the web with lists of the most heavily used servers and most popular pages. It begins with a popular site, indexes the words on its pages and follows every link found within the site.

A spider mainly considers the content of a popular web page and builds key search words from it, which help users find the pages they are actually looking for.
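The crawl described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "web" here is a hypothetical in-memory dictionary (`TOY_WEB`) rather than real HTTP requests, and the function name `crawl` and the example URLs are invented for the example.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://popular.example/": (
        "search engines index pages",
        ["http://popular.example/a", "http://other.example/"],
    ),
    "http://popular.example/a": (
        "spiders follow links",
        ["http://popular.example/"],
    ),
    "http://other.example/": ("pages contain words", []),
}

def crawl(seeds, web):
    """Breadth-first crawl: index the words on each page, then follow its links."""
    index = {}                 # word -> set of URLs that contain it
    queue = deque(seeds)       # pages waiting to be visited
    seen = set(seeds)          # pages already queued, to avoid loops
    while queue:
        url = queue.popleft()
        if url not in web:     # page could not be fetched
            continue
        text, links = web[url]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = crawl(["http://popular.example/"], TOY_WEB)
print(sorted(index["pages"]))
```

Starting from a single popular page, the spider reaches the other two pages through links alone, and the word "pages" ends up pointing at both pages that contain it.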

If we take a look at the history of Google.com, we find that it started out as an academic search engine. The initial system was designed to use multiple spiders at a time.

Each spider could keep 300 connections to web pages open at a time. At peak performance, with four spiders working on the task, the system could crawl 100 web pages per second, generating around 600 kilobytes of data per second.
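A quick back-of-the-envelope calculation shows what these figures imply. The variable names are made up for illustration, and the derived values are simple averages from the numbers quoted above, not measurements.

```python
# Figures quoted in the text.
spiders = 4
connections_per_spider = 300
pages_per_second = 100
kilobytes_per_second = 600

# Derived averages.
total_connections = spiders * connections_per_spider    # open connections overall
avg_page_kb = kilobytes_per_second / pages_per_second   # average data per page
pages_per_spider = pages_per_second / spiders           # pages per second per spider

print(total_connections, avg_page_kb, pages_per_spider)
```

In other words, up to 1,200 connections could be open at once, each spider handled about 25 pages per second, and each page produced roughly 6 kilobytes of data on average.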

Keeping everything running quickly meant building a system capable of feeding the necessary information to the spiders. In its early days, the Google system had a server dedicated to providing URLs to the spiders.

Google also had its own domain name server (DNS), which translates a server's name into an address, in order to keep lookup delays to a minimum.
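Why a nearby name lookup helps can be shown with a small caching resolver. This is not how Google's DNS server worked; it is a minimal sketch of the general idea that a repeated lookup should not cost another round trip. The class name `CachingResolver` and the fake lookup table are assumptions made for the example; `socket.gethostbyname` is the standard-library call used as the default lookup.

```python
import socket

class CachingResolver:
    """Resolve hostnames, remembering answers so repeats cost nothing."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup   # function: hostname -> IP address string
        self._cache = {}        # hostname -> cached address
        self.misses = 0         # number of real lookups performed

    def resolve(self, hostname):
        if hostname not in self._cache:
            self.misses += 1
            self._cache[hostname] = self._lookup(hostname)
        return self._cache[hostname]

# Demo with a fake lookup table so the example needs no network access.
FAKE_DNS = {"www.example.com": "192.0.2.10"}
resolver = CachingResolver(lookup=FAKE_DNS.__getitem__)
for _ in range(3):
    address = resolver.resolve("www.example.com")
print(address, resolver.misses)
```

Three requests for the same host trigger only one real lookup. A spider fetching many pages from the same server benefits in the same way.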

The Google spider took note of two main things on an HTML page:

a) The words within the page
b) The location of the words.

Words found in the title, subtitles, meta tags and other such prominent positions were given greater importance by Google when a user ran a search.
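One simple way to picture this is to give each location on the page a weight and add the weights up per word. The weights below are hypothetical (the text gives no actual values), and the page is assumed to be already split into named fields rather than parsed from HTML.

```python
import re
from collections import defaultdict

# Hypothetical weights: prominent locations count for more than body text.
FIELD_WEIGHTS = {"title": 5, "subtitle": 3, "meta": 3, "body": 1}

def score_words(page):
    """Return word -> score, where the score depends on where the word appears."""
    scores = defaultdict(int)
    for field, text in page.items():
        weight = FIELD_WEIGHTS.get(field, 1)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            scores[word] += weight
    return dict(scores)

page = {
    "title": "Web Crawling Basics",
    "subtitle": "How spiders work",
    "meta": "crawling, spiders, index",
    "body": "Spiders visit pages and index the words they find.",
}
scores = score_words(page)
print(scores["crawling"], scores["spiders"], scores["pages"])
```

Here "crawling" (title and meta tag) scores higher than "pages", which appears only in the body text.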

The Google spider was built to index every significant word on a page, leaving out the articles “a”, “an” and “the”. Other spiders take different approaches.

These different approaches generally aim to make the spider work faster, to allow users to search more efficiently, or both.

For instance, some spiders keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spider the web.

Other systems, such as AltaVista, go in the opposite direction and index every single word on a page, including the articles “a”, “an” and “the”.
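The three indexing policies described above can be compared side by side. This is a rough sketch, not any engine's real code: the function names are invented, and `prominent_text` stands in for the title, sub-headings and link text, which a real spider would extract from the HTML.

```python
import re
from collections import Counter

ARTICLES = {"a", "an", "the"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_without_articles(body):
    """Index every word except the articles."""
    return {w for w in tokenize(body) if w not in ARTICLES}

def index_every_word(body):
    """Index every word, articles included."""
    return set(tokenize(body))

def index_selectively(prominent_text, body, top_n=100, first_lines=20):
    """Index prominent words, the top_n most frequent words, and the opening lines."""
    frequent = {w for w, _ in Counter(tokenize(body)).most_common(top_n)}
    opening = set(tokenize("\n".join(body.splitlines()[:first_lines])))
    return set(tokenize(prominent_text)) | frequent | opening

title = "The Spider"
body = (
    "A spider reads the page.\n"
    "Each spider follows a link.\n"
    "An index helps every spider."
)
print(len(index_every_word(body)), len(index_without_articles(body)))
print(sorted(index_selectively(title, body, top_n=1, first_lines=1)))
```

On this tiny page, dropping the articles removes three entries from the index, and the selective policy (shrunk here to the single most frequent word and the first line) keeps fewer than half of the distinct words.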
