As soon as the spider finds information on web pages, the search engine must store that information in a useful way.
The accumulated data can be made accessible to users through two key components:
a) The information stored with the data.
b) The method by which the information is indexed.
A simple search engine could store just the word and the URL where it was found. However, an engine that stored only this would be of limited use. To produce better results, a search engine must store more than just the word and the URL, and this is what an efficient search engine does.
A search engine might store the number of times that the word appears on a page. The engine might also assign a weight to each entry, with increasing values given to words that appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page.
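To make the weighting idea concrete, here is a minimal sketch of one possible scoring scheme. The location names, the numeric weights in `LOCATION_WEIGHTS`, and the `EARLY_WINDOW` bonus for words near the top of the body are all hypothetical values chosen for illustration; real engines keep their formulas private. Because every occurrence adds to the score, the count of how many times a word appears is folded in as well.

```python
import re
from collections import defaultdict

# Hypothetical weights for illustration only; real engines use their own formulas.
LOCATION_WEIGHTS = {"title": 5.0, "meta": 4.0, "heading": 3.0, "link": 2.0, "body": 1.0}
EARLY_WINDOW = 50  # assumption: the first 50 body words get an extra boost


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weight_words(page):
    """Return {word: weight}, adding a location-based weight for every occurrence."""
    scores = defaultdict(float)
    for location, text in page.items():
        base = LOCATION_WEIGHTS[location]
        for i, word in enumerate(tokenize(text)):
            bonus = 0.0
            if location == "body" and i < EARLY_WINDOW:
                # Linear boost that shrinks from 1.0 toward 0 further down the page.
                bonus = (EARLY_WINDOW - i) / EARLY_WINDOW
            scores[word] += base + bonus
    return dict(scores)


page = {
    "title": "Search engine indexing",
    "heading": "How indexing works",
    "body": "A search engine stores each word it finds.",
}
scores = weight_words(page)
print(scores["indexing"], round(scores["search"], 2))  # 8.0 6.98
```

Here "indexing" scores 5.0 (title) plus 3.0 (heading), while "search" scores 5.0 (title) plus a boosted body occurrence near the top of the text.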
Each search engine follows a different formula for assigning weights to the words in its index.
As a result, a search for the same word produces different lists on different search engines.
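The effect of different formulas can be shown with a toy comparison. In this sketch the two "engines", their weights, and the per-page word counts are invented for illustration: `engine_1` strongly favors words in the title, `engine_2` leans more on raw frequency, and the same two pages come out in opposite order.

```python
# Hypothetical occurrence counts of one query word on two pages.
pages = {
    "page_a": {"title": 1, "body": 2},  # once in the title, twice in the body
    "page_b": {"title": 0, "body": 6},  # never in the title, six times in the body
}

# Two hypothetical weighting formulas.
engine_1 = {"title": 5, "body": 1}  # favors titles
engine_2 = {"title": 2, "body": 1}  # favors raw frequency


def rank(weights):
    """Sort page names by their weighted score, highest first."""
    def score(counts):
        return sum(weights[loc] * n for loc, n in counts.items())
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


print(rank(engine_1))  # ['page_a', 'page_b']
print(rank(engine_2))  # ['page_b', 'page_a']
```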
Regardless of the specific combination of additional information a search engine stores, the data is encoded to save storage space.
For instance, the Google paper describes using 2 bytes, of 8 bits each, to store weighting information, such as whether the word was capitalized, to help in ranking the hit.
Each factor may take up 2 or 3 bits within the 2-byte grouping (8 bits = 1 byte).
This facilitates the storage of a great deal of information in a compact form. The compacted form of information is then ready for indexing.
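Packing several small factors into 2 bytes comes down to shifting and masking bits. The layout below (1 bit for capitalization, 3 bits for a font-size bucket, 12 bits for the word's position) is a hypothetical example; the field widths are assumptions for illustration, not a documented format.

```python
# Hypothetical 16-bit layout (assumption for illustration):
#   bit 15      : capitalized flag (1 bit)
#   bits 12-14  : font-size bucket (3 bits, values 0-7)
#   bits 0-11   : word position in the document (12 bits, values 0-4095)
CAP_SHIFT = 15
FONT_SHIFT = 12
FONT_MASK = 0b111
POS_MASK = 0xFFF


def pack_hit(capitalized: bool, font_size: int, position: int) -> bytes:
    """Pack three weighting factors into exactly 2 bytes."""
    if not 0 <= font_size <= FONT_MASK:
        raise ValueError("font_size must fit in 3 bits")
    position = min(position, POS_MASK)  # clamp positions that do not fit in 12 bits
    value = (int(capitalized) << CAP_SHIFT) | (font_size << FONT_SHIFT) | position
    return value.to_bytes(2, "big")


def unpack_hit(data: bytes) -> tuple[bool, int, int]:
    """Recover (capitalized, font_size, position) from the 2-byte encoding."""
    value = int.from_bytes(data, "big")
    return bool(value >> CAP_SHIFT), (value >> FONT_SHIFT) & FONT_MASK, value & POS_MASK


hit = pack_hit(True, 3, 42)
print(len(hit), unpack_hit(hit))  # 2 (True, 3, 42)
```

The trade-off is precision: a position beyond 4095 is clamped, which is acceptable only if the engine cares less about exact positions deep in a document.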
An index has one purpose: to allow information to be found as quickly as possible.
One of the most effective ways to build an index is with a hash table.
In this process, a formula is applied to attach a numerical value to each word. The formula is designed to distribute the entries evenly across a predetermined number of divisions. This even numerical distribution differs from the uneven distribution of words across the alphabet, and that difference is the key to a hash table's effectiveness.
In English, some letters begin far more words than others. For instance, many more words begin with the letter “A” than with the letter “X”.
Hashing evens out this difference and reduces the average time it takes to find an entry, whether the word begins with “A” or “X”. It also separates the index from the actual entry.
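The contrast between alphabetical and hashed grouping can be sketched by putting the same words into divisions both ways. This example swaps in FNV-1a, a simple well-known string hash, as the "formula"; the choice of hash, the word list, and `NUM_BUCKETS = 8` are assumptions for illustration. With this word list, grouping by first letter puts 7 of the 12 words into the "a" group, while the hashed grouping tends to spread them more evenly (with only a dozen words, the spread is not guaranteed to be perfect).

```python
from collections import Counter


def fnv1a_32(word: str) -> int:
    """32-bit FNV-1a: a simple, deterministic string hash."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


NUM_BUCKETS = 8  # the "predetermined number of divisions" (value chosen arbitrarily)

words = ["apple", "anchor", "art", "atom", "axis", "able", "acid",
         "moon", "tree", "zebra", "xenon", "xray"]

by_letter = Counter(w[0] for w in words)                    # alphabetical grouping
by_hash = Counter(fnv1a_32(w) % NUM_BUCKETS for w in words)  # hashed grouping

print("largest first-letter group:", max(by_letter.values()))
print("largest hash bucket:", max(by_hash.values()))
```

Python's built-in `hash()` is avoided here because it is randomized for strings between runs, which would make the bucket assignment change every time.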
The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way allows it to be stored most efficiently.
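A minimal sketch of that separation: the index holds only (hash, offset) pairs in its buckets, and each offset acts as the "pointer" into a separate data store. The `HashIndex` class, its bucket count, and the use of list offsets as pointers are hypothetical choices for illustration; the stored word is kept next to its record so that two words sharing a bucket (a collision) are not confused.

```python
def fnv1a_32(word: str) -> int:
    """32-bit FNV-1a string hash (deterministic across runs)."""
    h = 0x811C9DC5
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


class HashIndex:
    """Hypothetical chained hash index: buckets hold (hash, offset) pairs only."""

    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]
        self.store = []  # the actual data, kept apart from the index

    def add(self, word, record):
        h = fnv1a_32(word)
        offset = len(self.store)            # the "pointer" into the data store
        self.store.append((word, record))
        self.buckets[h % len(self.buckets)].append((h, offset))

    def lookup(self, word):
        h = fnv1a_32(word)
        for stored_hash, offset in self.buckets[h % len(self.buckets)]:
            stored_word, record = self.store[offset]
            if stored_hash == h and stored_word == word:  # guard against collisions
                return record
        return None


index = HashIndex()
index.add("apple", ["page_a", "page_c"])
index.add("xenon", ["page_b"])
print(index.lookup("apple"))  # ['page_a', 'page_c']
print(index.lookup("zebra"))  # None
```

Because the buckets only hold offsets, the data store could be reordered or compacted independently, as long as the offsets are updated to match.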
The combination of efficient indexing and effective storage is what makes quick search results possible.