
Web Crawler – Selection Policy

Posted by seonotes on April 19th, 2006


Research has shown that even the largest and most efficient search engines cover only a portion of the publicly available web. A widely cited 1999 study by Lawrence and Giles found that no search engine indexed more than 16% of the web. Since a crawler downloads only a fraction of all web pages, it is essential that the downloaded fraction contains the most relevant pages rather than a random sample.

This requires a metric of importance for prioritizing web pages. The importance of a page is a function of its intrinsic quality, its popularity in terms of links, and its URL.
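Once every known URL has an importance score, choosing what to download next becomes a priority-queue problem. The sketch below is only an illustration: the scores are made-up numbers and `crawl_order` is a hypothetical helper, not something taken from a real crawler.

```python
import heapq

def crawl_order(scores):
    """Return URLs sorted from most to least important."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    heap = [(-score, url) for url, score in scores.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

# Hypothetical importance scores, invented for this example.
scores = {
    "http://example.com/": 0.9,
    "http://example.com/about": 0.4,
    "http://example.com/tmp/page?id=7": 0.1,
}
print(crawl_order(scores))
```

In a real crawl the scores are not known in advance and have to be estimated from the pages downloaded so far, which is what the studies below compare.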

Several researchers have studied scheduling policies for web crawling. A brief description of their studies is given below:

a) Cho et al., 1998: conducted the first study on policies for crawl scheduling. The study concluded that the partial PageRank strategy is the better choice when the crawler wants to download pages with high PageRank early in the crawl. These results, however, are for a single domain only.
b) Najork and Wiener, 2001: performed an actual crawl of 328 million pages using breadth-first ordering. They found that a breadth-first crawl captures pages with high PageRank early. Their explanation is that important pages have many links pointing to them from numerous hosts, so those links are found early.
c) Abiteboul et al., 2003: devised a crawling strategy based on an algorithm called OPIC (On-line Page Importance Computation).
d) Boldi et al., 2004: tested breadth-first against random ordering and an omniscient strategy, using simulation on subsets of the web of 40 million pages from the .it domain and a further 100 million pages.
e) Baeza-Yates et al.: tested several crawling strategies using simulation on two subsets of the web of 3 million pages each, from the .cl domain and the .gr domain.
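The breadth-first ordering that several of these studies use as a baseline can be sketched in a few lines. The link graph here is a toy example invented for illustration, and `breadth_first_order` is a hypothetical name. The page called "hub" has three inbound links and is reached early even though the crawl starts elsewhere.

```python
from collections import deque

def breadth_first_order(links, seed):
    """Return pages in the order a breadth-first crawl would visit them."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        # Newly discovered links go to the back of the queue.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

# Toy link graph (made up): page -> pages it links to.
links = {
    "seed": ["a", "b"],
    "a": ["hub", "c"],
    "b": ["hub"],
    "c": ["d"],
    "d": ["hub"],
}
print(breadth_first_order(links, "seed"))
```

This only shows the visiting order. It does not compute PageRank or any of the other metrics compared in the studies.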

Path ascending crawling:

Some web crawlers aim to download as many resources as possible from a particular website. For this purpose, Cothey introduced the path-ascending crawler in 2004. Such a crawler ascends to every path in each URL it intends to crawl, so for a page several directories deep it also requests each parent directory up to the site root.

Cothey found this crawler very effective at finding isolated resources, that is, resources for which no inbound link would have been found in regular crawling.
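As a rough sketch of the ascending step, the function below takes one URL and lists the parent paths a path-ascending crawler would also try. The function name `ascending_paths` and the example URL are assumptions made for illustration; Cothey's own implementation is not described here.

```python
from urllib.parse import urlsplit, urlunsplit

def ascending_paths(url):
    """List the parent directory URLs of `url`, deepest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Drop one trailing path segment at a time until only "/" is left.
    for depth in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        # Query string and fragment are discarded for the parent paths.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

for candidate in ascending_paths("http://example.com/hamster/monkey/page.html"):
    print(candidate)
```

For the example URL this yields the /hamster/monkey/, /hamster/ and / paths on the same host.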

Focused Crawling: In focused crawling, the importance of a page for the crawler is expressed as a function of the similarity of that page to a given query.
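One simple way to express such a similarity is the cosine of the angle between word-count vectors of the page and the query. This is just one possible measure, chosen here for illustration, and the helper names are hypothetical. A real focused crawler would also need to estimate the score before downloading the page, for example from the anchor text of links pointing to it.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(page_text, query):
    """Cosine similarity between word counts of a page and a query."""
    page = Counter(tokenize(page_text))
    q = Counter(tokenize(query))
    dot = sum(page[term] * q[term] for term in q)
    norm = (math.sqrt(sum(c * c for c in page.values()))
            * math.sqrt(sum(c * c for c in q.values())))
    # An empty page or query has no direction, so treat it as unrelated.
    return dot / norm if norm else 0.0

query = "web crawler selection policy"
print(cosine_similarity("A web crawler needs a selection policy", query))
print(cosine_similarity("Recipes for summer salads", query))
```

Pages scoring above some threshold would be crawled first. The threshold itself is a tuning choice and is not specified in this post.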

Deep web crawling: Many pages cannot be reached by regular crawlers because no links point to them. These pages make up the deep web and can be accessed only by submitting queries to a database.





©2006-2010 SEO Notes
Disclaimer: All data and information provided on this site is for informational purposes only. SEO Notes makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information on this site and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.