
Web Crawler – Parallelization Policy

Posted by seonotes on April 19th, 2006


A parallel crawler is a crawler that runs multiple processes in parallel.

Its goal is to maximize the download rate while minimizing the overhead from parallelization and avoiding repeated downloads of the same page.

Because two different crawling processes can discover the same URL, the crawling system needs a policy for assigning newly discovered URLs so that no page is downloaded twice. There are two types of policy: dynamic assignment and static assignment.

Dynamic assignment: With dynamic assignment, a central server assigns new URLs to the different crawlers at run time. This lets the central server balance the load of each crawler.

Dynamic assignment also lets the system add or remove downloader processes. Because the central server can become the bottleneck, a large part of the workload has to be transferred to the distributed crawling processes.
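As a rough illustration of dynamic assignment, the sketch below shows a central dispatcher that gives each new URL to the least-loaded crawler and lets crawlers join or leave. The class name `CentralDispatcher`, its methods, and the crawler IDs are hypothetical names invented for this example, not part of any particular crawler.

```python
from collections import defaultdict


class CentralDispatcher:
    """Hypothetical central server for dynamic URL assignment."""

    def __init__(self, crawler_ids):
        # Number of URLs currently assigned to each crawler.
        self._load = {cid: 0 for cid in crawler_ids}
        # URLs already handed out, so no page is assigned twice.
        self._seen = set()
        self.assigned = defaultdict(list)

    def add_crawler(self, crawler_id):
        """Register a new downloader process with zero load."""
        self._load.setdefault(crawler_id, 0)

    def remove_crawler(self, crawler_id):
        """Drop a crawler and return its URLs so the caller can reassign them."""
        self._load.pop(crawler_id)
        orphaned = self.assigned.pop(crawler_id, [])
        for url in orphaned:
            self._seen.discard(url)
        return orphaned

    def assign(self, url):
        """Assign url to the least-loaded crawler; return None for duplicates."""
        if url in self._seen:
            return None
        self._seen.add(url)
        # Ties are broken by crawler ID to keep the choice deterministic.
        target = min(self._load, key=lambda cid: (self._load[cid], cid))
        self._load[target] += 1
        self.assigned[target].append(url)
        return target


dispatcher = CentralDispatcher(["c0", "c1"])
first = dispatcher.assign("http://a.example/1")
second = dispatcher.assign("http://a.example/2")
duplicate = dispatcher.assign("http://a.example/1")
```

Every call goes through one object here, which is exactly why the central server can become the bottleneck in a real deployment.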

Shkapenyuk and Suel (2002) describe two main configurations of crawling architectures with dynamic assignment:

a) A small crawler configuration: there is a central DNS resolver and central queues per website, with distributed downloaders.
b) A large crawler configuration: both the DNS resolver and the queues are distributed as well.

Static assignment: With static assignment, a fixed rule defines how new URLs are assigned to the crawlers.

Under this policy, a hashing function can transform each URL into a number that corresponds to the index of the crawling process responsible for it.
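A minimal sketch of such a hashing rule follows. The function name `assign_crawler` is hypothetical, and hashing the host name rather than the full URL is an assumption made here so that all pages of one website land on the same crawling process.

```python
import hashlib
from urllib.parse import urlsplit


def assign_crawler(url, num_crawlers):
    """Map a URL to a crawler index in the range 0..num_crawlers-1."""
    # Assumption: hash only the host, so one site stays on one process.
    host = urlsplit(url).hostname or ""
    # hashlib is used instead of the built-in hash(), which is salted
    # per interpreter run and would differ between processes.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


index_a = assign_crawler("http://example.com/page-one", 4)
index_b = assign_crawler("http://example.com/page-two", 4)
```

Because the rule is fixed and deterministic, every crawling process can compute the owner of a URL locally without asking a central server.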

Because pages handled by one crawling process contain links to pages assigned to another, the processes have to exchange URLs. Exchanging them in batches, several URLs at a time, reduces the overhead of this exchange.
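One way to picture the batched exchange is a per-target buffer that is flushed only when it fills up. The sketch below assumes a hypothetical `send(target, urls)` callable standing in for whatever transport the crawler uses; the class name `UrlExchangeBuffer` and the default batch size of 100 are likewise illustrative.

```python
from collections import defaultdict


class UrlExchangeBuffer:
    """Collects URLs bound for other crawling processes and sends them in batches."""

    def __init__(self, send, batch_size=100):
        self._send = send  # hypothetical callable: send(target_index, list_of_urls)
        self._batch_size = batch_size
        self._pending = defaultdict(list)

    def add(self, target, url):
        """Queue url for the given process; send once a full batch is ready."""
        batch = self._pending[target]
        batch.append(url)
        if len(batch) >= self._batch_size:
            self.flush(target)

    def flush(self, target):
        """Send whatever is queued for one process, if anything."""
        batch = self._pending.pop(target, [])
        if batch:
            self._send(target, batch)

    def flush_all(self):
        """Send all partial batches, for example at shutdown."""
        for target in list(self._pending):
            self.flush(target)


sent = []
buffer = UrlExchangeBuffer(lambda target, urls: sent.append((target, urls)), batch_size=2)
buffer.add(1, "http://a.example/x")
buffer.add(1, "http://a.example/y")  # second URL completes the batch
```

A larger batch size means fewer messages but a longer delay before the receiving process learns about a URL.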

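According to Boldi et al., an effective assignment function has three main properties: each crawling process should receive approximately the same number of hosts (the balancing property); if the number of crawling processes grows, the number of hosts assigned to each process must shrink (the contra-variance property); and the assignment must be able to add and remove crawling processes dynamically.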
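The balancing and contra-variance properties can be checked empirically for a given assignment function. The sketch below does this for a simple hash-modulo rule over 1000 made-up host names; the function names and the `siteN.example` hosts are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def process_for_host(host, num_processes):
    """Hash-modulo assignment of a host to a crawling process index."""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_processes


def hosts_per_process(hosts, num_processes):
    """Return how many hosts each process receives, indexed by process."""
    counts = Counter(process_for_host(h, num_processes) for h in hosts)
    return [counts.get(i, 0) for i in range(num_processes)]


# Synthetic host names, used only to exercise the assignment function.
hosts = [f"site{i}.example" for i in range(1000)]
counts4 = hosts_per_process(hosts, 4)  # balancing: counts should be similar
counts8 = hosts_per_process(hosts, 8)  # contra-variance: counts should shrink
print(counts4)
print(counts8)
```

Note that a plain modulo rule moves most hosts to a different process whenever the number of processes changes, so it handles adding and removing processes poorly even though it balances well.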


©2006-2010 SEO Notes
Disclaimer: All data and information provided on this site is for informational purposes only. SEO Notes makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information on this site and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.