
Crawler Revisit Policy

Posted by seonotes on April 19th, 2006


The web is highly dynamic, and crawling even a fraction of it can take weeks or months. Many events, such as page creations, updates and deletions, occur while a crawl is in progress. From a search engine's point of view, there is a cost associated with not detecting an event, because the engine is left holding an outdated copy of the page. Freshness and age are the most commonly used cost functions.

Freshness: a binary measure that indicates whether the local copy is accurate.

The freshness of a page S in the repository at time t is defined as:

$$
F_S(t) = \begin{cases} 1 & \text{if } S \text{ is equal to the local copy at time } t \\ 0 & \text{otherwise} \end{cases}
$$

Age: a measure that indicates how outdated the local copy is.

The age of a page k in the repository at time b is defined as:

$$
A_k(b) = \begin{cases} 0 & \text{if } k \text{ is not modified at time } b \\ b - \text{modification time of } k & \text{otherwise} \end{cases}
$$
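The two cost functions can be sketched in a few lines of Python. This is only an illustration of the definitions above: the function names, the use of plain strings for page content, and the use of plain numbers for timestamps are assumptions made for the example, not part of any crawler's actual API.

```python
def freshness(local_copy, live_page):
    """Return 1 if the stored copy equals the live page, else 0."""
    return 1 if local_copy == live_page else 0


def age(now, modified_at):
    """Return how long the stored copy has been outdated at time `now`.

    `modified_at` is a hypothetical input: the time at which the live page
    changed after the copy was stored, or None if it has not changed.
    """
    if modified_at is None or modified_at > now:
        return 0  # the copy is still up to date at time `now`
    return now - modified_at


# The copy matches the live page, so it is fresh and its age is zero.
fresh_value = freshness("<html>v1</html>", "<html>v1</html>")
fresh_age = age(now=10, modified_at=None)

# The live page changed at time 4; by time 10 the copy is 6 time units old.
stale_value = freshness("<html>v1</html>", "<html>v2</html>")
stale_age = age(now=10, modified_at=4)
```

Freshness only says whether the copy is wrong, while age also says for how long it has been wrong, which is why the two functions can favor different revisiting strategies.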

Evolution of freshness and age in web crawling:

Edward G. Coffman and his co-authors defined the objective of a web crawler with different wording that is equivalent to freshness: a crawler must minimize the fraction of time that pages remain outdated. They also observed that the problem of web crawling can be modeled as a multiple-queue, single-server polling system, in which the web crawler is the server and the websites are the queues.

In this model, page modifications are the arrivals of customers, and switch-over times are the intervals between page accesses to a single website. The mean waiting time for a customer in the polling system is equivalent to the average age for the web crawler.

Cho and Garcia-Molina (2003) studied two simple revisiting policies:

a) Uniform policy: All pages in the collection are revisited with the same frequency, irrespective of their rates of change.
b) Proportional policy: Pages that change more frequently are revisited more often.

Cho and Garcia-Molina arrived at a surprising result: in terms of average freshness, the uniform policy outperforms the proportional policy in both a simulated web and a real web crawl. The explanation is that when a page changes too often, the crawler wastes its time trying to re-crawl it too fast and still cannot keep its copy fresh.
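A small numeric sketch can make this result more concrete. It assumes a model that the post does not spell out: each page changes as a Poisson process with a fixed rate, and the crawler revisits each page at regular intervals. Under that assumption, the time-averaged freshness of a page with change rate λ and visit rate f is (1 - e^(-λ/f)) / (λ/f). The two change rates and the visit budget below are made-up values chosen only for illustration.

```python
import math


def expected_freshness(change_rate, visit_rate):
    """Time-averaged freshness of one page (assumed Poisson change model)."""
    if visit_rate <= 0:
        return 0.0  # a page that is never revisited is treated as stale
    ratio = change_rate / visit_rate
    if ratio == 0:
        return 1.0  # a page that never changes is always fresh
    return (1.0 - math.exp(-ratio)) / ratio


def average_freshness(change_rates, visit_rates):
    """Mean freshness over all pages for a given allocation of visits."""
    scores = [expected_freshness(c, v) for c, v in zip(change_rates, visit_rates)]
    return sum(scores) / len(scores)


# Hypothetical collection: one fast-changing page and one slow-changing page.
change_rates = [9.0, 1.0]  # changes per day (illustrative values)
budget = 10.0              # total visits per day the crawler can afford

# Uniform policy: every page gets the same share of the budget.
uniform_visits = [budget / len(change_rates)] * len(change_rates)

# Proportional policy: visits are split in proportion to the change rates.
total_rate = sum(change_rates)
proportional_visits = [budget * c / total_rate for c in change_rates]

uniform_score = average_freshness(change_rates, uniform_visits)
proportional_score = average_freshness(change_rates, proportional_visits)

print(f"uniform:      {uniform_score:.3f}")
print(f"proportional: {proportional_score:.3f}")
```

With these numbers the uniform policy scores roughly 0.69 and the proportional policy roughly 0.63. The proportional policy spends most of the budget on the fast page without making it much fresher, while the slow page, which is cheap to keep fresh, is neglected.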

To improve freshness, they concluded, the crawler should penalize the elements that change too often.

The optimal revisiting policy is therefore neither the uniform policy nor the proportional policy.

The best way to keep average freshness high includes ignoring the pages that change too often.

In this case the optimal policy is closer to the uniform policy than to the proportional policy. Note that the revisiting policies considered here treat all pages as homogeneous in terms of quality.
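The effect of ignoring a page that changes too often can be illustrated with a brute-force search over visit allocations. This is a sketch only, not the method used by Cho and Garcia-Molina: it assumes Poisson page changes with periodic revisits (so a page with change rate λ and visit rate f has average freshness (1 - e^(-λ/f)) / (λ/f)), and the function names, rates, and budget are hypothetical.

```python
import math


def page_freshness(change_rate, visit_rate):
    """Average freshness of one page; assumes change_rate > 0."""
    if visit_rate <= 0:
        return 0.0  # a page that is never revisited is treated as stale
    ratio = change_rate / visit_rate
    return (1.0 - math.exp(-ratio)) / ratio


def best_split(fast_rate, slow_rate, budget, steps=20):
    """Try evenly spaced splits of the budget and keep the best one."""
    best = None
    for i in range(steps + 1):
        fast_visits = budget * i / steps
        slow_visits = budget - fast_visits
        score = (page_freshness(fast_rate, fast_visits)
                 + page_freshness(slow_rate, slow_visits)) / 2
        if best is None or score > best[0]:
            best = (score, fast_visits, slow_visits)
    return best


# Hypothetical pages: one changes 100 times a day, the other once a day.
# The crawler can afford two visits a day in total.
best_score, fast_visits, slow_visits = best_split(100.0, 1.0, 2.0)

# Uniform baseline: one visit a day for each page.
uniform_score = (page_freshness(100.0, 1.0) + page_freshness(1.0, 1.0)) / 2

print(f"best split: fast={fast_visits}, slow={slow_visits}, score={best_score:.3f}")
print(f"uniform score: {uniform_score:.3f}")
```

Under these assumptions the search gives the fast-changing page no visits at all and spends the whole budget on the slow page, and that allocation still beats the uniform split on average freshness.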

Politeness Policy: Koster noted that web robots are useful for a number of tasks, but that their use comes at a price for the general community.

The costs include:

a) Network resources: Robots require considerable bandwidth.
b) Server overload: This occurs when the frequency of accesses to a given server is too high.
c) Poorly written robots: These can crash servers or routers.
d) Disrupted networks and web servers: This happens if too many users deploy personal robots.

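The robots exclusion protocol (robots.txt) is a partial solution to the above-mentioned problems: it lets site administrators indicate which parts of their servers crawlers should not access.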
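As a rough sketch of how a polite crawler might honor these rules, the example below uses the `urllib.robotparser` module from the Python standard library. The robots.txt content, the `ExampleBot` user agent, and the `example.com` URLs are all made up for illustration; a real crawler would download the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A polite crawler checks every URL before fetching it.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print("private page allowed:", blocked)
print("index page allowed:", allowed)
```

The protocol is advisory: it only helps when the robot actually performs this check, which is why it addresses the costs listed above only for well-behaved crawlers.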



©2006-2010 SEO Notes
Disclaimer: All data and information provided on this site is for informational purposes only. SEO Notes makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information on this site and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.