Almost realtime indexing by Google search engine
Posted by webstuffscan on August 13th, 2007
In this blog post, Matt Cutts points out how quickly Google indexes new content. He notes that his own post was indexed in less than 30 minutes! For a search engine that is supposed to crawl the entire internet, that is remarkable, and it helps explain why Google is the leader!
This suggests that Google’s infrastructure is good enough to crawl websites in almost real time! But how do they do that? Obviously they can’t crawl every site every minute (that would eat up a lot of bandwidth on individual sites).
The secret probably lies in blog pings and sitemaps. Google Blog Search, Technorati and others already use pings to find out when content has been updated, so only the new (incremental) data has to be crawled. Sitemaps are another feature that can help Google crawl incrementally. You can submit your blog or website sitemap to Google by creating an account at Google Webmaster Tools.
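To make the sitemap idea concrete, here is a minimal sketch of generating a sitemap file in Python. The element names (urlset, url, loc, lastmod, changefreq) and the namespace come from the public sitemaps.org protocol; the helper name build_sitemap and the example.com URLs are made up for illustration, and nothing here claims to show how Google actually consumes the file.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod, changefreq) tuples.

    build_sitemap is a hypothetical helper written for this post.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod lets a crawler skip pages that have not changed.
        ET.SubElement(url, "lastmod").text = lastmod
        # changefreq is only a hint about how often the page changes.
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Example URLs and dates are placeholders, not real posts.
sitemap_xml = build_sitemap([
    ("http://example.com/", "2007-08-13", "daily"),
    ("http://example.com/2007/07/old-post/", "2007-07-01", "monthly"),
])
print(sitemap_xml)
```

With lastmod present, a crawler only needs to fetch the sitemap and then the pages whose date has moved, which is the incremental crawling described above.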
Let us do some math here. According to Technorati there are over 100 million blogs. If we assume 1 post per blog per day and an average size of 10 KB per post, we have:
Total content = 100 million x 10 KB = 1 terabyte of transfer per day!
This means that just to crawl blogs using incremental techniques, Google would be using roughly 1 terabyte of bandwidth per day under these assumptions! Here we assumed that only new posts are crawled. In reality, Google also has to recrawl old posts to see whether they have changed! In a sitemap, you can hint at how often a page changes (daily, weekly, monthly or yearly).
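The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the same arithmetic: the blog count, posts per day and post size are the assumptions from this post, and it uses decimal units (1 KB = 1000 bytes, 1 TB = 10^12 bytes). The sustained-rate figure is an extra step that spreads the daily total evenly over 24 hours.

```python
# Assumptions taken from the post, not measured values.
BLOGS = 100_000_000          # Technorati's "over 100 million blogs"
POSTS_PER_BLOG_PER_DAY = 1   # assumed
POST_SIZE_KB = 10            # assumed average post size


def daily_transfer_bytes(blogs, posts_per_day, post_size_kb):
    """Bytes transferred per day if only new posts are crawled."""
    return blogs * posts_per_day * post_size_kb * 1000  # 1 KB = 1000 bytes


total_bytes = daily_transfer_bytes(BLOGS, POSTS_PER_BLOG_PER_DAY, POST_SIZE_KB)
total_tb = total_bytes / 1e12

# Average rate if the crawl is spread evenly over the whole day.
SECONDS_PER_DAY = 24 * 60 * 60
avg_mbit_per_s = total_bytes * 8 / SECONDS_PER_DAY / 1e6

print(f"{total_tb:.1f} TB per day, about {avg_mbit_per_s:.0f} Mbit/s sustained")
# prints "1.0 TB per day, about 93 Mbit/s sustained"
```

Changing any of the three constants shows how sensitive the total is to the assumptions, for example doubling the average post size doubles the daily transfer.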
Check out my sitemap console screenshot below. As you can see, it is possible to reduce the crawl speed to save on bandwidth. In that case, though, your content will not be indexed in near “realtime”.

