How to prevent Googlebot from overwhelming site?

I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server.

Occasionally, Googlebot will stampede us, causing Apache to max out its memory and crash the server.

How can I avoid this?

asked Aug 25 '09 by lo_fye

People also ask

How do I stop Google bots from crawling my site?

You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.
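For reference, the noindex directive can go in the page's HTML or in an HTTP response header; a minimal example of each is below. Note that noindex removes a page from the index — it does not, by itself, slow down crawling.

    <meta name="robots" content="noindex">

    X-Robots-Tag: noindex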

Can Googlebot crawl my site?

Googlebot can crawl the first 15MB of an HTML file or supported text-based file. Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately. After the first 15MB of the file, Googlebot stops crawling and only considers the first 15MB of the file for indexing.


3 Answers

  • Register at Google Webmaster Tools, verify your site, and throttle Googlebot down
  • Submit a sitemap
  • Read the Google guidelines and honor the If-Modified-Since HTTP header (see the first sketch after this list)
  • Use robots.txt to restrict the bot's access to some parts of the website
  • Write a script that changes robots.txt each $[period of time] so the bot is never able to crawl too many pages at the same time, while making sure it can crawl all the content overall (see the second sketch after this list)
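The If-Modified-Since point is about conditional requests: if a page hasn't changed since Googlebot last fetched it, you can answer 304 Not Modified with no body instead of regenerating and re-sending it. Here is a minimal sketch of the idea, assuming plain static files served through a stdlib WSGI app; the content root and port are hypothetical:

    # Minimal sketch of honoring If-Modified-Since for static files.
    import os
    from email.utils import formatdate, parsedate_to_datetime
    from wsgiref.simple_server import make_server

    CONTENT_ROOT = "/var/www/html"  # assumption: where the static content lives

    def app(environ, start_response):
        path = os.path.join(CONTENT_ROOT,
                            environ["PATH_INFO"].lstrip("/") or "index.html")
        mtime = os.path.getmtime(path)

        # If the client (e.g. Googlebot) sent If-Modified-Since and the file
        # has not changed since then, answer 304 with no body.
        ims = environ.get("HTTP_IF_MODIFIED_SINCE")
        if ims:
            try:
                if mtime <= parsedate_to_datetime(ims).timestamp():
                    start_response("304 Not Modified", [])
                    return [b""]
            except (TypeError, ValueError):
                pass  # unparseable date: fall through and serve normally

        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", "text/html"),
            ("Last-Modified", formatdate(mtime, usegmt=True)),
        ])
        return [body]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()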
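And a rough sketch of the last point: rotating robots.txt on a schedule so that only one section of the site is crawlable at a time. The section paths, file location, and rotation interval are assumptions for illustration, not anything specified in the answer:

    # Rough sketch: rotate robots.txt so only one site section is crawlable at a time.
    import time

    ROBOTS_PATH = "/var/www/html/robots.txt"           # assumption: web root
    SECTIONS = ["/archive/", "/gallery/", "/forums/"]  # hypothetical site sections
    ROTATE_EVERY = 24 * 60 * 60                        # swap the allowed section daily

    def write_robots(allowed: str) -> None:
        # Disallow every section except the one currently allowed.
        rules = "\n".join(f"Disallow: {s}" for s in SECTIONS if s != allowed)
        with open(ROBOTS_PATH, "w") as f:
            f.write(f"User-agent: *\n{rules}\n")

    if __name__ == "__main__":
        i = 0
        while True:
            write_robots(SECTIONS[i % len(SECTIONS)])
            i += 1
            time.sleep(ROTATE_EVERY)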
answered Sep 22 '22 by Jean


You can set how your site is crawled using Google's Webmaster Tools. Specifically, take a look at this page: Changing Google's crawl rate.

You can also restrict the pages that Googlebot crawls using a robots.txt file. There is a Crawl-delay setting available, but it appears that Google does not honor it.
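For example, a robots.txt that keeps Googlebot out of some hypothetical heavy sections and asks for a delay between requests might look like this (as noted above, Google ignores Crawl-delay, though some other crawlers honor it):

    User-agent: Googlebot
    Disallow: /search/
    Disallow: /print/
    Crawl-delay: 10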

answered Sep 24 '22 by Gavin Miller


Register your site using Google Webmaster Tools, which lets you set how often, and how many requests per second, Googlebot should make when indexing your site. Google Webmaster Tools can also help you create a robots.txt file to reduce the load on your site.

answered Sep 25 '22 by Ronny Vindenes