 

Is it possible to control the crawl speed with robots.txt?

We can tell bots to crawl or not to crawl our website in robots.txt. On the other hand, we can control the crawling speed in Google Webmaster Tools (how much Googlebot crawls the website). I wonder if it is possible to limit crawler activity with robots.txt.

I mean allowing bots to crawl pages, but limiting their presence by time, pages, or size!

asked Oct 16 '11 by Googlebot

People also ask

How can I change the crawl rate?

If your crawl rate is described as "calculated as optimal," the only way to reduce the crawl rate is by filing a special request. You cannot increase the crawl rate. Otherwise, select the option you want and then limit the crawl rate as desired.

Does robots.txt prevent crawling?

One use of robots.txt is to prevent duplicate content issues that occur when the same posts or pages appear under different URLs. Duplicates can negatively impact SEO. The solution is simple: identify duplicate content and disallow bots from crawling it.
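For illustration, a minimal robots.txt along those lines might look like this (the /print/ path is just a hypothetical duplicate-content directory):

User-agent: *
Disallow: /print/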

What does robots.txt tell crawlers?

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.

What is crawl-delay in robots.txt?

A robots.txt file may specify a "crawl-delay" directive for one or more user agents, which tells a bot how quickly it can request pages from a website. For example, a crawl delay of 10 specifies that a crawler should not request a new page more than once every 10 seconds.
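As a sketch, a robots.txt applying that 10-second delay to all bots would look like this (keeping in mind that not every crawler honors the directive):

User-agent: *
Crawl-delay: 10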


1 Answer

There is one directive you can use in robots.txt: Crawl-delay.

User-agent: *
Crawl-delay: 5

This means robots should crawl no more than one page every 5 seconds. But as far as I know, this directive is not part of the official robots.txt standard.
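Crawl-delay can also be set per user agent. As a rough sketch (the values here are arbitrary), Bing and Yandex have historically honored the directive, while Googlebot ignores it and expects the crawl rate to be set in Webmaster Tools / Search Console instead:

# Slower rate for Bingbot, 5-second default for everyone else
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Crawl-delay: 5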

Also, some robots don't take the robots.txt file into account at all. So even if you have disallowed access to some pages, they may still be crawled by some robots (though not by the major ones like Google).

Baidu, for example, may ignore robots.txt, but that's not certain.

I've got no official source for this info, so you can just Google it.

answered Feb 09 '23 by ZurabWeb