Below is a sample robots.txt file that allows multiple user agents, with a separate crawl delay for each user agent. The Crawl-delay values are for illustration only and would be different in a real robots.txt file.
I have searched all over the web for a proper answer but could not find one. There are too many conflicting suggestions, and I do not know which method is correct.
Questions:
(1) Can each user agent have its own crawl-delay? (I assume yes)
(2) Where do you put the crawl-delay line for each user agent, before or after the Allow / Disallow line?
(3) Does there have to be a blank line between each user-agent group?
References:
http://www.seopt.com/2013/01/robots-text-file/
http://help.yandex.com/webmaster/?id=1113851#1113858
Essentially, I am looking to find out how the final robots.txt file should look using the values in the sample below.
Thanks in advance.
# Allow only major search spiders
User-agent: Mediapartners-Google
Disallow:
Crawl-delay: 11
User-agent: Googlebot
Disallow:
Crawl-delay: 12
User-agent: Adsbot-Google
Disallow:
Crawl-delay: 13
User-agent: Googlebot-Image
Disallow:
Crawl-delay: 14
User-agent: Googlebot-Mobile
Disallow:
Crawl-delay: 15
User-agent: MSNBot
Disallow:
Crawl-delay: 16
User-agent: bingbot
Disallow:
Crawl-delay: 17
User-agent: Slurp
Disallow:
Crawl-delay: 18
User-agent: Yahoo! Slurp
Disallow:
Crawl-delay: 19
# Block all other spiders
User-agent: *
Disallow: /
# Block Directories for all spiders
User-agent: *
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/
(4) If I want to set all of the user agents to have a crawl delay of 10 seconds, would the following be correct?
# Allow only major search spiders
User-agent: *
Crawl-delay: 10
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot
Disallow:
User-agent: Adsbot-Google
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: Googlebot-Mobile
Disallow:
User-agent: MSNBot
Disallow:
User-agent: bingbot
Disallow:
User-agent: Slurp
Disallow:
User-agent: Yahoo! Slurp
Disallow:
# Block all other spiders
User-agent: *
Disallow: /
# Block Directories for all spiders
User-agent: *
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/
Crawl delay
A robots.txt file may specify a "crawl delay" directive for one or more user agents, which tells a bot how quickly it can request pages from a website. For example, a crawl delay of 10 specifies that a crawler should not request a new page more than once every 10 seconds.
Do take care when using the crawl-delay directive. By setting a crawl delay of ten seconds, you only allow these search engines to access 8,640 pages a day (86,400 seconds in a day divided by 10). That might seem like plenty for a small site, but it isn't very much on large sites.
Google doesn't support the crawl-delay directive, so its crawlers will simply ignore it. To limit Google's crawl rate, log into the old Google Search Console, choose the website you want to define the crawl rate for, and adjust the single available setting: Crawl rate, a slider where you can set the preferred rate.
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
(1) Can each user agent have its own crawl-delay?
Yes. Each record, started by one or more User-agent lines, can have a Crawl-delay line. Note that Crawl-delay is not part of the original robots.txt specification, but it's no problem to include it for those parsers that understand it, as the spec states:
Unrecognised headers are ignored.
So older robots.txt parsers will simply ignore your Crawl-delay lines.
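For example, several user agents can even share one record (and therefore one Crawl-delay). A minimal sketch, with illustrative bot names and values:

User-agent: Googlebot
User-agent: bingbot
Disallow:
Crawl-delay: 12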
(2) Where do you put the crawl-delay line for each user agent, before or after the Allow / Disallow line?
Doesn’t matter.
(3) Does there have to be a blank line between each user-agent group?
Yes. Records have to be separated by one or more blank lines. See the original spec:
The file consists of one or more records separated by one or more blank lines (terminated by CR,CR/NL, or NL).
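In other words, each user-agent group ends where the next blank line begins. A minimal sketch (names and values are illustrative):

User-agent: Googlebot
Disallow:
Crawl-delay: 12

User-agent: *
Disallow: /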
(4) If I want to set all of the user agents to have a crawl delay of 10 seconds, would the following be correct?
No. Bots look for the record that matches their user agent. Only if they don't find a matching record will they fall back to the User-agent: * record. So in your example all the listed bots (like Googlebot, MSNBot, Yahoo! Slurp, etc.) will have no Crawl-delay.
Also note that you can't have several records with User-agent: *:
If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
So parsers might look (if no other record matched) for the first record with User-agent: * and ignore the following ones. For your first example, that would mean that URLs beginning with /ads/, /cgi-bin/, and /scripts/ are not blocked.
And even if you have only one record with User-agent: *, those Disallow lines only apply to bots that have no other matching record! As your comment # Block Directories for all spiders suggests, you want these URL paths to be blocked for all spiders, so you'd have to repeat the Disallow lines for every record.
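Putting that together, a corrected version of your first example might look like the sketch below. Only a few of the bots are shown (the others would follow the same pattern), and the Crawl-delay values are still placeholders:

# Allow only major search spiders, each with its own crawl delay
User-agent: Mediapartners-Google
Crawl-delay: 11
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/

User-agent: Googlebot
Crawl-delay: 12
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/

User-agent: bingbot
Crawl-delay: 17
Disallow: /ads/
Disallow: /cgi-bin/
Disallow: /scripts/

# Block all other spiders
User-agent: *
Disallow: /

The directory Disallow lines are omitted from the User-agent: * record because Disallow: / already blocks everything for those bots.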