
BingBot & BaiduSpider don't respect robots.txt

After my CPU usage suddenly went over 400% because bots were swamping my site, I created the following robots.txt and placed it in my site root, e.g. "www.example.com/":

User-agent: *
Disallow: /

Google now respects this file and no longer appears in my log file. However, BingBot and BaiduSpider still show up in my log, and frequently.

Because of this huge increase in CPU usage and bandwidth, my hosting provider was about to suspend my account. I first deleted all my pages (in case there was a nasty script), uploaded clean pages, blocked all bots by IP address in .htaccess, and then created that robots.txt file.

I searched everywhere to confirm that I took the right steps (I haven't tried the "ReWrite" option in .htaccess yet).

Can anyone confirm that what I have done should do the job? (Since I started on this, my CPU usage has gone down to 120% within 6 days, but I would have expected blocking the IP addresses alone to bring it back down to my usual 5-10%.)

Richard asked Jul 10 '12 23:07



1 Answer

If these are legitimate Bingbot and Baiduspider crawlers, then they should both honour your robots.txt file as given. However, it can take time before they pick it up and start acting on it if these files have previously been indexed, which is probably the case here.
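As a quick sanity check that the file as given really does disallow everything for every user agent, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the sample URL are illustrative assumptions; this only shows how a standards-following parser reads the rules, not how any particular crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL, used only for illustration.
for agent in ("Googlebot", "bingbot", "Baiduspider"):
    print(agent, is_allowed(ROBOTS_TXT, agent, "http://www.example.com/index.html"))
```

If every agent reports `False`, the file itself is fine, and any remaining hits come either from crawlers that have not yet re-read it or from clients that ignore it.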

It doesn't apply in this instance, but it should be noted that Baiduspider's interpretation of the robots.txt standard is a little different to that of other mainstream bots (e.g. Googlebot) in some respects. For instance, whilst the standard defines the URL path on the Disallow: record simply as a prefix, Baiduspider will only match whole directory/path names. Where Googlebot will match the URL http://example.com/private/ when given the directive Disallow: /priv, Baiduspider will not.
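The difference between the two interpretations can be sketched as follows. This is a simplified model of the behaviour described above, not Baidu's or Google's actual implementation; both function names are hypothetical.

```python
def prefix_match(rule: str, path: str) -> bool:
    """Plain prefix matching: the rule applies if the path starts with it."""
    return path.startswith(rule)


def whole_segment_match(rule: str, path: str) -> bool:
    """Assumed stricter matching: the rule must end on a path-segment boundary."""
    if not path.startswith(rule):
        return False
    # A rule ending in "/" or matching the full path always ends on a boundary.
    if rule.endswith("/") or len(path) == len(rule):
        return True
    # Otherwise the next character of the path must start a new segment.
    return path[len(rule)] == "/"


print(prefix_match("/priv", "/private/"))            # True
print(whole_segment_match("/priv", "/private/"))     # False
print(whole_segment_match("/private", "/private/"))  # True
```

Under either model, `Disallow: /` matches every path, which is why the distinction does not affect the file in the question.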

Reference:
http://www.baidu.com/search/robots_english.html

MrWhite answered Oct 11 '22 06:10
