How to ban the 360Spider crawler with robots.txt or .htaccess?

I've got a problem because of 360Spider: this bot makes too many requests per second to my VPS and slows it down (CPU usage climbs to 10-70%, while it is usually 1-2%). I looked into the httpd logs and saw lines like these:

182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 "http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"

etc.

How can I block this spider completely via robots.txt? My robots.txt currently looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: YoudaoBot
Disallow: /

User-agent: sogou spider
Disallow: /

I've added these lines:

User-agent: 360Spider
Disallow: /

but that does not seem to work. How can I block this angry bot?

If you suggest blocking it via .htaccess, note that it currently looks like this:

# Turn on URL rewriting
RewriteEngine On

# Installation directory
RewriteBase /

SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them

# Protect hidden files from being viewed
<Files .*>
    Order Deny,Allow
    Deny From All
</Files>

# Protect application and system files from being viewed
RewriteRule ^(?:application|modules|system)\b.* index.php/$0 [L]

# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT]

And, despite the presence of

SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them

this bot still tries to kill my VPS and still shows up in the access logs.

asked Sep 06 '12 17:09 by kovpack



2 Answers

In your .htaccess file simply add the following:

RewriteCond %{REMOTE_ADDR} ^(182\.118\.2)

RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]

This will catch ALL the bots being launched from the 182.118.2xx.xxx range and send them back to themselves...

The crappy 360 bot is being fired from servers in China... so as long as you don't mind saying bye bye to crappy Chinese traffic from that IP range, this is guaranteed to keep those puppies from reaching any files on your web site.

The following two lines in your .htaccess file will also pick it off, simply because it is stupid enough to proudly put 360Spider in its user agent string. This could be handy for when they use other IP ranges than 182.118.2xx.xxx:

RewriteCond %{HTTP_USER_AGENT} .*(360Spider) [NC]

RewriteRule ^.*$ http://182.118.25.209/take_a_hike_moron [R=301,L]
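
If you'd rather not redirect at all, a combined variant of the two rules above might look like this (a sketch; the [F] flag answers with 403 Forbidden and implies [L]):

RewriteCond %{REMOTE_ADDR} ^182\.118\.2 [OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC]
RewriteRule .* - [F]

Incidentally, the SetEnvIfNoCase line in the question tests the Referer header against ^360Spider$, while the bot announces itself in the User-Agent header, which is why that rule never matched.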

And yes... I hate them too!

answered Nov 04 '22 20:11 by Sloth

Your robots.txt seems right. Some bots just ignore it (malicious bots crawl from any IP address, from botnets of hundreds to millions of infected devices all around the globe); in that case you can limit the number of requests per second using the mod_security module for Apache 2.x.

Config example here: http://blog.cherouvim.com/simple-dos-protection-with-mod_security/
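
For instance, a per-IP rate limit along the lines of that post might look like this (a rough sketch, assuming ModSecurity 2.x; the SecDataDir path and the 25-requests-per-3-seconds threshold are placeholders to adapt):

# Persistent storage for collections (path is a placeholder)
SecDataDir /var/cache/modsecurity
SecRuleEngine On

# Start a per-client-IP collection
SecAction "phase:1,initcol:ip=%{REMOTE_ADDR},pass,nolog,id:1000"

# Deny clients whose request counter has passed the threshold
SecRule IP:REQUESTS "@gt 25" "phase:1,deny,status:403,nolog,id:1001"

# Count this request; decay the counter by 25 every 3 seconds
SecAction "phase:1,setvar:ip.requests=+1,deprecatevar:ip.requests=25/3,pass,nolog,id:1002"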

[EDIT] On Linux, iptables also allows restricting tcp:port connections per (x) second(s) per IP, provided conntrack capabilities are enabled in your kernel. See: https://serverfault.com/questions/378357/iptables-dos-limit-for-all-ports
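
For example, using the recent match (a sketch; the 20-connections-per-10-seconds threshold is arbitrary, and --hitcount is capped at 20 by default):

# Record every new connection to port 80 per source IP
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set --name HTTP
# Drop sources that opened 20+ new connections within the last 10 seconds
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --update --seconds 10 --hitcount 20 --name HTTP -j DROP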

answered Nov 04 '22 19:11 by NotGaeL