Security: Block bad spiders and bots from access to website using htaccess and HTTP_USER_AGENT

Tags:

When building an htaccess rule to block common spiders and bots, what HTTP_USER_AGENT headers should be filtered?

 RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR]
 RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
 RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
 RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
 RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
 RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
 RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
 RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
 RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
 RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
 RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
 RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
 RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
 RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
 RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
 RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
 RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
 RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
 RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
 RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
 RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
 RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
 RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
 RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
 RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
 RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
 RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
 RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
 RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
 RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
 RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
 RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
 RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
 RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
 RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
 RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
 RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
 RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
 RewriteCond %{HTTP_USER_AGENT} ^Zeus
 ## Return 403 Forbidden error.
 RewriteRule .* - [F]

737

asked Aug 18 '11 20:08

WooDzu

1 Answers

This is how I do it:

SetEnvIfNoCase user-agent "^BlackWidow" bad_bot 
# and all others from your list
SetEnvIfNoCase user-agent "^Zeus" bad_bot 
<FilesMatch "(.*)">
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</FilesMatch>

answered Nov 13 '22 10:11

Gino Sullivan

Related questions
                            
                                How to uninstall/remove Certbot Let's Encrypt from Debian 8
                            
                                Apache .htaccess <Directory not allowed here
                            
                                pandas write dataframe to parquet format with append
                            
                                How does multithreading affect http keep-alive connection?
                            
                                How can I limit the size of HTTP POST requests in mod_perl?
                            
                                Enable PHP to read .css and .js files while keeping their original Content-Type
                            
                                Last Image in Document Fails to Load - "Failed to Load Resource"
                            
                                How can I address c10k problem if I am using PHP?
                            
                                Apache Shiro & Java Security for Novices
                            
                                How to stop a oversize file upload while upload is in process?
                            
                                Apache mod_rewrite internally to different port
                            
                                fatal error: 'apr.h' file not found when installing x-sendfile mac os x mountain lion server
                            
                                How to configure mod_pagespeed for SSL pages
                            
                                OperationalError: attempt to write a readonly database in ubuntu server
                            
                                (48)Address already in use: make_sock: could not bind to address [::]:80 on OS x Mavericks
                            
                                Apache Virtual Host is not working right [closed]
                            
                                Where do prints go when running Flask with Apache?
                            
                                How to do load testing of a PHP website from both ends
                            
                                Set up apache to alias a nodejs application?
                            
                                Handle multiple versions in same URL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Security: Block bad spiders and bots from access to website using htaccess and HTTP_USER_AGENT

Tags:

redirect

security

apache

.htaccess

user-agent

WooDzu

People also ask

1 Answers

Gino Sullivan

Recent Activity

Donate For Us