I'm trying to block all bots/crawlers/spiders for a special directory. How can I do that with .htaccess? I searched a little bit and found a solution by blocking based on the user agent:
RewriteCond %{HTTP_USER_AGENT} googlebot
Now I would need more user agents (to cover all known bots), and the rule should apply only to my separate directory. I already have a robots.txt, but not all crawlers take a look at it ... Blocking by IP address is not an option. Are there other solutions? I know about password protection, but I would have to ask first whether that is an option. Nevertheless, I am looking for a solution based on the user agent.
One option to reduce server load from bots, spiders, and other crawlers is to create a robots.txt file at the root of your website. This tells search engines what content on your site they should and should not index.
You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.
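For example, a noindex directive can go either in the page markup or, for non-HTML resources such as PDFs, in an X-Robots-Tag response header. A minimal sketch (the .htaccess variant assumes mod_headers is enabled):

<meta name="robots" content="noindex">

# .htaccess alternative: send the header for all PDF files
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>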
Site owners can also use robots.txt to block common bots that SEO professionals use to assess their competition, for example Semrush and Ahrefs.
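The following robots.txt rules should block AhrefsBot and SemrushBot from crawling your entire site, assuming the crawlers honor robots.txt (the user-agent tokens shown are the ones both services document):

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /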
You need to have mod_rewrite enabled. Place the following in a .htaccess file in that folder. If it is placed elsewhere (e.g. in a parent folder), the RewriteRule pattern needs to be slightly modified to include that folder name.
RewriteEngine On
# Match bot user agents; [NC] makes the match case-insensitive
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
# Serve a 403 Forbidden for every request in this directory
RewriteRule .* - [R=403,L]
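If you want to cover more crawlers, one option is to extend the alternation. The token list below is illustrative, not exhaustive (there is no definitive list of "all bots known", and new ones appear constantly); [F] is shorthand for returning 403 Forbidden:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|baiduspider|yandex|duckduckbot|ahrefsbot|semrushbot|mj12bot) [NC]
RewriteRule .* - [F,L]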
Why use .htaccess or mod_rewrite for a job that is specifically meant for robots.txt? Here is the robots.txt snippet you will need to block a specific set of directories.
User-agent: *
Disallow: /subdir1/
Disallow: /subdir2/
Disallow: /subdir3/
This will block all compliant search bots from the directories /subdir1/, /subdir2/ and /subdir3/ (though, as you note, misbehaving crawlers can simply ignore robots.txt).
For more explanation see here: http://www.robotstxt.org/orig.html