I have a staging server on the public internet that runs copies of the production code for a few websites. I'd really rather the staging sites not get indexed.
Is there a way I can modify my httpd.conf on the staging server to block search engine crawlers?
Changing the robots.txt wouldn't really work, since I use scripts to copy the same code base to both servers. I'd also rather not change the virtual host conf files, as there are a bunch of sites and I don't want to have to remember to copy over a particular setting whenever I create a new site.
Create a robots.txt file with the following contents:
User-agent: *
Disallow: /
Put that file somewhere on your staging server; your document root is a good place for it (e.g. /var/www/html/robots.txt).
Add the following to your httpd.conf file:
# Exclude all robots
<Location "/robots.txt">
    SetHandler None
</Location>
Alias /robots.txt /path/to/robots.txt
The SetHandler directive is probably not required in most setups, but it can matter if your sites route all requests through a handler such as mod_python, for example.
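To illustrate (a hypothetical sketch, assuming a site that hands everything to mod_python; the directory path and handler module name are made up):

# Hypothetical site config: every request under the document root is
# processed by mod_python via a made-up handler module.
<Directory "/var/www/staging-site">
    SetHandler mod_python
    PythonHandler myapp.handler
</Directory>
# The server-level <Location "/robots.txt"> block with SetHandler None
# resets the handler for that one URL, so the aliased static file is
# served instead of being passed to the Python handler.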
That robots.txt file will now be served for all virtual hosts on your server, overriding any robots.txt file you might have for individual hosts.
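For example (a sketch with an assumed hostname and paths), a virtual host that defines no Alias of its own still inherits the server-level one, so a robots.txt sitting in its DocumentRoot is never reached:

<VirtualHost *:80>
    ServerName staging.example.com
    DocumentRoot "/var/www/staging-site"
    # No Alias here: /robots.txt still maps to /path/to/robots.txt,
    # so /var/www/staging-site/robots.txt (if present) is ignored.
</VirtualHost>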
(Note: My answer is essentially the same as what ceejayoz's answer suggests, but I had to spend a few extra minutes figuring out the specifics to get it working. I decided to put this answer here for the sake of others who might stumble upon this question.)