I have a website at http://domain.com/ and a mirror of it on http://cdn.domain.com/. I don't want the CDN to be indexed. How can I write a robots.txt rule that keeps the CDN from being indexed without disturbing my present robots.txt excludes?
My present robots.txt excludes:
User-agent: *
Disallow: /abc.php
How can I keep cdn.domain.com from being indexed?
User-agent: *
Disallow: /abc.php
In your root .htaccess file, add the following:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt
And then create a separate robots-cdn.txt:
User-agent: *
Disallow: /
When robots.txt is requested via http://cdn.domain.com/robots.txt, the CDN fetches it from the origin with its own user agent, so the rewrite applies and the contents of robots-cdn.txt are returned. For any other request the rewrite doesn't kick in and the true robots.txt is served.
This way you are free to mirror the entire site (including the .htaccess file) and still get the expected behavior.
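To sanity-check what crawlers would conclude from each of the two files, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name `allowed` and the sample URLs are made up for illustration; the rule text is copied from the two files above.

```python
from urllib.robotparser import RobotFileParser

# Rules served on the main site (the existing robots.txt).
MAIN_RULES = """User-agent: *
Disallow: /abc.php
"""

# Rules served to the CDN (robots-cdn.txt).
CDN_RULES = """User-agent: *
Disallow: /
"""


def allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Main site: only /abc.php is excluded.
print(allowed(MAIN_RULES, "http://domain.com/index.html"))      # True
print(allowed(MAIN_RULES, "http://domain.com/abc.php"))         # False

# CDN mirror: everything is excluded.
print(allowed(CDN_RULES, "http://cdn.domain.com/index.html"))   # False
print(allowed(CDN_RULES, "http://cdn.domain.com/abc.php"))      # False
```

This only checks the rule files themselves; it does not test whether the .htaccess rewrite is actually serving the right file on each hostname.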
Update: matching on HTTP_USER_AGENT did it, since Amazon sends that user agent when it queries the origin from any of its locations.
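As a rough illustration of the user-agent check, the sketch below applies the same pattern as the RewriteCond using Python's re module. It only approximates what mod_rewrite does; the function name `robots_file_for` and the sample browser user agent are hypothetical.

```python
import re

# Same pattern as the RewriteCond. The "." matches any single character,
# which covers the space in "Amazon CloudFront".
CLOUDFRONT_UA = re.compile(r"^Amazon.CloudFront$")


def robots_file_for(user_agent: str) -> str:
    """Pick which robots file would be served for a given User-Agent header."""
    if CLOUDFRONT_UA.search(user_agent):
        return "robots-cdn.txt"
    return "robots.txt"


print(robots_file_for("Amazon CloudFront"))                # robots-cdn.txt
print(robots_file_for("Mozilla/5.0 (X11; Linux x86_64)"))  # robots.txt
```

Because the pattern is anchored at both ends, a user agent that merely contains the string, with extra text before or after it, would not trigger the rewrite.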