I have a website, say:
http://domain.com/
with a mirror site at:
http://cdn.domain.com/
I don't want the CDN to be indexed. How can I write a robots.txt rule that keeps the CDN from being indexed without disturbing my present robots.txt excludes?
My present robots.txt excludes:
User-agent: *
Disallow: /abc.php
How can I keep cdn.domain.com from being indexed?
Keep your existing robots.txt as it is:
User-agent: *
Disallow: /abc.php
Then, in your root .htaccess file, add the following:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt
And then create a separate robots-cdn.txt:
User-agent: *
Disallow: /
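To see how a crawler would read the two files, here is a minimal sketch using Python's standard urllib.robotparser. The helper name allowed and the agent name ExampleBot are made up for illustration; the rule lines are copied from the two files above.

```python
from urllib.robotparser import RobotFileParser

# Rule lines copied from the two robots files discussed above.
MAIN_ROBOTS = ["User-agent: *", "Disallow: /abc.php"]
CDN_ROBOTS = ["User-agent: *", "Disallow: /"]


def allowed(robots_lines, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    `allowed` and "ExampleBot" are hypothetical names used only in this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Main site: everything except /abc.php stays crawlable.
    print(allowed(MAIN_ROBOTS, "http://domain.com/index.html"))
    print(allowed(MAIN_ROBOTS, "http://domain.com/abc.php"))
    # CDN mirror: nothing is crawlable.
    print(allowed(CDN_ROBOTS, "http://cdn.domain.com/index.html"))
```

The main site keeps its single exclude, while every URL on the CDN host is disallowed.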
When robots.txt is requested via http://cdn.domain.com/robots.txt, the contents of the robots-cdn.txt file are returned. Otherwise the rewrite won't kick in and the true robots.txt is served.
This way you are free to mirror the entire site (including the .htaccess file) with the expected behavior.
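The decision the rewrite makes can be modeled roughly as follows. This is only a simplified sketch of the two patterns from the .htaccess snippet, not a reimplementation of mod_rewrite; the function name served_file is hypothetical, and the path is given without a leading slash, which is how per-directory rules see it.

```python
import re

# The same two patterns as in the .htaccess rules above.
CLOUDFRONT_UA = re.compile(r"^Amazon.CloudFront$")  # "." also matches the space
ROBOTS_PATH = re.compile(r"^robots\.txt$")


def served_file(path, user_agent):
    """Return the file that would be served for `path` (no leading slash).

    Hypothetical helper: it models only the single rewrite shown above.
    """
    if CLOUDFRONT_UA.search(user_agent) and ROBOTS_PATH.search(path):
        return "robots-cdn.txt"  # request came through the CDN
    return path  # rewrite does not apply, the original file is served


if __name__ == "__main__":
    print(served_file("robots.txt", "Amazon CloudFront"))
    print(served_file("robots.txt", "Mozilla/5.0"))
    print(served_file("abc.php", "Amazon CloudFront"))
```

Only the combination of the CloudFront user agent and a request for robots.txt is redirected; every other request is left alone.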
Update:
HTTP_USER_AGENT did the trick, since Amazon sends it when querying from any location.