
How to disallow a mirror site (on a sub-domain) using robots.txt? [closed]

I have a website, say:

http://domain.com/

with a mirror site at:

http://cdn.domain.com/

I don't want the cdn sub-domain to be indexed. How can I write a robots.txt rule that keeps the CDN from being indexed without disturbing my existing robots.txt exclusions?

My current robots.txt excludes:

User-agent: *
Disallow: /abc.php

How can I keep cdn.domain.com from being indexed?

Yugal Jindle asked Dec 26 '22 02:12


1 Answer

In your root .htaccess file, add the following:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt

And then create a separate robots-cdn.txt:

User-agent: *
Disallow: /
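
As a quick sanity check of the two rule sets, here is a minimal Python sketch that parses both files with the standard library's urllib.robotparser and asks which URLs a crawler may fetch. The names MAIN_ROBOTS, CDN_ROBOTS and allowed are made up for this illustration, and real crawlers may interpret robots.txt slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Contents of the two files discussed above, inlined for the example.
MAIN_ROBOTS = """User-agent: *
Disallow: /abc.php
"""

CDN_ROBOTS = """User-agent: *
Disallow: /
"""


def allowed(robots_text, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `agent` is a hypothetical crawler name; both files only have a
    `User-agent: *` group, so any name gets the same answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Main site: only /abc.php is excluded.
    print(allowed(MAIN_ROBOTS, "http://domain.com/index.html"))
    print(allowed(MAIN_ROBOTS, "http://domain.com/abc.php"))
    # CDN mirror: everything is excluded.
    print(allowed(CDN_ROBOTS, "http://cdn.domain.com/index.html"))
```

The main rules should still block only /abc.php, while the CDN rules should block every path.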

When robots.txt is requested via http://cdn.domain.com/robots.txt, the contents of robots-cdn.txt are returned. For any other request the rewrite does not apply and the real robots.txt is served.

This way you are free to mirror the entire site (including the .htaccess file) and still get the expected behavior.
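
To make the selection logic of the rewrite rule above explicit, here is a rough Python model of it. This is only an illustration, not how Apache evaluates rules internally; the function name robots_file_for is hypothetical, and the pattern is copied from the RewriteCond (the unescaped dot matches any single character, including the space in "Amazon CloudFront").

```python
import re

# Same pattern as the RewriteCond; the "." stands in for the space.
CDN_AGENT = re.compile(r"^Amazon.CloudFront$")


def robots_file_for(user_agent):
    """Return the file that would be served for a /robots.txt request."""
    if CDN_AGENT.search(user_agent):
        # Condition matched: robots.txt is rewritten to robots-cdn.txt.
        return "robots-cdn.txt"
    # Condition not matched: the regular robots.txt is served.
    return "robots.txt"


if __name__ == "__main__":
    print(robots_file_for("Amazon CloudFront"))
    print(robots_file_for("Mozilla/5.0 (compatible; ExampleBot/1.0)"))
```

Because the pattern is anchored at both ends, only an exact "Amazon CloudFront" User-Agent selects the CDN file; any longer or different string falls through to the normal robots.txt.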

Update:

  • Matching on HTTP_USER_AGENT did the trick, since Amazon sends that user agent when it queries the origin from any location.
  • I have verified that it works.
Orangepill answered Mar 02 '23 01:03