
Deny access but allow robots i.e. Google to sitemap.xml

Is there a way to allow only robots such as Google, Yahoo, or other search engine crawlers to access my sitemap, which is located at http://www.mywebsite.com/sitemap.xml? Is it possible to block direct access by users while still allowing robots?

MacMac asked Jul 04 '11 09:07

2 Answers

Well, basically no, but you could do something with the User-Agent string and deny every request that doesn't match it (assuming Apache):

<Location /sitemap.xml>
  # Set GoAway=1 only when the User-Agent matches "GodBot"
  # (substitute the bot you actually want, e.g. Googlebot)
  SetEnvIf User-Agent GodBot GoAway=1
  Order allow,deny
  Allow from all
  # Deny every request where GoAway is NOT set,
  # i.e. everyone except the matching bot
  Deny from env=!GoAway
</Location>
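The Allow/Deny pair reads as a double negative, so it may help to restate the logic. This is a plain-shell sketch of what those directives do, with "GodBot" standing in for whatever bot User-Agent you match:

```shell
# Restates the Location block's logic: GoAway is set only when the
# User-Agent matches GodBot, and any request WITHOUT GoAway is denied,
# so only the matching bot ever reaches sitemap.xml.
check_access() {
  ua="$1"
  goaway=""
  case "$ua" in
    *GodBot*) goaway=1 ;;    # SetEnvIf User-Agent GodBot GoAway=1
  esac
  if [ -n "$goaway" ]; then  # Deny from env=!GoAway
    echo allow
  else
    echo deny
  fi
}

check_access "GodBot/1.0"    # → allow
check_access "Mozilla/5.0"   # → deny
```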

But as it says here (where I found the syntax):

Warning:

Access control by User-Agent is an unreliable technique, since the User-Agent header can be set to anything at all, at the whim of the end user.

Mr Shark answered Sep 20 '22 08:09

Alternatively, you can check the requesting host in PHP via a reverse DNS lookup:

$ip = $_SERVER["REMOTE_ADDR"]; // was REMOTE_PORT, which is the client's port, not its IP
$host = gethostbyaddr($ip);    // reverse DNS lookup on the client IP

if (strpos($host, ".googlebot.com") !== false) {
    readfile("sitemap.xml");
} else {
    header("Location: /");
    exit;
}
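A caveat on the check above: strpos matches anywhere in the hostname, so a name like crawl.googlebot.com.attacker.example would pass. Google's documented advice is forward-confirmed reverse DNS: verify the reverse-DNS name ends in .googlebot.com or .google.com, then resolve that name forward and confirm it points back at the same IP. A minimal shell sketch of the suffix part (the live DNS steps are only indicated in comments, since they need the network):

```shell
# Suffix check: the reverse-DNS name must END in .googlebot.com or
# .google.com, not merely contain it somewhere.
is_google_host() {
  case "$1" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# To complete the verification, resolve the name forward and compare,
# e.g. host="$(dig +short -x "$ip")", then check dig +short "$host"
# yields the original $ip again.
is_google_host "crawl-66-249-66-1.googlebot.com" && echo verified   # → verified
is_google_host "crawl.googlebot.com.attacker.example" || echo rejected  # → rejected
```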

Mileen Coulter answered Sep 22 '22 08:09