Nginx: different robots.txt for alternate domain

Summary

I have a single web app with an internal and external domain pointing at it, and I want a robots.txt to block all access to the internal domain, but allow all access to the external domain.

Problem Detail

I have a simple Nginx server block that I use to proxy to a Django application (see below). As you can see, this server block responds to any domain (due to the lack of a server_name parameter). However, I'm wondering how to mark specific domains such that Nginx will serve up a custom robots.txt file for them.

More specifically, the domains example.com and www.example.com should serve the default robots.txt file from the htdocs directory (since "root /sites/mysite/htdocs" is set and a robots.txt file is located at /sites/mysite/htdocs/robots.txt).

BUT, I also want the domain "internal.example.com" (which points at the same server as example.com) to serve a custom robots.txt file; I'd like to create a custom robots.txt so Google doesn't index that internal domain.

I thought about duplicating the server block, specifying the following in one of the blocks, and then somehow overriding the robots.txt lookup in that block:

"server_name internal.example.com;"

But duplicating the whole server block just for this purpose doesn't seem very DRY.

I also thought about using an if statement to check whether the Host header contains the internal domain, and serving the custom robots.txt file that way. But the Nginx documentation warns that If Is Evil.

What is a good approach for serving up a custom robots.txt file for an internal domain?

Thank you for your help.

Here is a code sample of the server block that I'm using.

upstream app_server {
  server unix:/sites/mysite/var/run/wsgi.socket fail_timeout=0;
}

server {
  listen 80;

  root /sites/mysite/htdocs;    

  location / {
      try_files $uri @proxy_to_app;
  }

  location @proxy_to_app {
     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
     proxy_set_header X-Forwarded-Protocol $scheme;
     proxy_set_header X-Real-IP $remote_addr;
     proxy_set_header X-Scheme $scheme;
     proxy_set_header Host $http_host;
     proxy_redirect off;
     proxy_pass   http://app_server;
  }
}
Asked Oct 10 '14 by Joe J


1 Answer

You can use map to define a conditional variable. Add this outside your server block:

map $host $robots_file {
    default robots.txt;
    internal.example.com internal-robots.txt;
}

Then the variable can be used with try_files inside your existing server block (no need to duplicate it):

location = /robots.txt {
    try_files /$robots_file =404;
}

Now you can have two robots.txt files in your root:

robots.txt
internal-robots.txt
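
For completeness, internal-robots.txt would typically disallow everything. A minimal sketch (the exact rules are an assumption about your crawl policy; the path comes from the root directive in the question):

# /sites/mysite/htdocs/internal-robots.txt
# Block all well-behaved crawlers on the internal domain (assumed policy)
User-agent: *
Disallow: /

The default /sites/mysite/htdocs/robots.txt stays as-is for example.com and www.example.com, since the map falls through to robots.txt for any host that isn't internal.example.com.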
Answered Oct 18 '22 by Cole Tierney