Summary
I have a single web app with an internal and an external domain pointing at it, and I want robots.txt to disallow all crawling of the internal domain but allow all crawling of the external domain.
Problem Detail
I have a simple Nginx server block that I use to proxy to a Django application (see below). As you can see, this server block responds to any domain (because it has no server_name directive). However, I'm wondering how to mark specific domains so that Nginx will serve up a custom robots.txt file for them.
More specifically, the domains example.com and www.example.com should serve the default robots.txt file from the htdocs directory (since "root /sites/mysite/htdocs;" is set and a robots.txt file is located at /sites/mysite/htdocs/robots.txt).
BUT, I also want the domain internal.example.com (which points at the same server as example.com) to be served a custom robots.txt file; I'd like to create a custom robots.txt so Google doesn't index that internal domain.
I thought about duplicating the server block, specifying the following in one of the two blocks, and then somehow overriding the robots.txt lookup in that block:
"server_name internal.example.com;"
But duplicating the whole server block just for this purpose, roughly as sketched below, doesn't seem very DRY.
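Concretely, I think that would mean adding a second block along these lines (untested sketch; internal-robots.txt is just a placeholder name for the custom file):

server {
    listen 80;
    server_name internal.example.com;
    root /sites/mysite/htdocs;

    # the only real difference: override the robots.txt lookup
    location = /robots.txt {
        try_files /internal-robots.txt =404;
    }

    # everything below is copy-pasted from the existing block
    location / {
        try_files $uri @proxy_to_app;
    }

    location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app_server;
    }
}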
I also thought about using an if statement to check whether the Host header matches the internal domain, and then serving the custom robots.txt file that way (something like the snippet below). But Nginx says If Is Evil.
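For reference, the if-based idea was roughly this, inside the existing catch-all server block (also untested, and again using internal-robots.txt as a placeholder name):

# inside the existing server block -- the pattern the Nginx docs warn about
if ($host = internal.example.com) {
    rewrite ^/robots\.txt$ /internal-robots.txt last;
}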
What is a good approach for serving up a custom robots.txt file for an internal domain?
Thank you for your help.
Here is a code sample of the server block that I'm using.
upstream app_server {
    server unix:/sites/mysite/var/run/wsgi.socket fail_timeout=0;
}

server {
    listen 80;
    # no server_name, so this block answers for every domain
    root /sites/mysite/htdocs;

    location / {
        try_files $uri @proxy_to_app;
    }

    location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app_server;
    }
}
You can use map to define a conditional variable. Add this outside your server block:
map $host $robots_file {
    default              robots.txt;
    internal.example.com internal-robots.txt;
}
Then the variable can be used with try_files inside your existing catch-all server block, like this:

location = /robots.txt {
    try_files /$robots_file =404;
}
Now you can have two robots.txt files in your root:
robots.txt
internal-robots.txt
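Putting it together with the server block from the question, the whole configuration might look roughly like this (a sketch only; the map must sit at the http level, e.g. in a file included from conf.d):

map $host $robots_file {
    default              robots.txt;
    internal.example.com internal-robots.txt;
}

upstream app_server {
    server unix:/sites/mysite/var/run/wsgi.socket fail_timeout=0;
}

server {
    listen 80;
    root /sites/mysite/htdocs;

    # serve whichever robots.txt the map picked for this Host
    location = /robots.txt {
        try_files /$robots_file =404;
    }

    location / {
        try_files $uri @proxy_to_app;
    }

    location @proxy_to_app {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Protocol $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://app_server;
    }
}

internal-robots.txt would then contain "User-agent: *" and "Disallow: /" so crawlers skip the internal domain entirely.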