 

Does robots.txt apply to subdomains?

Tags:

robots.txt

Let's say I have a test folder served at test.domain.com and I don't want search engines to crawl it. Do I need a robots.txt in the test folder, or can I just place a robots.txt in the root and disallow the test folder?

asked Nov 28 '13 by Pa3k.m

People also ask

Does robots.txt work on subdomains?

Google handles robots.txt files by subdomain and protocol. For example, a site can have one robots.txt file sitting on the non-www version and a completely different one sitting on the www version.

How do robots block subdomains?

Use Disallow in a robots.txt file. But remember it is your subdomain, so the robots.txt file should be on the subdomain; don't use Disallow: / on your main domain, or your main domain's pages will be blocked as well.

What all can we include in robots.txt file?

A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.

When should you use a robots.txt file?

One of the most common and useful ways to use your robots.txt file is to limit search engine bot access to parts of your website. This can help maximize your crawl budget and prevent unwanted pages from winding up in the search results.


2 Answers

Each subdomain is generally treated as a separate site and requires its own robots.txt file.
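For example, to keep crawlers out of the test subdomain entirely, a robots.txt served at the root of test.domain.com (not the main domain) could look like this, assuming you want to block all compliant crawlers from the whole subdomain:

```
# robots.txt at http://test.domain.com/robots.txt
# Blocks all compliant crawlers from the entire subdomain
User-agent: *
Disallow: /
```

A Disallow: /test/ rule in the main domain's robots.txt would not have this effect, because crawlers only consult the robots.txt of the host they are fetching from.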

answered Nov 27 '22 by malexander


If your test folder is configured as a virtual host, you need a robots.txt in your test folder as well (this is the most common setup). But if you route the subdomain's traffic through the main domain via an .htaccess file, you could configure it to always serve the robots.txt from the root of your main domain.
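One way to do that, sketched here assuming Apache with mod_rewrite enabled and hypothetical hostnames, is to redirect requests for the subdomain's robots.txt to the main domain's copy (major crawlers such as Googlebot follow robots.txt redirects):

```apache
# .htaccess in the subdomain's document root (hypothetical hostnames)
# Redirect robots.txt requests on the subdomain to the main domain's file
RewriteEngine On
RewriteCond %{HTTP_HOST} ^test\.domain\.com$ [NC]
RewriteRule ^robots\.txt$ http://domain.com/robots.txt [R=301,L]
```

Note that after the redirect, the rules are still interpreted as applying to the subdomain's URLs, so only do this if the same rules really fit both hosts.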

Anyway, from my experience it's better to be safe than sorry and put a robots.txt file (especially one denying access) on every domain you need to protect. Then double-check that you're getting the right file when accessing:

http://yourrootdomain.com/robots.txt
http://subdomain.yourrootdomain.com/robots.txt
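To check what a crawler would conclude from the file you get back, you can feed its rules to Python's standard urllib.robotparser. The rules and URL below are hypothetical stand-ins for your own:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they might appear at http://test.domain.com/robots.txt
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A crawler honoring these rules may not fetch anything on the subdomain
print(parser.can_fetch("*", "http://test.domain.com/secret.html"))  # False
```

In practice you would fetch each robots.txt URL listed above and parse its actual contents the same way.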
answered Nov 27 '22 by Kleskowy