Let's say I have a test folder (test.domain.com) and I don't want search engines to crawl it. Do I need to have a robots.txt in the test folder, or can I just place a robots.txt in the root and disallow the test folder?
Robots.txt by subdomain and protocol: Google handles robots.txt files by subdomain and by protocol. For example, a site can have one robots.txt file sitting on the non-www version and a completely different one sitting on the www version.
Use Disallow in a robots.txt file. But remember it is your subdomain, so the robots.txt file should sit on the subdomain itself; don't put Disallow: / in your main domain's robots.txt, or your main site will be blocked from crawling as well.
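As a rough sketch (assuming the test subdomain is served from its own document root), the file served at test.domain.com could block everything while the one at domain.com stays permissive:

    # robots.txt served at http://test.domain.com/robots.txt
    User-agent: *
    Disallow: /

    # robots.txt served at http://domain.com/robots.txt
    User-agent: *
    Disallow:

An empty Disallow line means nothing is blocked, so the main site remains fully crawlable.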
A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page.
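If the goal is to keep the test pages out of Google's index entirely (not just uncrawled), a noindex signal is the usual route. A minimal sketch, assuming you can edit the pages or the server configuration on the test subdomain:

    <!-- in the <head> of each page on test.domain.com -->
    <meta name="robots" content="noindex">

    # or as an HTTP response header (e.g. via Apache mod_headers)
    Header set X-Robots-Tag "noindex"

Note that crawlers have to be able to fetch a page to see its noindex, so don't also block those URLs in robots.txt if you rely on this.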
One of the most common and useful ways to use your robots.txt file is to limit search engine bot access to parts of your website. This can help maximize your crawl budget and prevent unwanted pages from winding up in the search results.
Each subdomain is generally treated as a separate site and requires its own robots.txt file.
If your test folder is configured as a virtual host, you need a robots.txt in your test folder as well (this is the most common setup).
But if you route the subdomain's traffic through an .htaccess file, you could modify it to always serve the robots.txt from the root of your main domain.
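Conversely, if the hosts share one document root but you want a stricter robots.txt on the test subdomain, a rewrite rule can hand out a different file per host. A rough sketch, assuming Apache with mod_rewrite enabled; the filename robots_test.txt is just an illustrative name:

    RewriteEngine On
    # When robots.txt is requested on the test subdomain,
    # serve a separate blocking file instead
    RewriteCond %{HTTP_HOST} ^test\.domain\.com$ [NC]
    RewriteRule ^robots\.txt$ /robots_test.txt [L]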
Anyway, from my experience it's better to be safe than sorry, so put robots.txt files (especially ones that deny access) on every domain and subdomain you need to protect. And double-check that you're getting the right file when accessing:
http://yourrootdomain.com/robots.txt
http://subdomain.yourrootdomain.com/robots.txt
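A quick way to verify (assuming curl is available and you substitute your own hostnames) is to fetch both files from the command line and confirm each host returns the rules you expect:

    curl -s http://yourrootdomain.com/robots.txt
    curl -s http://subdomain.yourrootdomain.com/robots.txt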