Excluding testing subdomain from being crawled by search engines (w/ SVN Repository)

I have:

  • domain.com
  • testing.domain.com

I want domain.com to be crawled and indexed by search engines, but not testing.domain.com.

The testing domain and main domain share the same SVN repository, so I'm not sure if separate robots.txt files would work...

asked Jul 18 '11 by Eric


2 Answers

1) Create a separate robots.txt file (name it robots_testing.txt, for example).

2) Add this rule to the .htaccess in your website root folder:

RewriteCond %{HTTP_HOST} =testing.example.com
RewriteRule ^robots\.txt$ /robots_testing.txt [L]

This will internally rewrite any request for robots.txt to robots_testing.txt if the domain name is testing.example.com.
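The robots_testing.txt file itself would then contain a blanket disallow, for example:

```
User-agent: *
Disallow: /
```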

Alternatively, do the opposite -- rewrite all requests for robots.txt to robots_disabled.txt for all domains except example.com:

RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
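Putting it together, a sketch of the relevant .htaccess section might look like the following (this assumes mod_rewrite is enabled; the RewriteEngine On line and the www. condition are additions for completeness, not part of the original answer):

```apache
RewriteEngine On
# Serve the "disabled" robots file to every host except the main domain
RewriteCond %{HTTP_HOST} !=example.com
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^robots\.txt$ /robots_disabled.txt [L]
```

Because both hosts share the same SVN working copy, this keeps a single .htaccess and a single pair of robots files under version control, with the correct one selected per host at request time.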
answered Oct 26 '22 by LazyOne


testing.domain.com should have its own robots.txt file, as follows:

User-agent: *
Disallow: /

User-agent: Googlebot
Noindex: /

located at http://testing.domain.com/robots.txt
This will disallow all bot user-agents, and since Google has been reported to honor the Noindex directive as well, we'll just throw it in for good measure.
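Note that Noindex: in robots.txt was never an officially documented directive, so it may be ignored. A more dependable alternative is sending an X-Robots-Tag response header for the testing host only; a minimal .htaccess sketch (assuming mod_setenvif and mod_headers are enabled -- the IS_TESTING variable name is arbitrary) might look like:

```apache
# Mark requests arriving on the testing host...
SetEnvIf Host "^testing\.domain\.com$" IS_TESTING
# ...and tell crawlers not to index any response served to them
Header set X-Robots-Tag "noindex, nofollow" env=IS_TESTING
```

Since the header is conditioned on the Host header rather than on a file path, this works even though both domains share the same SVN working copy.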

You could also add your subdomain to Webmaster Tools, block it by robots.txt, and submit a site removal request (though this will work for Google only). For some more info, have a look at http://googlewebmastercentral.blogspot.com/2010/03/url-removal-explained-part-i-urls.html

answered Oct 26 '22 by Stephan