I have www.domainname.com and origin.domainname.com pointing to the same codebase. Is there a way I can prevent all URLs under origin.domainname.com from getting indexed?
Is there some rule in robots.txt to do it? Both hosts point to the same folder. I also tried redirecting origin.domainname.com to www.domainname.com in the .htaccess file, but it doesn't seem to work.
If anyone has had a similar problem and can help, I shall be grateful.
Thanks
You can prevent a page or other resource from appearing in Google Search by including a noindex meta tag or header in the HTTP response. When Googlebot next crawls that page and sees the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.
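Since both hostnames serve the same files, the header variant is the practical one here: you can send the noindex signal only when the request arrives on the origin hostname, so the www host stays indexable. A minimal .htaccess sketch, assuming Apache with mod_setenvif and mod_headers enabled (the env variable name NOINDEX_HOST is just an illustrative choice):

```apache
# Flag requests that arrived via the origin hostname (mod_setenvif)
SetEnvIfNoCase Host ^origin\.domainname\.com$ NOINDEX_HOST

# Tell crawlers not to index any response served under that flag (mod_headers)
Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST
```

Unlike robots.txt, this does not block crawling; it lets Googlebot fetch the page and then drop it from the index.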
You can also block search engines with meta tags. The robots meta tag lets you set parameters for bots (search engine spiders); it can be used to keep bots from indexing and crawling an entire site or just parts of it.
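For instance, a single tag in a page's head is enough to opt that page out of indexing (standard HTML, no assumptions):

```html
<!-- Placed in the <head>: ask all crawlers not to index this page
     or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

The catch in your setup is that the tag would have to be emitted only when the page is served via origin.domainname.com, since both hosts share one codebase.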
You can rewrite robots.txt to another file (let's name it 'robots_no.txt') containing:
User-agent: *
Disallow: /
(source: http://www.robotstxt.org/robotstxt.html)
The .htaccess file would look like this (note the escaped dots in the regex patterns):
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_no.txt [L]
Use a customized robots.txt for each (sub)domain:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [OR]
RewriteCond %{HTTP_HOST} ^example\.org$
# Rewrites robots.txt for each of the above (sub)domains <domain> to robots_<domain>.txt
# example.org -> robots_example.org.txt
RewriteRule ^robots\.txt$ robots_%{HTTP_HOST}.txt [L]
# In all other cases, serve the default 'robots.txt' unchanged
RewriteRule ^robots\.txt$ - [L]
(Note: server variables in a RewriteRule substitution use the %{VARNAME} syntax; ${HTTP_HOST} is RewriteMap syntax and would not work here.)
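Since you mentioned the redirect itself didn't work: a common cause is a missing RewriteEngine On, or the redirect rule sitting below another rule that already matched with [L]. A minimal sketch that 301-redirects every origin.domainname.com URL to the www host, assuming mod_rewrite is enabled and this block sits at the top of the .htaccess:

```apache
RewriteEngine On
# Match only requests that arrived via the origin hostname
RewriteCond %{HTTP_HOST} ^origin\.domainname\.com$ [NC]
# Permanently redirect the same path (query string is carried over by default)
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]
```

A 301 also solves the indexing problem on its own: search engines drop the redirecting URLs and keep only the www ones.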
Instead of asking search engines to block all pages on hosts other than www.example.com, you can use <link rel="canonical"> too.
If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in the <head>:
<link rel="canonical" href="http://www.example.com/page.html">
See also Google's article about rel="canonical".