
robots.txt: user-agent: Googlebot disallow: / Google still indexing

Look at the robots.txt of this site:

fr2.dk/robots.txt

The content is:

User-Agent: Googlebot
Disallow: /

That ought to tell Google not to index the site, no?

If so, why does the site still appear in Google searches?

Anders asked Jan 22 '11 16:01



2 Answers

Apart from having to wait, because Google's index takes some time to update, note that if other sites link to yours, robots.txt alone won't be sufficient to remove your site from the results.

Quoting Google's support page "Remove a page or site from Google's search results":

If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
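To make the crawl/index distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows what the rule in the question controls, namely whether a given crawler may fetch a URL. It is Python's parser, not Googlebot's, and the helper name `may_crawl` and the agent `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the question (fr2.dk/robots.txt).
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /
"""


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("Googlebot", "http://fr2.dk/"))     # False
    print(may_crawl("SomeOtherBot", "http://fr2.dk/"))  # True
```

The result says nothing about indexing: a URL that may not be fetched can still be listed if its address is discovered through links on other sites, which is the situation the quote describes.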

One possible alternative is also mentioned in the same document:

Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
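As a rough way to check whether a page actually carries that tag, the sketch below scans HTML for a robots meta tag using Python's standard `html.parser`. The sample page, the class name `RobotsMetaFinder`, and the helper `has_noindex` are assumptions for illustration; this is not how Google evaluates the tag, only a quick local check.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # A content value may hold several comma-separated directives.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page using the tag described in the quote.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex">
</head><body>Hello</body></html>"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```

Keep in mind that a crawler can only see this tag if it is allowed to fetch the page, so a page that is also disallowed in robots.txt will not have its noindex tag read.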

earl answered Sep 28 '22 02:09


If you just added this, then you'll have to wait; it's not instantaneous. Until Googlebot comes back to respider the site and sees the robots.txt, the site will still be in their database.

I doubt it's relevant, but you might want to change "User-Agent" to "User-agent". Google is most likely not case-sensitive here, but it can't hurt to follow the standard exactly.
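For what it's worth, at least one common parser treats the field name case-insensitively. The sketch below feeds three spellings to Python's standard `urllib.robotparser` and checks that each one blocks Googlebot. This shows how Python's parser behaves and is only suggestive of how Googlebot behaves; the helper name `googlebot_blocked` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def googlebot_blocked(robots_txt: str) -> bool:
    """Return True if the given robots.txt text forbids Googlebot from '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", "http://fr2.dk/")


# Same rule, three capitalizations of the field name.
VARIANTS = ["User-Agent", "User-agent", "user-agent"]
results = {
    name: googlebot_blocked(f"{name}: Googlebot\nDisallow: /\n")
    for name in VARIANTS
}

if __name__ == "__main__":
    for name, blocked in results.items():
        print(f"{name}: blocked={blocked}")
```

If all three spellings report the same result, the capitalization is unlikely to be what keeps the site in the index.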

Marc B answered Sep 28 '22 03:09