
Why does Google find a page excluded by robots.txt?

I'm using robots.txt to exclude some pages from spiders:

User-agent: * 
Disallow: /track.php

When I search for something related to this page, Google says: "A description for this result is not available because of this site's robots.txt – learn more."

That means robots.txt is working, but why is the link to the page still found by the spider? I'd like there to be no link to the 'track.php' page at all. How should I set up robots.txt (or something else, such as .htaccess)?

Alberto Fecchi asked Nov 19 '25 11:11


1 Answer

Here's what happened:

  • Googlebot saw, on some other page, a link to track.php. Let's call that page "source.html".
  • Googlebot tried to visit your track.php file.
  • Your robots.txt told Googlebot not to read the file.

So Google knows that source.html links to track.php, but it doesn't know what track.php contains. You didn't tell Google not to index the URL track.php; you only told Googlebot not to read, and therefore not to index, the data inside track.php.
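To make the distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser with the rules from the question. The host example.com and the helper name may_fetch are placeholders, not anything from the original setup. The check only answers "may a crawler fetch this URL?"; it says nothing about whether the URL itself may be listed in search results.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question, parsed locally (no network access needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /track.php
"""


def may_fetch(url, user_agent="Googlebot"):
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder host.
print(may_fetch("https://example.com/track.php"))    # False: crawling is blocked
print(may_fetch("https://example.com/source.html"))  # True: the linking page is crawlable
```

The second result is the reason the URL can still surface: source.html is crawlable, so the link it contains is discovered even though track.php itself is never fetched.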

As Google's documentation says:

While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.

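There's not a lot you can do about this. For your own pages, you can use the X-Robots-Tag HTTP header or the noindex robots meta tag, as described in that documentation. That will prevent Googlebot from indexing the URL if it finds a link in your pages. Keep in mind that Googlebot can only see a noindex directive on a page it is allowed to fetch, so the directive has no effect on track.php while robots.txt still disallows that URL. And if some page that you don't control links to the blocked track.php file, then Google is quite likely to index the URL.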
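One detail is easy to miss: a crawler has to fetch a page before it can see a noindex directive on it, so noindex and a robots.txt Disallow for the same URL work against each other. The sketch below illustrates that interaction in Python. The function names, the example.com host, and the response headers are all hypothetical, and the header parsing is deliberately simplified to plain comma-separated values.

```python
from urllib.robotparser import RobotFileParser


def requests_noindex(headers):
    """Return True if an X-Robots-Tag header contains the noindex directive.

    Simplification: only plain comma-separated values such as
    "noindex, nofollow" are handled, not user-agent-prefixed ones.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = {part.strip().lower() for part in value.split(",")}
            if "noindex" in directives:
                return True
    return False


def crawler_can_see_noindex(robots_txt, url, headers, user_agent="Googlebot"):
    """A crawler only sees noindex if robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url) and requests_noindex(headers)


# Hypothetical response headers that track.php could send.
headers = {"X-Robots-Tag": "noindex"}
url = "https://example.com/track.php"  # placeholder host

blocked = "User-agent: *\nDisallow: /track.php\n"
unblocked = "User-agent: *\nDisallow:\n"

print(crawler_can_see_noindex(blocked, url, headers))    # False: the page is never fetched
print(crawler_can_see_noindex(unblocked, url, headers))  # True: the directive is visible
```

Under these assumptions, keeping the URL out of results means letting the crawler fetch track.php and answering with noindex, rather than blocking the fetch in robots.txt.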

Jim Mischel answered Nov 21 '25 04:11


