i'm using robots.txt to exclude some pages from spiders.
User-agent: *
Disallow: /track.php
When i search something refeered to this page, google says: "A description for this result is not available because of this site's robots.txt – learn more."
It means that the robots.txt is working.. but why the link to the page is still found by the spider? I'd like to have no link to the 'track.php' page... how i should setup the robots.txt? (or something like .htaccess and so on..?)
Here's what happened:
So Google knows that source.html links to track.php, but it doesn't know what track.php contains. You didn't tell Google not to index track.php; you told Googlebot not to read and index the data inside track.php.
As Google's documentation says:
While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.
There's not a lot you can do about this. For your own pages, you can use the x-robots-tag or noindex meta tag as described in that documentation. That will prevent Googlebot from indexing the URL if it finds a link in your pages. But if some page that you don't control links to that track.php file, then Google is quite likely to index it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With