Googlebot has been occasionally indexing one of our sites with a bad query string parameter. I am not sure how it is getting this query string parameter (there don't appear to be any sites linking to us with bad links, and nothing in our site is inserting the bad value). The bad parameter causes the site to throw a 500 error, as we expect.
I was under the impression that Google would not index pages that return a 500 error, but it turns out that it does. So now I have two questions:
1) Why would Googlebot be inserting random bad query string values? (I don't really care about the answer to this question, but if we could do something to avoid that, it would solve our problem.)
2) Why would Google index a page that returns a 500 error?
Here is one of the erroneous links that the Googlebot created and that Google has indexed:
http://www.pbs.org/teacherline/catalog/browse/?sa=4&gb=baqhuxts&gb=20&gb=21&num=20&page=2&js=0&sa=1
The bad parameter is gb=baqhuxts. The parameter 'gb' is expected to be an integer. If you remove that parameter from the query string you should get a nice catalog page showing.
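For anyone who wants to see exactly which values in a URL like this are malformed, here is a minimal sketch in Python. It assumes, as stated above, that 'gb' should be an integer. The function names and the idea of answering with a 400 instead of letting the request blow up into a 500 are my own illustration, not how the site actually works, and nothing in this thread establishes that a different status code would change what Google indexes.

```python
from urllib.parse import urlsplit, parse_qs


def is_int(value):
    """Return True if the string parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def invalid_int_params(url, int_params=("gb",)):
    """Return {name: [bad values]} for parameters expected to be integers.

    `int_params` is an assumption: only 'gb' is known from the post.
    """
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    bad = {}
    for name in int_params:
        values = [v for v in query.get(name, []) if not is_int(v)]
        if values:
            bad[name] = values
    return bad


def status_for(url):
    """Hypothetical policy: 400 for malformed input, 200 otherwise."""
    return 400 if invalid_int_params(url) else 200


url = ("http://www.pbs.org/teacherline/catalog/browse/"
       "?sa=4&gb=baqhuxts&gb=20&gb=21&num=20&page=2&js=0&sa=1")
print(invalid_int_params(url))  # {'gb': ['baqhuxts']}
print(status_for(url))          # 400
```

Note that 'gb' appears three times in that query string, so the check has to look at every value rather than just the first or last one.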
Regarding nofollow and robots.txt solutions: [ REDACTED ]
I realize now that I am a moron and put a meta tag telling search robots to index the page. That was a dumb thing to do. I'm removing those. :-(
If you search on Google for 'baqhuxts' you will find that it has indexed 10 pages with this bad parameter. But each of these pages returns a 500 error. Does anyone have insight about why Google believes these are valid pages to index?
It can take time for Google to index your page; allow at least a week after submitting a sitemap or a submit to index request before assuming a problem. If your page or site change is recent, check back in a week to see if it is still missing.
Crawling can take anywhere from a few days to a few weeks. Be patient and monitor progress using either the Index Status report or the URL Inspection tool. Requesting a crawl does not guarantee that inclusion in search results will happen instantly or even at all.
It's probably because you are telling Google to index it by having this in your meta-tags:
<meta name="robots" content="index,follow">
Try removing that! :)
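If you want to confirm which pages still carry that tag after you clean up, something like the following could help. It is only a sketch using Python's standard html.parser; the class and function names are made up for illustration, and it reads the HTML you hand it rather than fetching anything.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "index,follow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def robots_directives(html):
    """Return the list of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="index,follow"></head></html>'
print(robots_directives(page))  # ['index', 'follow']
```

An empty list means the page has no robots meta tag at all, which is the state you would be in after removing it.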