
Google sees something that it shouldn't see. Why?

For some mysterious reason, Google has indexed both of these addresses, which lead to the same page:

/something/some-text-1055.html

and

/index.php?pg=something&id=1055

(A quick note: the site has had friendly URLs since its launch, and I have no idea how Google found the "index.php?" URL. The "unfriendly" URLs exist only in the content management system, which is password-protected.)

What can I do to solve the situation? (I have around 1000 pages that are double-indexed.) Somebody told me to use "disallow: index.php?" in the robots.txt file. Right or wrong? Any other suggestions?

Ionuț G. Stan asked Mar 13 '09 20:03


1 Answer

You'd be surprised at how pervasive and quick the Google bots are at indexing site content. Combine that with the fact that many CMS systems create unintended pages and links, and the most likely culprit is that those links were exposed at some point. It's also possible that your administration area isn't as secure as you think and the Google bot got through that way.

The well-behaved, Google-recommended things to do here are:

  1. If possible, create 301 redirects from your query-string-style URLs to your canonical-style URLs. That's you saying "hey there, web bot/browser, the content that used to be at this URL is now at this other URL."

  2. Block the query-string URLs in your robots.txt. That's like telling the spiders and other automated programs "Hey, please don't look at this stuff. These aren't the URLs you're looking for."

  3. Google apparently now allows you to specify a canonical URL via a <link rel="canonical" /> tag in the <head> of your page. Consider adding one to each page.
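
As a rough illustration of items 1 and 3, here is a minimal Python sketch of the mapping from a query-string URL to its friendly URL, the 301 response that would follow from it, and the matching canonical tag. Everything named here is an assumption for illustration only: the SLUGS lookup table, the function names, and example.com are hypothetical, and a real site would read the slug from its own database and send the redirect from whatever language and server it actually runs on.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical lookup: (pg, id) -> slug. A real site would query its database.
SLUGS = {("something", "1055"): "some-text"}


def canonical_path(url):
    """Return the friendly path for a query-string URL, or None if it does not apply."""
    parts = urlsplit(url)
    if parts.path != "/index.php":
        return None  # already a friendly URL, or not a page we know about
    query = parse_qs(parts.query)
    pg = query.get("pg", [None])[0]
    page_id = query.get("id", [None])[0]
    slug = SLUGS.get((pg, page_id))
    if slug is None:
        return None  # unknown page: let the application serve its normal 404
    return f"/{pg}/{slug}-{page_id}.html"


def redirect_response(url):
    """Return (status, headers) for a permanent redirect, or None if none is needed."""
    target = canonical_path(url)
    if target is None:
        return None
    return 301, {"Location": target}


def canonical_link_tag(base, path):
    """Build the <link> tag to place in the page's <head>."""
    return f'<link rel="canonical" href="{base}{path}" />'


if __name__ == "__main__":
    print(redirect_response("/index.php?pg=something&id=1055"))
    print(canonical_link_tag("https://example.com", "/something/some-text-1055.html"))
```

One caveat when combining these with item 2: a crawler that obeys a robots.txt block never requests the blocked URL, so it will not see a 301 or a canonical tag served there. Also note that robots.txt paths start with a slash, so the rule would be written as "Disallow: /index.php?" rather than "disallow: index.php?".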

As to whether doing the well-behaved things is the "right" thing to do for Google rankings ... who knows. Only "Google" knows how its algorithms work now and how they will work in the future, and by Google I mean a bunch of engineers and executives with conflicting goals on how search should work.

Alan Storm answered Oct 13 '22 13:10