 

Best practices for handling bad robot requests for URLs containing "&amp;" instead of "&"

Tags: html, url

& is a reserved character in HTML, so everywhere I have URLs pointing to some path with a query string, I write &amp; instead of & in the markup so that I get valid HTML.
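For illustration (the domain and parameter names are just the placeholders from this question), the anchor markup then looks like this, and a conforming HTML parser decodes the href back to ?p1=v1&p2=v2 before requesting it:

<a href="https://mywebsite.com/?p1=v1&amp;p2=v2">some page</a>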

There are many different crawlers that go over the website and access these URLs, but they don't HTML-decode them to get the correct URL values, so they make requests to my website with:

mywebsite.com/?p1=v1&amp;p2=v2

instead of

mywebsite.com/?p1=v1&p2=v2

Right now I respond with an error page, as the robots that make these requests are of no interest to me.

But my question is: what is the best practice for handling this kind of request?

Do you know if there is any value in supporting such requests? (For example, are there any popular crawlers or browsers that don't properly decode these URLs?)

asked Jun 18 '12 by Dorin


1 Answer

I think you can expect any major crawler to handle validly escaped URLs, so I wouldn't worry about the rest.

If you really want to, you can add rewrite rules to Apache, or whatever server you use. But this may lead to other problems when a URL legitimately contains the character sequence &amp; and gets replaced with & by your rewrite rule in error.
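For what such a rule might look like, here is a minimal sketch for Apache's mod_rewrite in a .htaccess file (assuming mod_rewrite is enabled; the [N] flag re-runs the ruleset, so it loops until no literal &amp; is left in the query string, and NE keeps the substitution from being re-escaped):

# Sketch only: turn a literal "&amp;" in the query string back into "&"
RewriteEngine On
RewriteCond %{QUERY_STRING} ^(.*)&amp;(.*)$
RewriteRule ^(.*)$ $1?%1&%2 [N,NE]

Note that this is exactly the kind of rule the caveat above applies to: a parameter value that legitimately contains the text &amp; would be rewritten as well.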

In my opinion it is better to leave this untouched. It is not your fault, and if you do not really care about these crawlers - so what? :)

answered Nov 15 '22 by Fabian Barney