Weird characters in URL

On my web server, when a user requests a URL containing weird characters, I remove those characters, and the system logs these cases. When I checked the sanitized cases, I found the URLs below. I'm curious: what could be the purpose of these URLs?

I checked the IPs: these are real people who use the website normally. But in about one out of every 20 requests from these users, the URL ends with these weird characters.

http://example.com/@%EF%BF%BD%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0,
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/p%EF%BF%BD%1D%01?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDC%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDR%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD`%EF%BF%BD%EF%BF%BD%7F, agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD, agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36

http://en.wikipedia.org/wiki/Specials_(Unicode_block)
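For context, the sanitize-and-log step described above might look something like the following minimal Python sketch. It is an assumption, not the asker's actual code: the `sanitize_path` function and the `url-sanitizer` logger name are hypothetical, and it only treats U+FFFD and control characters (Unicode category Cc) as "weird".

```python
import logging
import unicodedata
from urllib.parse import unquote

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("url-sanitizer")  # hypothetical logger name


def sanitize_path(raw_path, user_agent=""):
    """Hypothetical sanitizer: drop U+FFFD and control characters, log if anything changed."""
    decoded = unquote(raw_path)  # percent-decode as UTF-8
    cleaned = "".join(
        ch for ch in decoded
        if ch != "\ufffd" and unicodedata.category(ch) != "Cc"
    )
    if cleaned != decoded:
        log.info("sanitized %r -> %r, agent: %s", raw_path, cleaned, user_agent)
    return cleaned


print(sanitize_path("/%60E%EF%BF%BD%02", "Firefox/31.0"))  # path part of one logged URL
```

Only the path is handled here; the query string (the `z=` parameter in the logs) would need the same treatment.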

trante asked Aug 09 '14 20:08


1 Answer

They are essentially malformed URLs. They can be generated by malware that is trying to exploit website vulnerabilities, by a malfunctioning browser plugin or extension, or by a bug in a JS file (e.g. Google Analytics tracking code) in combination with a specific browser version or operating system. In any case, you can't control what requests a client will send, and there's nothing you can do to stop them; so if your generated HTML/JS code is correct, you have done your job.

If you want to correct those URLs for some reason, you can enable URL rewriting and add a rule with a regular-expression filter that transforms them into valid URLs. However, I don't recommend doing that: the web server should respond with a 404 Not Found error, because that is the standard response (it's a client error, after all), and in my opinion this is faster and safer than URL rewriting (the rewriting procedure may contain bugs that someone could try to exploit, and so on).
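If you go the 404 route, the check itself can stay small. Below is a minimal Python sketch that is not tied to any particular web server or framework; the function names `looks_malformed` and `status_for` are hypothetical, and treating U+FFFD and ASCII control characters as "malformed" is an assumption you may want to tune for your site.

```python
from urllib.parse import unquote

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def looks_malformed(raw_path):
    """Hypothetical check: True if the decoded path has U+FFFD or ASCII control characters."""
    decoded = unquote(raw_path, errors="replace")  # invalid UTF-8 also becomes U+FFFD
    return any(
        ch == REPLACEMENT_CHAR or ord(ch) < 0x20 or ord(ch) == 0x7F
        for ch in decoded
    )


def status_for(raw_path):
    """Hypothetical routing hook: answer 404 instead of rewriting the URL."""
    return 404 if looks_malformed(raw_path) else 200


print(status_for("/%EF%BF%BDC%EF%BF%BD%02"))  # path from one of the logged URLs
print(status_for("/caf%C3%A9"))               # valid UTF-8, passes the check
```

Because `unquote` is called with `errors="replace"`, a percent-encoded byte sequence that is not valid UTF-8 (for example `%FF`) also decodes to U+FFFD and is rejected by the same check. The query string could be checked the same way.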

Out of curiosity, you can easily decode those URLs with an online URL decoder of your choice, but you will essentially discover what you already know: those URLs contain a lot of Unicode replacement characters, encoded as UTF-8.
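If you'd rather not paste logged URLs into a website, Python's standard `urllib.parse` module does the same decoding locally. A minimal sketch using one of the URLs from the question (user-agent removed); the `decode_url` helper is just an illustrative name:

```python
from urllib.parse import parse_qs, unquote, urlsplit

# One of the logged URLs from the question, without the user-agent part.
LOGGED_URL = ("http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD"
              "?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD")

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD


def decode_url(url):
    """Illustrative helper: percent-decode the path and query (UTF-8 by default)."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    query = parse_qs(parts.query, keep_blank_values=True)  # keep empty g=, s=
    return path, query


path, query = decode_url(LOGGED_URL)
print(repr(path))                    # U+FFFD and the \x01 control byte become visible
print(path.count(REPLACEMENT_CHAR))  # → 3
print(query)
```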

In fact, %EF%BF%BD is the url-encoded version of the hex representation of the 3 bytes (EF BF BD) of the UTF-8 replacement character. You can see that character also as or EF BF BD or FFFD or ï ¿ ½, and so on, depending of the representation method you choose.
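These representations can be reproduced with a few lines of Python (standard library only). The last part is an illustration under an assumption, not a diagnosis of these specific requests: it shows one common way U+FFFD ends up in a string, namely decoding bytes that are not valid UTF-8 with `errors="replace"`.

```python
from urllib.parse import quote

ch = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

utf8 = ch.encode("utf-8")
print(utf8.hex(" ").upper())   # EF BF BD: the three UTF-8 bytes
print(quote(ch))               # %EF%BF%BD: the percent-encoded form seen in the logs
print(f"U+{ord(ch):04X}")      # U+FFFD: the code point
print(utf8.decode("latin-1"))  # ï¿½: the same bytes misread as Latin-1

# Assumed mechanism, for illustration: invalid UTF-8 bytes are decoded with
# errors="replace", so each bad byte becomes U+FFFD before being percent-encoded.
garbage = b"\x02\xff\xfe"
text = garbage.decode("utf-8", errors="replace")
print(quote(text))             # %02%EF%BF%BD%EF%BF%BD
```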

You can also check for yourself how the client handles that character. Go here:

http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char

press the GO button and, using your browser's developer tools, check what really happens: the browser actually encodes the unknown character as %EF%BF%BD before sending it to the web server.
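The same encoding step can be reproduced outside the browser. A minimal Python sketch, assuming the form is submitted as UTF-8 (the usual case for a UTF-8 page): `urllib.parse.urlencode` encodes each value as UTF-8 and percent-escapes it, which yields the same query string as the utf-8.cgi link above.

```python
from urllib.parse import urlencode

# Query parameters taken from the utf-8.cgi link above.
params = {"input": "\ufffd", "mode": "char"}

# urlencode() uses quote_plus() with UTF-8 by default, roughly what a browser
# does with a GET form field on a UTF-8 page before sending the request.
query = urlencode(params)
print(query)  # input=%EF%BF%BD&mode=char
```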

Giuseppe Bertone answered Oct 04 '22 06:10