
PHP UTF-encoded URL-string

When I type a URL such as http://www.example.com/?query=Траливали into the Firefox address bar, it is automatically encoded to http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8.

But a URL like http://www.example.com/#ajax_call?query=Траливали is not converted.

Other browsers, such as IE8, do not convert the query at all.

The question is: how can I detect (in PHP) whether the query is encoded, and how do I decode it?

I've tried:

  1. $str = iconv('cp1251', 'utf-8', urldecode($str) );

  2. $str = utf8_decode(urldecode($str));

  3. $str = (urldecode($str));

  4. many functions from http://php.net/manual/en/function.urldecode.php

Nothing works.

Test:

$str = $_GET['str'];

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urldecode('%D2%F0%E0%EB%E8%E2%E0%EB%E8'));

d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == $str);

d('Траливали' == $str);

d(urldecode($str));

d(utf8_decode(urldecode($str)));

!!! d('%D2%F0%E0%EB%E8%E2%E0%EB%E8' == urlencode($str)); !!!

Returns:

[false] [false] [false] ��������� ???? [true]

A workaround of sorts: send the query as part of the URL path, e.g. http://www.example.com/Траливали/, and parse it with mod_rewrite.

asked Jul 30 '10 02:07 by topright gamedev


1 Answer

It is not converted because a query component that comes after the fragment is not valid in a URL.

RFC 3986 defines a URI as composed of the following parts:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment

The order cannot be changed. Therefore,

URL1: http://www.example.com/?query=Траливали#ajax_call

will be handled properly while

URL2: http://www.example.com/#ajax_call?query=Траливали

will not. Looking at URL2, IE actually handles it correctly: it treats #ajax_call?query=Траливали as the fragment, and the URL has no query. The fragment always comes last and is never sent to the server.

IE will properly encode the query component of URL1, because there it recognizes it as a query.
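
PHP's own parse_url() splits a URL in the same order, so it can be used to illustrate the difference. This is only a sketch: it uses the percent-encoded form from the question instead of the raw Cyrillic text, and the variable names are mine.

```php
<?php
// URL1 and URL2 from above, with the query value in the percent-encoded
// form shown in the question.
$url1 = 'http://www.example.com/?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8#ajax_call';
$url2 = 'http://www.example.com/#ajax_call?query=%D2%F0%E0%EB%E8%E2%E0%EB%E8';

$parts1 = parse_url($url1);
$parts2 = parse_url($url2);

// URL1: query and fragment are separate components.
var_dump(isset($parts1['query']));   // bool(true)
var_dump($parts1['fragment']);       // string(9) "ajax_call"

// URL2: everything after '#' is the fragment, so there is no query at all.
var_dump(isset($parts2['query']));   // bool(false)
var_dump($parts2['fragment']);
```

Since a browser never sends the fragment to the server, the server-side script sees URL2 as a request for / with an empty query string.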

As for decoding in PHP, percent-escapes such as %D2 are decoded automatically when PHP populates $_GET['query']. The $_GET variable was not populated for URL2 because, according to the standard, that URL has no query.
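
To see what that automatic decoding produces without a web server, parse_str() can stand in for $_GET, since it parses a string as if it were the query string of a request. A minimal sketch, using the encoded value from the question (the variable names are mine):

```php
<?php
// The query string Firefox produced in the question.
$queryString = 'query=%D2%F0%E0%EB%E8%E2%E0%EB%E8';

// Parse it the way PHP parses an incoming query string.
parse_str($queryString, $params);
$raw = $params['query'];

echo strlen($raw), "\n";    // 9: one byte per letter
echo bin2hex($raw), "\n";   // d2f0e0ebe8e2e0ebe8

// The value is already decoded; with no '%' or '+' left in it,
// calling urldecode() again changes nothing.
var_dump(urldecode($raw) === $raw);   // bool(true)
```

Nine bytes for nine letters means the browser sent a single-byte encoding (these particular escapes are Windows-1251), not UTF-8.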

Also, one last thing... when doing 'Траливали' == $_GET['query'], this will only be true if your PHP script file is saved in the same encoding as the bytes the browser sent, for example UTF-8 when the query arrives as UTF-8. Your text editor should be able to tell you the encoding of your file.
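
One way to make that comparison reliable is to normalise the incoming value to UTF-8 first. The helper below is hypothetical (to_utf8() is not a PHP built-in) and rests on an assumption: anything that is not valid UTF-8 is treated as Windows-1251, which fits the escapes in the question but is only a guess for other clients. It needs PHP 7+ and the iconv extension.

```php
<?php
/**
 * Hypothetical helper: return $value as UTF-8.
 * Assumption: input that is not valid UTF-8 is in $fallbackCharset.
 */
function to_utf8(string $value, string $fallbackCharset = 'CP1251'): string
{
    // With the /u modifier, preg_match() only succeeds on valid UTF-8.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    $converted = iconv($fallbackCharset, 'UTF-8', $value);
    // If the conversion fails, hand back the original bytes untouched.
    return $converted === false ? $value : $converted;
}

// "Траливали" as the browser sent it (Windows-1251 bytes) ...
$fromBrowser = "\xD2\xF0\xE0\xEB\xE8\xE2\xE0\xEB\xE8";
// ... and as it is stored in a UTF-8 encoded script file.
$expected = "\xD0\xA2\xD1\x80\xD0\xB0\xD0\xBB\xD0\xB8\xD0\xB2\xD0\xB0\xD0\xBB\xD0\xB8";

var_dump($fromBrowser === $expected);            // bool(false)
var_dump(to_utf8($fromBrowser) === $expected);   // bool(true)
var_dump(to_utf8($expected) === $expected);      // bool(true)
```

The byte strings are written as escapes so the example behaves the same whatever encoding the file is saved in. Keep in mind that the validity check is a heuristic: a short single-byte string can happen to be valid UTF-8 as well.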

answered Sep 29 '22 00:09 by Andrew Moore