apache: escaped umlauts in query string (URL) lead to 403

i have a problem i never encountered before, and i think it has something to do with the apache configuration, which i'm not very well versed in.

first, there is a php script with a search form. the form is transmitted via POST.

then there's the result list of search hits. here the original search query is passed as part of the url, e.g.: search.php?id=1234&query=foo. this also works - as long as there are no umlauts (äöüÄÖÜß...) chars transmitted.

as soon as i include umlauts in the search query, the first part that transmits the query string as POST works, but passing it (urlencoded) in the URL leads to a 403.

so:

  • search.php?id=1234&query=bar works
  • search.php?id=1234&query=b%E4r leads to 403 (%E4 = "ä" iso-8859-1 urlencoded)
  • search.php?id=1234&query=b%C3%A4r leads to 403 (%C3%A4 = "ä" utf-8 urlencoded)
  • submitting umlauts via POST works
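
To make the two encoded forms in the list above concrete, here is a minimal sketch that percent-encodes the same query under both charsets. It is written in Python purely for illustration (the app in the question is PHP), and the variable names are made up for this example.

```python
from urllib.parse import quote

query = "bär"

# ISO-8859-1 stores "ä" as the single byte 0xE4.
latin1_form = quote(query, encoding="iso-8859-1")

# UTF-8 stores "ä" as the two bytes 0xC3 0xA4.
utf8_form = quote(query, encoding="utf-8")

print(latin1_form)  # b%E4r
print(utf8_form)    # b%C3%A4r
```

Both forms are valid percent-encoding; they differ only in which bytes were escaped.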

i converted the app from iso-8859-1 to utf-8, but that made no difference.

i also tested it on my local machine, here it works flawlessly - as expected.

remote server setup (where it doesn't work):

Apache/2.2.12 (Ubuntu),
PHP Version 5.2.10-2ubuntu6.7, Suhosin Patch 0.9.7, via CGI/FastCGI

local setup (here the same works):

Apache/2.2.8 (Win32) PHP/5.3.5
PHP Version 5.3.5 via mod_php

does anybody have an idea why the remote apache/php-cgi doesn't accept properly urlencoded umlauts in the url?

additional info: i also tried to create a static file with an umlaut in its name, and both /t%C3%A4st.php and /täst.php get served without problem. täst.php?foo=täst fails.

note: ?foo=%28, where %28 is "(", works also.

stefs asked Feb 01 '11 12:02


1 Answer

Apache doesn't escape that, the browser does.

You need to use urlencode and urldecode to avoid issues with these kinds of characters.
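
As a rough sketch of that round trip, the snippet below builds the result-list link from the question and decodes it again. It uses Python's `urllib.parse` as a stand-in for PHP's urlencode and urldecode, so treat it as an illustration of the idea rather than the PHP code itself; the parameter values are taken from the question's example URL.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameters, modelled on the example URL in the question.
params = {"id": 1234, "query": "bär"}

# urlencode percent-encodes each value (as UTF-8 by default).
link = "search.php?" + urlencode(params)
print(link)  # search.php?id=1234&query=b%C3%A4r

# Decoding the query string on the receiving side restores the original text.
decoded = parse_qs(link.split("?", 1)[1])
print(decoded["query"][0])  # bär
```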

Some browsers, like old Netscape, just send the URL as written, with 8-bit characters in it. Others, notably MSIE, encode the URL as UTF-8 before sending it to the web server, so an 8-bit character arrives as two bytes, each with the 8th bit set. There is no indication whatsoever, in the request headers or elsewhere, that the URL is encoded in UTF-8.
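
Because the request carries no charset hint, the receiving script can only guess. One common heuristic, sketched here in Python as an assumption and not as something the server in the question does, is to try UTF-8 first and fall back to ISO-8859-1. The helper name `decode_query_value` is hypothetical.

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(raw: str) -> str:
    """Percent-decode raw, guessing between UTF-8 and ISO-8859-1."""
    data = unquote_to_bytes(raw)
    try:
        # Valid UTF-8 is very unlikely to occur by accident in Latin-1 text.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this never fails.
        return data.decode("iso-8859-1")

print(decode_query_value("b%C3%A4r"))  # bär
print(decode_query_value("b%E4r"))     # bär
```

The fallback order matters: trying ISO-8859-1 first would always succeed and would turn UTF-8 input into mojibake.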

Sein Oxygen answered Oct 13 '22 16:10