I have been tracking down a bug in a URL-rewriting application. The bug showed up as an encoding problem with some diacritic characters in the querystring.
Basically, the problem was that a request for /search.aspx?search=heřmánek was getting rewritten with a querystring of "search=he%c5%99m%c3%a1nek".
The correct value (produced by some different, working code) was a rewrite of the querystring as "search=he%u0159m%u00e1nek".
Note the difference between the two strings. However, if you post both, you'll see that URL decoding reproduces the same string. It's not until you call context.RewritePath that the encoding breaks. The broken string returns 'heÅmánek' (using Request.QueryString["Search"]) and the working string returns 'heřmánek'. This change happens after the call to the rewrite function.
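To check the claim that both querystrings decode to the same value before any rewriting happens, here is a minimal sketch using HttpUtility.UrlDecode outside of a request. It assumes System.Web.HttpUtility is available (System.Web.dll on .NET Framework, built in on .NET Core / .NET 5+); the class and constant names are hypothetical.

```csharp
using System;
using System.Web; // HttpUtility

public static class DecodeBothForms
{
    // The two querystrings quoted in the question.
    public const string Utf8Form = "search=he%c5%99m%c3%a1nek";     // %XX escapes of UTF-8 bytes
    public const string PercentUForm = "search=he%u0159m%u00e1nek"; // non-standard %uXXXX escapes

    // UrlDecode(string) decodes %XX escapes as UTF-8 bytes and also
    // understands the %uXXXX form, so both inputs should give the same text.
    public static string Decode(string query) => HttpUtility.UrlDecode(query);

    public static void Main()
    {
        Console.WriteLine(Decode(Utf8Form));     // decodes to "search=heřmánek"
        Console.WriteLine(Decode(PercentUForm)); // decodes to "search=heřmánek"
    }
}
```

Note that a console using a non-UTF-8 codepage may render the 'ř' as '?', even though the decoded string itself is correct.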
I traced this down to one set of code using Request.QueryString (working) and the other using Request.Url.Query (Request.Url returns a Uri instance).
While I have worked out the bug, there is a hole in my understanding here, so if anyone knows the difference, I'm ready for the lesson.
The value of Request.QueryString(parameter) is an array of all of the values of parameter that occur in QUERY_STRING. You can determine the number of values of a parameter by calling Request.QueryString(parameter).Count.
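The description above is of the Classic ASP collection. In ASP.NET the comparable structure is a NameValueCollection, and outside of a live request you can sketch the same "several values per parameter" behavior with HttpUtility.ParseQueryString. The class and method names here are hypothetical, and this works on a raw query string rather than the actual Request object.

```csharp
using System;
using System.Collections.Specialized;
using System.Web; // HttpUtility

public static class RepeatedParameters
{
    // Returns every value supplied for `name` in a raw query string,
    // or an empty array if the parameter does not occur at all.
    public static string[] ValuesOf(string rawQuery, string name)
    {
        NameValueCollection parsed = HttpUtility.ParseQueryString(rawQuery);
        return parsed.GetValues(name) ?? Array.Empty<string>();
    }

    public static void Main()
    {
        string[] values = ValuesOf("search=alpha&search=beta&page=2", "search");
        Console.WriteLine(values.Length);             // 2
        Console.WriteLine(string.Join(", ", values)); // alpha, beta

        // The indexer joins repeated values with a comma instead of returning an array.
        Console.WriteLine(HttpUtility.ParseQueryString("search=alpha&search=beta")["search"]); // alpha,beta
    }
}
```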
Query parameters are a defined set of parameters attached to the end of a URL. They are extensions of the URL that are used to help define specific content or actions based on the data being passed.
If you specify a parameter to Request.Form, the Web server parses the HTTP request body and returns the specified data. If your application requires unparsed data from the form, you can access it by calling Request.Form without any parameters. The QueryString collection retrieves the values of the variables in the HTTP query string.
Path parameters are key-value pairs that can appear inside the URL path and start with a semicolon character (;). The query string appears after the path (if any) and starts with a question mark character (?). Both path parameters and the query string contain key-value pairs.
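As a rough illustration of where each part ends up, this sketch splits a URL with System.Uri. The URL is made up, and System.Uri does not parse semicolon parameters into a collection; they simply remain part of AbsolutePath, while everything from the '?' onward is returned by Query.

```csharp
using System;

public static class PathVersusQuery
{
    // Returns the path (including any ;key=value segments) and the query (including the '?').
    public static (string Path, string Query) Split(string url)
    {
        var uri = new Uri(url);
        return (uri.AbsolutePath, uri.Query);
    }

    public static void Main()
    {
        // Hypothetical URL with one path parameter and two query parameters.
        var (path, query) = Split("http://www.example.com/items;color=red?page=2&sort=asc");
        Console.WriteLine(path);  // /items;color=red
        Console.WriteLine(query); // ?page=2&sort=asc
    }
}
```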
Your question really sparked my interest, so I've done some reading for the past hour or so. I'm not absolutely positive I've found the answer, but I'll throw it out there to see what you think.
From what I've read so far, Request.QueryString is actually "a parsed version of the QUERY_STRING variable in the ServerVariables collection" [reference], whereas Request.Url is (as you stated) the raw URL encapsulated in the Uri object. According to this article, the Uri class's constructor "...parses the [url string], puts it in canonical format, and makes any required escape encodings."
Therefore, it appears that Request.QueryString uses a different function from the Uri constructor to parse the QUERY_STRING variable in the ServerVariables collection. That would explain the difference you see between the two. Why the custom parsing function and the Uri object's parsing function use different encoding methods is entirely beyond me, though. Maybe somebody more versed in the aspnet_isapi DLL could answer that question.
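One detail that may matter when two parsers disagree: a %XX escape is a byte, so its meaning depends on the codepage used to decode it, while a %uXXXX escape names a UTF-16 code unit directly. This sketch uses HttpUtility.ParseQueryString(string, Encoding) to show that only the %XX form changes when the codepage is switched to Latin-1. It is an illustration of the decoding rules, not a reconstruction of what aspnet_isapi or RewritePath actually does internally; the class name is hypothetical.

```csharp
using System;
using System.Text;
using System.Web; // HttpUtility

public static class CodepageSensitivity
{
    // Latin-1 ships with .NET Core / .NET 5+ without registering extra encoding providers.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Parses a raw querystring and returns the "search" value, decoding %XX bytes with `encoding`.
    public static string Search(string rawQuery, Encoding encoding) =>
        HttpUtility.ParseQueryString(rawQuery, encoding)["search"];

    public static void Main()
    {
        const string utf8Form = "search=he%c5%99m%c3%a1nek";
        const string percentUForm = "search=he%u0159m%u00e1nek";

        // %XX escapes are bytes: the result depends on the codepage.
        Console.WriteLine(Search(utf8Form, Encoding.UTF8)); // "heřmánek"
        Console.WriteLine(Search(utf8Form, Latin1));        // one character per UTF-8 byte (mojibake)

        // %uXXXX escapes are code units: the codepage is not consulted.
        Console.WriteLine(Search(percentUForm, Encoding.UTF8)); // "heřmánek"
        Console.WriteLine(Search(percentUForm, Latin1));        // "heřmánek"
    }
}
```

If something along the rewrite path decodes with a non-UTF-8 codepage, this would be consistent with only the %XX string breaking, but that remains a hypothesis.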
Anyway, hopefully my post makes sense. On a side note, I'd like to add another reference, which also made for some very thorough and interesting reading: http://download.microsoft.com/download/6/c/a/6ca715c5-2095-4eec-a56f-a5ee904a1387/Ch-12_HTTP_Request_Context.pdf
What you called the "broken" encoded string is actually the correct encoding according to the standards. The one you called "correct" uses a non-standard extension to the specification that allows the format %uXXXX (I believe it's meant to indicate UTF-16 encoding).
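To see where each form comes from, this sketch produces both from the same word. Uri.EscapeDataString gives the standard form (UTF-8 bytes as %XX). The %uXXXX form is generated by a small hypothetical helper, PercentUEscape, that writes each non-ASCII UTF-16 code unit as %u plus four hex digits; it is for illustration only and does not escape reserved ASCII characters such as '&' or '='.

```csharp
using System;
using System.Text;

public static class TwoEncodings
{
    public const string Word = "he\u0159m\u00e1nek"; // "heřmánek"

    // Standard percent-encoding: UTF-8 bytes, uppercase hex.
    public static string Utf8Escape(string value) => Uri.EscapeDataString(value);

    // Hypothetical helper for the non-standard %uXXXX form (ASCII passed through unchanged).
    public static string PercentUEscape(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c < 0x80) sb.Append(c);
            else sb.Append("%u").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Utf8Escape(Word));     // he%C5%99m%C3%A1nek
        Console.WriteLine(PercentUEscape(Word)); // he%u0159m%u00e1nek
    }
}
```

The hex case differs from the question's strings (%C5 versus %c5), but percent-encoding is case-insensitive for the hex digits, so both decode identically.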
In any case, the "broken" encoded string is ok. You can use the following code to test that:
using System;
using System.Web;

Uri uri = new Uri("http://www.example.com/test.aspx?search=heřmánek");
Console.WriteLine(uri.Query);
Console.WriteLine(HttpUtility.UrlDecode(uri.Query));
Works fine. However... on a hunch, I tried UrlDecode with a Latin-1 codepage specified, instead of the default UTF-8:
Console.WriteLine(HttpUtility.UrlDecode(uri.Query,
    System.Text.Encoding.GetEncoding("iso-8859-1")));
... and I got the bad value you specified, 'heÅmánek'. In other words, it looks like the call to HttpContext.RewritePath() somehow changes the URL encoding/decoding to use the Latin-1 codepage rather than UTF-8, which is the default encoding used by the UrlEncode/UrlDecode methods.
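The effect of decoding with the wrong codepage can be reproduced without any URL machinery at all. This sketch takes the UTF-8 bytes of the word and reads them back as Latin-1, so every byte turns into its own character and the two-byte sequences for 'ř' and 'á' each become two characters. Because Latin-1 maps all 256 byte values one-to-one, the damage happens to be reversible; the Repair method is a hypothetical workaround that only applies if the value really was UTF-8 misread as Latin-1, and it does not address whatever RewritePath is doing.

```csharp
using System;
using System.Text;

public static class Latin1Mojibake
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Reads UTF-8 bytes as if they were Latin-1: one character per byte.
    public static string Misdecode(string original) =>
        Latin1.GetString(Encoding.UTF8.GetBytes(original));

    // Reverses Misdecode: recover the bytes via Latin-1, then decode them as UTF-8.
    public static string Repair(string misdecoded) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(misdecoded));

    public static void Main()
    {
        const string word = "he\u0159m\u00e1nek";   // "heřmánek", 8 characters
        string bad = Misdecode(word);               // 'ř' -> "\u00C5\u0099", 'á' -> "\u00C3\u00A1"
        Console.WriteLine(bad.Length);              // 10
        Console.WriteLine(Repair(bad) == word);     // True
    }
}
```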
This looks like a bug if you ask me. You can look at the RewritePath() code in Reflector and see that it is definitely playing with the querystring, passing it around to all kinds of virtual path functions and out to some unmanaged IIS code.
I wonder if somewhere along the way, the Uri at the core of the Request object gets manipulated with the wrong codepage? That would explain why Request.QueryString (which is parsed directly from the raw query string of the request) would be correct, while the Uri, using the wrong encoding for the diacritics, would be incorrect.