I am trying to send a request to an url like this "http://mysite.dk/tværs?test=æ" from an asp.net application, and I am having trouble getting the querystring to encode correctly. Or maybe the querystring is encoded correctly, the service I am connecting to just doesn't understand it correctly.
I have tried to send the request with different browsers and logging how they encode the request with Wireshark, and I get these results:
Firefox: http://mysite.dk/tv%C3%A6rs?test=%E6 Ie8: http://mysite.dk/tv%C3%A6rs?test=\xe6 Curl: http://mysite.dk/tv\xe6rs?test=\xe6
Both Firefox, IE and Curl receive the correct results from the service. Note that they encode the danish special character 'æ' differently in the querystring.
When I send the request from my asp.net application using HttpWebRequest, the URL gets encoded this way:
http://mysite.dk/tv%C3%A6rs?test=%C3%A6
It encodes the querystring the same way as the path part of the url. The remote service does not understand this encoding, so I don't get a correct answer.
For the record, 'æ' (U+00E6) is %E6 in ISO-LATIN-1, and %C3%A6 in UTF-8.
I could change the remote service to accept the UTF-8 encoded querystring, but then the service would stop working in browsers and I am not really interested in that. Is there a way to specify to .NET that it shouldn't encode querystrings with UTF-8?
I am creating the webrequest like this:
var req = WebRequest.Create("http://mysite.dk/tværs?test=æ") as HttpWebRequest;
But the problem seems to originate from System.Uri which is apparently used inside WebRequest.Create:
var uri = new Uri("http://mysite.dk/tværs?test=æ");
// now uri.AbsolutePath == "http://mysite.dk/tv%C3%A6rs?test=%C3%A6"
It's important to always URL encode your query parameters before dispatching them to ensure that they do not become malformed during transmission.
Why do we need to encode? URLs can only have certain characters from the standard 128 character ASCII set. Reserved characters that do not belong to this set must be encoded. This means that we need to encode these characters when passing them into a URL.
The character is reserved for special use within the URL scheme. Some characters are reserved for a special meaning; their presence in a URL serve a specific purpose. Characters such as / , ? , : , @ , and & are all reserved and must be encoded.
It looks like you're applying UrlEncode over the entire URL - this isn't correct, paths and query strings are encoded differently as you've seen. What is doing the encoding of the URI, WebRequest?
You could manually build the various parts using a UriBuilder, or manually encode using UrlPathEncode for the path and UrlEncode for the query string names and values.
Edit:
If the problem lies in the path, rather than the query string you could try turning on IRI support, via web.config
<configuration>
<uri>
<iriParsing enabled="true" />
</uri>
</configuration>
That should then leave international characters alone in the path.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With