I'm trying to figure out some inconsistencies in how HttpClient handles URLs.
I have the following test code:
    public async Task TestHttpClient()
    {
        var baseUrl = "https://api.twitter.com/1.1/search/tweets.json";
        //var query = "(cafe OR boulangerie)";
        var query = "(café OR boulangerie)";
        var url = baseUrl + $"?q={Uri.EscapeDataString(query)}";
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);
        await response.Content.ReadAsStringAsync();
    }
The code won't actually work as written, since Twitter searches require authentication and a few other things, but it demonstrates my problem.
The variable url will have the following value:
https://api.twitter.com/1.1/search/tweets.json?q=%28caf%C3%A9%20OR%20boulangerie%29
However, looking at the request in Fiddler, I can see that what is actually sent is: https://api.twitter.com/1.1/search/tweets.json?q=(caf%C3%A9%20OR%20boulangerie)
So all of a sudden, the parentheses are no longer encoded. This matters in my case, because I use the encoded query string to calculate a signature that I use to authenticate against Twitter. My signature is computed over percent-encoded parentheses and the request is sent without them, so Twitter returns an error telling me that authentication failed.
What is interesting is that if I send the query with a regular e instead of é, then the parentheses are encoded in the request! Like this: https://api.twitter.com/1.1/search/tweets.json?q=%28cafe%20OR%20boulangerie%29
I suppose this is some kind of bug with HttpClient? Can I work around this somehow?
So this turned out to be a difference in how Uri encodes and decodes URLs with and without Unicode characters in them: https://github.com/dotnet/corefx/issues/15865.
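A minimal sketch for reproducing the difference, assuming the same base URL as in the question. The class and method names (UriEscapingDemo, Compare) are made up for illustration. It returns the string as built with Uri.EscapeDataString next to what Uri.AbsoluteUri reports for it. Whether the two differ depends on the query and on the .NET version in use, so no expected output is shown.

```csharp
using System;

public static class UriEscapingDemo
{
    // Builds the URL the same way the question does and returns it
    // alongside the form that System.Uri reports for the same string.
    public static string Compare(string query)
    {
        var url = "https://api.twitter.com/1.1/search/tweets.json?q="
                  + Uri.EscapeDataString(query);

        // AbsoluteUri is Uri's own canonical, escaped representation.
        var viaUri = new Uri(url).AbsoluteUri;

        return "built:   " + url + Environment.NewLine
             + "via Uri: " + viaUri;
    }
}
```

Calling Compare("(cafe OR boulangerie)") and Compare("(café OR boulangerie)") and printing both results should show whether the parentheses survive in each case on your runtime.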
The solution for me was to parse the contents of Uri.AbsoluteUri (which encodes the URL in the same inconsistent way) and use that when calculating the signature for authentication, instead of using Uri.EscapeDataString as I was doing previously.
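Roughly, that workaround could look like the sketch below. This is not my exact code: SignedRequestSketch, QueryForSigning, SendAsync and the sign delegate are hypothetical names, and putting the signature in an Authorization header is only a placeholder for whatever your signing scheme requires. The point is that the signature is computed from the query string as it appears in Uri.AbsoluteUri, and the same Uri instance is then handed to HttpClient.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SignedRequestSketch
{
    // Returns the query string (without the leading '?') exactly as it
    // appears in Uri.AbsoluteUri, with any fragment stripped.
    public static string QueryForSigning(Uri uri)
    {
        var absolute = uri.AbsoluteUri;

        var fragmentStart = absolute.IndexOf('#');
        if (fragmentStart >= 0)
        {
            absolute = absolute.Substring(0, fragmentStart);
        }

        var queryStart = absolute.IndexOf('?');
        return queryStart >= 0 ? absolute.Substring(queryStart + 1) : string.Empty;
    }

    // 'sign' is a hypothetical callback that turns the encoded query
    // into a header value; replace it with your real signing code.
    public static async Task<string> SendAsync(
        HttpClient client, string url, Func<string, string> sign)
    {
        var uri = new Uri(url);
        var signature = sign(QueryForSigning(uri));

        using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
        {
            // Placeholder header; a real OAuth header has more parts.
            request.Headers.TryAddWithoutValidation("Authorization", signature);

            using (var response = await client.SendAsync(request))
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}
```

Signing from the Uri rather than from the hand-built string means the signature and the request see the same encoding, whichever way Uri decides to treat the parentheses.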