HttpClient decodes encoded Url?

I'm trying to figure out some inconsistencies in how HttpClient handles URLs.

I have the following test code:

public async Task TestHttpClient()
{
    var baseUrl = "https://api.twitter.com/1.1/search/tweets.json";
    //var query = "(cafe OR boulangerie)";
    var query = "(café OR boulangerie)";

    var url = baseUrl + $"?q={Uri.EscapeDataString(query)}";

    var httpClient = new HttpClient();
    var response = await httpClient.GetAsync(url);

    await response.Content.ReadAsStringAsync();
}

The code won't actually work, since we need authentication and other stuff for Twitter searches. But it demonstrates my problem.

The variable url will have the following value: https://api.twitter.com/1.1/search/tweets.json?q=%28caf%C3%A9%20OR%20boulangerie%29

However, looking at the request in Fiddler, I can see that what is actually sent is: https://api.twitter.com/1.1/search/tweets.json?q=(caf%C3%A9%20OR%20boulangerie)

So all of a sudden, the parentheses are no longer encoded. This matters in my case, because I use the encoded query string to calculate a signature that I use to authenticate against Twitter. My signature is computed over percent-encoded parentheses but the request is sent without them, so Twitter rejects the request and tells me the authentication failed.

What is interesting is that if I send the query with a regular e instead of é then the parentheses are encoded in the request! Like this: https://api.twitter.com/1.1/search/tweets.json?q=%28cafe%20OR%20boulangerie%29

I suppose this is some kind of bug with HttpClient? Can I work around this somehow?

asked by Joel, Feb 06 '17 16:02


1 Answer

So this turned out to be a difference in how Uri encodes and decodes URLs with and without Unicode characters in them: https://github.com/dotnet/corefx/issues/15865.

The solution for me was to parse the contents of Uri.AbsoluteUri (which encodes the URL in the same inconsistent way) and use that when calculating the signature for authentication, instead of using Uri.EscapeDataString as I was doing previously.
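
A minimal sketch of that workaround, in case it helps someone else. The class and method names (SentQuery, RawPairs) are made up for illustration and are not my actual code. The assumption is that whatever Uri.AbsoluteUri reports is what ends up on the wire. Which characters stay escaped depends on the runtime version, so the sketch reads the values back rather than predicting them:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: reads the query parameters back out of
// Uri.AbsoluteUri, leaving them percent-encoded exactly as Uri produced them.
public static class SentQuery
{
    public static List<KeyValuePair<string, string>> RawPairs(string url)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        // Let Uri apply its own escaping rules first.
        var absolute = new Uri(url).AbsoluteUri;

        // Drop the fragment, which is not part of the query.
        var hash = absolute.IndexOf('#');
        if (hash >= 0) absolute = absolute.Substring(0, hash);

        var start = absolute.IndexOf('?');
        if (start < 0) return pairs;
        var query = absolute.Substring(start + 1);

        foreach (var part in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first '=' only; values are NOT decoded here.
            var eq = part.IndexOf('=');
            var key = eq < 0 ? part : part.Substring(0, eq);
            var value = eq < 0 ? string.Empty : part.Substring(eq + 1);
            pairs.Add(new KeyValuePair<string, string>(key, value));
        }

        return pairs;
    }
}
```

The values are left percent-encoded on purpose: they go into the signature calculation as-is, so the signature is computed over the same encoding that the request carries, whichever way Uri decided to treat the parentheses.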

answered by Joel, Oct 11 '22 19:10