
How to determine whether content returned by .NET HttpClient is Gzipped?

I have a requirement to download some content from a remote URL and then also determine whether the content was compressed (Gzip or Deflate).

My issue is that when you allow HttpClient to perform automatic decompression, the response.Content.Headers.ContentEncoding property comes back empty. If you don't enable automatic decompression, ContentEncoding holds the correct value, but you are left with a Gzipped document that hasn't been decompressed, which is not useful.

Take the following code:

var handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};

using (var client = new HttpClient(handler))
{
    client.DefaultRequestHeaders.Add("accept-encoding", "gzip, deflate");
    client.DefaultRequestHeaders.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");

    using (var message = new HttpRequestMessage(HttpMethod.Get, new Uri("https://www.twitter.com")))
    {
        using (var response = await client.SendAsync(message))
        {
            if (response.IsSuccessStatusCode)
            {
                string encoding = String.Join(",", response.Content.Headers.ContentEncoding);

                string content = await response.Content.ReadAsStringAsync();
            }
        }
    }
}

When the HttpClientHandler is set to use AutomaticDecompression, the content is requested as GZip and decompressed correctly, but the ContentEncoding collection in the response content headers is empty.

If I remove the line:

AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate

then I do get the correct ContentEncoding value ("gzip") returned, but the document comes back in its raw compressed format, which is no good.

So is there any way to get content that may sometimes (but not always) be GZipped and automatically decompress it when it is, but then know afterward whether it was originally sent as Gzip?

Dan Diplo asked Feb 02 '17 14:02




1 Answer

Not a full answer, but I peeked through the source code of HttpClient and that led me to the code of the underlying HttpWebResponse. In there, you find this nugget:

  if ((decompressionMethod & DecompressionMethods.GZip) != DecompressionMethods.None && str.IndexOf("gzip", StringComparison.CurrentCulture) != -1)
  {
    this.m_ConnectStream = (Stream) new GZipWrapperStream(this.m_ConnectStream, CompressionMode.Decompress);
    this.m_ContentLength = -1L;
    this.m_HttpResponseHeaders["Content-Encoding"] = (string) null;
  }

As you can see, the last line inside the if block sets the Content-Encoding header to null, which removes it altogether. I'm not entirely sure why they decided to do that, but it is what it is.

I guess your options are either to decompress it yourself or to make two requests (neither of which is a great option).
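For the "decompress it yourself" route, something along these lines might work. This is only a rough sketch, not a tested implementation: the class name EncodingAwareDownloader and the methods WrapForDecoding and GetAsync are names I made up for illustration, and the code assumes the body is UTF-8 text, that at most one content encoding is applied, and that the HttpClient was created with AutomaticDecompression left at its default of None so the Content-Encoding header survives.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper class; none of these names come from the framework.
public static class EncodingAwareDownloader
{
    // Wraps the raw body stream in a decompressing stream when the
    // Content-Encoding value calls for it; otherwise returns it unchanged.
    public static Stream WrapForDecoding(Stream raw, string contentEncoding)
    {
        if (string.Equals(contentEncoding, "gzip", StringComparison.OrdinalIgnoreCase))
            return new GZipStream(raw, CompressionMode.Decompress);

        // Note: DeflateStream expects raw deflate data. Some servers send
        // zlib-wrapped data under "deflate", which this sketch does not handle.
        if (string.Equals(contentEncoding, "deflate", StringComparison.OrdinalIgnoreCase))
            return new DeflateStream(raw, CompressionMode.Decompress);

        return raw;
    }

    // Returns the original Content-Encoding ("" if none) and the decoded body.
    // The client must NOT have AutomaticDecompression enabled, otherwise the
    // header is stripped before we can read it.
    public static async Task<(string ContentEncoding, string Body)> GetAsync(HttpClient client, Uri uri)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
        {
            // Ask for compression ourselves, since the handler no longer does.
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
            request.Headers.AcceptEncoding.Add(new StringWithQualityHeaderValue("deflate"));

            using (var response = await client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();

                // Assumption: a single encoding; stacked encodings are not handled.
                string encoding = response.Content.Headers.ContentEncoding.FirstOrDefault() ?? "";

                using (var raw = await response.Content.ReadAsStreamAsync())
                using (var decoded = WrapForDecoding(raw, encoding))
                using (var reader = new StreamReader(decoded, Encoding.UTF8)) // assumes UTF-8 text
                {
                    string body = await reader.ReadToEndAsync();
                    return (encoding, body);
                }
            }
        }
    }
}
```

You would call it with a plain client, e.g. var result = await EncodingAwareDownloader.GetAsync(new HttpClient(), new Uri("https://www.twitter.com")); and then check whether result.ContentEncoding is "gzip". The tuple return needs C# 7 or later; on older compilers a small result class would do the same job.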

BFree answered Sep 22 '22 07:09