Is it possible to access the compressed data before decompression in HttpClient?

I'm working on the Google Cloud Storage .NET client library. There are three features (between .NET, my client library, and the Storage service) that are combining in an unpleasant way:

  • When downloading files (objects in Google Cloud Storage terminology), the server includes a hash of the stored data. My client code then validates that hash against the data it's downloaded.

  • A separate feature of Google Cloud Storage is that the user can set the Content-Encoding of the object, and that's included as a header when downloading, when the request contains a matching Accept-Encoding. (For the moment, let's ignore the behavior when the request doesn't include that...)

  • HttpClientHandler can decompress gzip (or deflate) content automatically and transparently.

When all three of these are combined, we get into trouble. Here's a short but complete program demonstrating that, but without using my client library (and hitting a publicly accessible file):

using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip
        };
        var client = new HttpClient(handler);

        var response = await client.GetAsync(url);
        byte[] content = await response.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");

        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");

        using (var md5 = MD5.Create())
        {
            var md5Hash = md5.ComputeHash(content);
            var md5HashBase64 = Convert.ToBase64String(md5Hash);
            Console.WriteLine($"MD5 of content: {md5HashBase64}");
        }
    }
}

.NET Core project file:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
    <LangVersion>7.1</LangVersion>
  </PropertyGroup>
</Project>

Output:

Content: hello world
Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
MD5 of content: XrY7u+Ae7tCTyyK7j1rNww==

As you can see, the MD5 of the content isn't the same as the MD5 part of the X-Goog-Hash header. (In my client library I'm using the crc32c hash, but that shows the same behavior.)
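
To make that comparison in code rather than by eye, the header value can be split into its parts. The following is only a sketch: the class and method names (GoogHash, Extract) are hypothetical, and it assumes the comma-separated `name=value` layout seen in the output above. Base64 values can end in `=`, so each part is split on the first `=` only.

```csharp
using System;

// Hypothetical helper, not part of any Google library.
public static class GoogHash
{
    // Returns the Base64 value for the given algorithm ("md5" or "crc32c"),
    // or null when the header does not contain it.
    public static string Extract(string headerValue, string algorithm)
    {
        if (headerValue == null) return null;
        foreach (var part in headerValue.Split(','))
        {
            var trimmed = part.Trim();
            // Split on the FIRST '=' only: the Base64 padding also uses '='.
            int eq = trimmed.IndexOf('=');
            if (eq > 0 && string.Equals(trimmed.Substring(0, eq), algorithm,
                                        StringComparison.OrdinalIgnoreCase))
            {
                return trimmed.Substring(eq + 1);
            }
        }
        return null;
    }
}
```

With the header from the output above, `GoogHash.Extract(hashHeader, "md5")` yields the `xhF4...` value, which can then be compared with `md5HashBase64` using an ordinal string comparison.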

This isn't a bug in HttpClientHandler - it's expected, but a pain when I want to validate the hash. Basically, I need to get at the content both before and after decompression, and I can't find any way of doing that.

To clarify my requirements somewhat: I know how to prevent the decompression in HttpClient and instead decompress afterwards when reading from the stream - but I need to be able to do this without changing any of the code that uses the resulting HttpResponseMessage from the HttpClient. (There's a lot of code that deals with responses, and I want to make the change in only one central place.)

I have a plan, which I've prototyped and which works as far as I've tested, but it is a bit ugly. It involves creating a three-layer handler:

  • HttpClientHandler with automatic decompression disabled.
  • A new handler which replaces the content stream with a new Stream subclass which delegates to the original content stream, but hashes the data as it's read.
  • A decompression-only handler, based on the Microsoft DecompressionHandler code.
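
The middle layer of that plan could look roughly like the sketch below. It is not the prototype mentioned above: the class name HashingReadStream and the GetHash method are hypothetical, and it uses MD5 via IncrementalHash purely for illustration (the client library uses crc32c).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical name: a read-only wrapper that hashes bytes as they pass through.
public sealed class HashingReadStream : Stream
{
    private readonly Stream inner;
    private readonly IncrementalHash hasher =
        IncrementalHash.CreateHash(HashAlgorithmName.MD5);
    private byte[] hash;

    public HashingReadStream(Stream inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Finalizes the hash on first call. Call it after the stream has been
    // read to the end and before the stream is disposed.
    public byte[] GetHash() => hash ?? (hash = hasher.GetHashAndReset());

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = inner.Read(buffer, offset, count);
        if (read > 0) hasher.AppendData(buffer, offset, read);
        return read;
    }

    public override async Task<int> ReadAsync(
        byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        int read = await inner.ReadAsync(buffer, offset, count, cancellationToken)
            .ConfigureAwait(false);
        if (read > 0) hasher.AppendData(buffer, offset, read);
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            inner.Dispose();
            hasher.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

In the three-layer arrangement, the still-compressed body would be wrapped in this stream before the decompression layer sees it, so the hash covers exactly the bytes the server hashed. The result of GetHash would be compared with the header once the consumer has read to the end.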

While this works, it has some disadvantages:

  • Open source licensing: checking exactly what I need to do in order to create a new file in my repo based on the MIT-licensed Microsoft code
  • Effectively forking the MS code, which means I should probably make a regular check to see if any bugs have been found in it
  • The Microsoft code uses internal members of the assembly, so it doesn't port as cleanly as it might.

If Microsoft made DecompressionHandler public, that would help a lot - but that's likely to be in a longer timeframe than I need.

What I'm looking for is an alternative approach if possible - something I've missed that lets me get at the content before decompression. I don't want to reinvent HttpClient - the response is often chunked for example, and I don't want to have to get into that side of things. It's a pretty specific interception point that I'm looking for.

Jon Skeet asked Nov 16 '17 07:11



3 Answers

Looking at what @Michael did gave me the hint I was missing. Once you have the compressed content, you can chain CryptoStream, GZipStream, and StreamReader to read the response without loading more of it into memory than needed. CryptoStream hashes the compressed bytes as they are read and decompressed. Swap the StreamReader for a copy into a FileStream and you can write the data to a file with minimal memory usage :)

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.None
        };
        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

        var response = await client.GetAsync(url);
        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
        string text = null;
        using (var md5 = MD5.Create())
        {
            using (var cryptoStream = new CryptoStream(await response.Content.ReadAsStreamAsync(), md5, CryptoStreamMode.Read))
            {
                using (var gzipStream = new GZipStream(cryptoStream, CompressionMode.Decompress))
                {
                    using (var streamReader = new StreamReader(gzipStream, Encoding.UTF8))
                    {
                        text = streamReader.ReadToEnd();
                    }
                }
                Console.WriteLine($"Content: {text}");
                var md5HashBase64 = Convert.ToBase64String(md5.Hash);
                Console.WriteLine($"MD5 of content: {md5HashBase64}");
            }
        }
    }
}

Output:

Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
Content: hello world
MD5 of content: xhF4M6pNFRDQnvaRRNVnkA==
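
The file-writing variant mentioned above could be sketched as follows. The names HashingDownload and DecompressAndHash are hypothetical, and the destination is any writable Stream (a FileStream in the real case). One assumption is worth calling out: the sketch drains the CryptoStream to its end after decompression, so that the hash is finalized and covers every byte of the payload even if the decompressor stops reading early.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

// Hypothetical helper illustrating the CryptoStream + GZipStream chain.
public static class HashingDownload
{
    // Decompresses 'compressed' into 'destination' and returns the Base64 MD5
    // of the compressed bytes. Disposes 'compressed' when done.
    public static string DecompressAndHash(Stream compressed, Stream destination)
    {
        using (var md5 = MD5.Create())
        using (var crypto = new CryptoStream(compressed, md5, CryptoStreamMode.Read))
        {
            // leaveOpen: true so the CryptoStream can still be drained afterwards.
            using (var gzip = new GZipStream(crypto, CompressionMode.Decompress, true))
            {
                gzip.CopyTo(destination);
            }

            // Read to end-of-stream so the hash transform sees its final block.
            var scratch = new byte[4096];
            while (crypto.Read(scratch, 0, scratch.Length) > 0) { }

            return Convert.ToBase64String(md5.Hash);
        }
    }
}
```

A call such as `HashingDownload.DecompressAndHash(await response.Content.ReadAsStreamAsync(), File.Create(path))` would then stream the body to disk; the file name and the synchronous copy are illustrative choices only.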

V2 of Answer

After reading Jon's response and an updated answer I have the following version. Pretty much the same idea, but I moved the streaming into a special HttpContent that I inject. Not exactly pretty but the idea is there.

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        string url = "https://www.googleapis.com/download/storage/v1/b/"
            + "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.None
        };
        var client = new HttpClient(new Intercepter(handler));
        client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

        var response = await client.GetAsync(url);
        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
        HttpContent content1 = response.Content;
        byte[] content = await content1.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");
        var md5Hash = ((HashingContent)content1).Hash;
        var md5HashBase64 = Convert.ToBase64String(md5Hash);
        Console.WriteLine($"MD5 of content: {md5HashBase64}");
    }

    public class Intercepter : DelegatingHandler
    {
        public Intercepter(HttpMessageHandler innerHandler) : base(innerHandler)
        {
        }

        protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            var response = await base.SendAsync(request, cancellationToken);
            response.Content = new HashingContent(await response.Content.ReadAsStreamAsync());
            return response;
        }
    }

    public sealed class HashingContent : HttpContent
    {
        private readonly StreamContent streamContent;
        private readonly MD5 mD5;
        private readonly CryptoStream cryptoStream;
        private readonly GZipStream gZipStream;

        public HashingContent(Stream content)
        {
            mD5 = MD5.Create();
            cryptoStream = new CryptoStream(content, mD5, CryptoStreamMode.Read);
            gZipStream = new GZipStream(cryptoStream, CompressionMode.Decompress);
            streamContent = new StreamContent(gZipStream);
        }

        protected override Task SerializeToStreamAsync(Stream stream, TransportContext context) => streamContent.CopyToAsync(stream, context);
        protected override bool TryComputeLength(out long length)
        {
            length = 0;
            return false;
        }

        protected override Task<Stream> CreateContentReadStreamAsync() => streamContent.ReadAsStreamAsync();

        protected override void Dispose(bool disposing)
        {
            try
            {
                if (disposing)
                {
                    streamContent.Dispose();
                    gZipStream.Dispose();
                    cryptoStream.Dispose();
                    mD5.Dispose();
                }
            }
            finally
            {
                base.Dispose(disposing);
            }
        }

        public byte[] Hash => mD5.Hash;
    }
}
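
Note that V2 wraps every response body in GZipStream regardless of its Content-Encoding, and the replacement content does not carry over the original content headers. A hedged variant of the decompression step that addresses both points might look like this. The handler name and the DecompressIfGzipAsync helper are hypothetical; this is not the Microsoft DecompressionHandler, and hashing is deliberately left out to keep the sketch short.

```csharp
using System.IO.Compression;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: decompresses only when the response says it is gzipped.
public sealed class GzipOnlyDecompressionHandler : DelegatingHandler
{
    public GzipOnlyDecompressionHandler(HttpMessageHandler innerHandler)
        : base(innerHandler) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        if (response.Content != null)
        {
            response.Content = await DecompressIfGzipAsync(response.Content).ConfigureAwait(false);
        }
        return response;
    }

    public static async Task<HttpContent> DecompressIfGzipAsync(HttpContent original)
    {
        if (!original.Headers.ContentEncoding.Contains("gzip"))
        {
            return original; // not gzipped: leave the content untouched
        }

        // A hashing stream would be wrapped around 'compressed' at this point.
        var compressed = await original.ReadAsStreamAsync().ConfigureAwait(false);
        var replacement = new StreamContent(
            new GZipStream(compressed, CompressionMode.Decompress));

        // Preserve Content-Type and friends for downstream code.
        foreach (var header in original.Headers)
        {
            replacement.Headers.TryAddWithoutValidation(header.Key, header.Value);
        }
        // These two no longer describe the decompressed body.
        replacement.Headers.ContentEncoding.Clear();
        replacement.Headers.ContentLength = null;
        return replacement;
    }
}
```

Because the swap happens inside a DelegatingHandler, code that consumes the HttpResponseMessage would not need to change, which is the constraint stated in the question.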
shmuelie answered Oct 10 '22 23:10


I managed to reproduce the hash from the header by:

  • creating a custom handler that inherits from HttpClientHandler
  • overriding SendAsync
  • reading the response as a byte array via base.SendAsync
  • recompressing it using GZipStream
  • computing the MD5 of the gzipped bytes and Base64-encoding it (using your code)

The issue is that, as you said, "before decompression" is not really respected here.

The idea would be to get at this `if` so that it works the way you would like: https://github.com/dotnet/corefx/blob/master/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpResponseParser.cs#L80-L91

With the code below, the hash matches:

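using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

class Program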
{
    const string url = "https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";

    static async Task Main()
    {
        //await HashResponseContent(CreateHandler(DecompressionMethods.None));
        //await HashResponseContent(CreateHandler(DecompressionMethods.GZip));
        await HashResponseContent(new MyHandler());

        Console.ReadLine();
    }

    private static HttpClientHandler CreateHandler(DecompressionMethods decompressionMethods)
    {
        return new HttpClientHandler { AutomaticDecompression = decompressionMethods };
    }

    public static async Task HashResponseContent(HttpClientHandler handler)
    {
        //Console.WriteLine($"Using AutomaticDecompression : '{handler.AutomaticDecompression}'");
        //Console.WriteLine($"Using SupportsAutomaticDecompression : '{handler.SupportsAutomaticDecompression}'");
        //Console.WriteLine($"Using Properties : '{string.Join('\n', handler.Properties.Keys.ToArray())}'");

        var client = new HttpClient(handler);

        var response = await client.GetAsync(url);
        byte[] content = await response.Content.ReadAsByteArrayAsync();
        string text = Encoding.UTF8.GetString(content);
        Console.WriteLine($"Content: {text}");

        var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
        Console.WriteLine($"Hash header: {hashHeader}");
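        // Print the hash; the original call discarded the return value.
        Console.WriteLine($"MD5 of content: {byteArrayToMd5(content)}");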

        Console.WriteLine($"=====================================================================");
    }

    public static string byteArrayToMd5(byte[] content)
    {
        using (var md5 = MD5.Create())
        {
            var md5Hash = md5.ComputeHash(content);
            return Convert.ToBase64String(md5Hash);
        }
    }

    public static byte[] Compress(byte[] contentToGzip)
    {
        using (MemoryStream resultStream = new MemoryStream())
        {
            using (MemoryStream contentStreamToGzip = new MemoryStream(contentToGzip))
            {
                using (GZipStream compressionStream = new GZipStream(resultStream, CompressionMode.Compress))
                {
                    contentStreamToGzip.CopyTo(compressionStream);
                }
            }

            return resultStream.ToArray();
        }
    }
}

public class MyHandler : HttpClientHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = await base.SendAsync(request, cancellationToken);
        var responseContent = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);

        Program.byteArrayToMd5(responseContent);

        var compressedResponse = Program.Compress(responseContent);
        var compressedResponseMd5 = Program.byteArrayToMd5(compressedResponse);

        Console.WriteLine($"recompressed response to md5 : {compressedResponseMd5}");

        return response;
    }
}
Alexandre Hgs answered Oct 10 '22 23:10


What about disabling automatic decompression, manually adding the Accept-Encoding header(s) and then decompressing after hash verification?

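// Requires: System, System.Diagnostics, System.IO, System.IO.Compression, System.Linq,
// System.Net, System.Net.Http, System.Security.Cryptography, System.Text, System.Threading.Tasks
private static async Task Test2()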
{
    var url = @"https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";
    var handler = new HttpClientHandler
    {
        AutomaticDecompression = DecompressionMethods.None
    };
    var client = new HttpClient(handler);
    client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");

    var response = await client.GetAsync(url);
    var raw = await response.Content.ReadAsByteArrayAsync();

    var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
    Debug.WriteLine($"Hash header: {hashHeader}");

    bool match = false;
    using (var md5 = MD5.Create())
    {
        var md5Hash = md5.ComputeHash(raw);
        var md5HashBase64 = Convert.ToBase64String(md5Hash);
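        // Relies on the md5 part being last in the header; ordinal comparison avoids culture rules.
        match = hashHeader.EndsWith(md5HashBase64, StringComparison.Ordinal);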
        Debug.WriteLine($"MD5 of content: {md5HashBase64}");
    }

    if (match)
    {
        // Hash verified: decompress the bytes already held in memory.
        using (var memInput = new MemoryStream(raw))
        using (var gz = new GZipStream(memInput, CompressionMode.Decompress))
        using (var memOutput = new MemoryStream())
        {
            gz.CopyTo(memOutput);
            var text = Encoding.UTF8.GetString(memOutput.ToArray());
            Console.WriteLine($"Content: {text}");
        }
    }
}
Michael answered Oct 11 '22 00:10