I'm working on the Google Cloud Storage .NET client library. There are three features (between .NET, my client library, and the Storage service) that are combining in an unpleasant way:
When downloading files (objects in Google Cloud Storage terminology), the server includes a hash of the stored data. My client code then validates that hash against the data it's downloaded.
A separate feature of Google Cloud Storage is that the user can set the Content-Encoding of the object, and that's included as a header when downloading, when the request contains a matching Accept-Encoding. (For the moment, let's ignore the behavior when the request doesn't include that...)
HttpClientHandler
can decompress gzip (or deflate) content
automatically and transparently.
When all three of these are combined, we get into trouble. Here's a short but complete program demonstrating that, but without using my client library (and hitting a publicly accessible file):
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
class Program
{
static async Task Main()
{
string url = "https://www.googleapis.com/download/storage/v1/b/"
+ "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
var handler = new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.GZip
};
var client = new HttpClient(handler);
var response = await client.GetAsync(url);
byte[] content = await response.Content.ReadAsByteArrayAsync();
string text = Encoding.UTF8.GetString(content);
Console.WriteLine($"Content: {text}");
var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
Console.WriteLine($"Hash header: {hashHeader}");
using (var md5 = MD5.Create())
{
var md5Hash = md5.ComputeHash(content);
var md5HashBase64 = Convert.ToBase64String(md5Hash);
Console.WriteLine($"MD5 of content: {md5HashBase64}");
}
}
}
.NET Core project file:
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>netcoreapp2.0</TargetFramework>
<LangVersion>7.1</LangVersion>
</PropertyGroup>
</Project>
Output:
Content: hello world
Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
MD5 of content: XrY7u+Ae7tCTyyK7j1rNww==
As you can see, the MD5 of the content isn't the same as the MD5
part of the X-Goog-Hash
header. (In my client library I'm using the crc32c
hash, but that shows the same behavior.)
This isn't a bug in HttpClientHandler
- it's expected, but a pain
when I want to validate the hash. Basically, I need to at the
content before and after decompression. And I can't find any way
of doing that.
To clarify my requirements somewhat, I know how to prevent the decompression in HttpClient
and instead decompress afterwards when reading from the stream - but I need to be able to do this without changing any the code that uses the resulting HttpResponseMessage
from the HttpClient
. (There's a lot of code that deals with responses, and I want to only make the change in one central place.)
I have a plan, which I've prototyped and which works as far as I've found so far, but is a bit ugly. It involves creating a three-layer handler:
HttpClientHandler
with automatic decompression disabled.Stream
subclass
which delegates to the original content stream, but hashes the data as it's read.DecompressionHandler
code.While this works, it has disadvantages of:
If Microsoft made DecompressionHandler
public, that would help a
lot - but that's likely to be in a longer timeframe than I need.
What I'm looking for is an alternative approach if possible -
something I've missed that lets me get at the content before
decompression. I don't want to reinvent HttpClient
- the response
is often chunked for example, and I don't want to have to get into
that side of things. It's a pretty specific interception point that
I'm looking for.
HTTP Request Compression. When Integration Server is acting as an HTTP client and if user has a large set of data to compress, then user can use the pub. compress:compressData service to compress the data. While executing the service, user can define any supported compression scheme to compress the data.
It means the client can accept a response which has been compressed using the DEFLATE algorithm.
Looking at what @Michael did gave me the hint I was missing. After getting the compressed content you can use CryptoStream
, and GZipStream
, and StreamReader
to read the response without loading it into memory more than needed. CryptoStream
will hash the compressed content as it is decompressed and read. Replace the StreamReader
with a FileStream
and you can write the data to a file with minimal memory usage :)
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
class Program
{
static async Task Main()
{
string url = "https://www.googleapis.com/download/storage/v1/b/"
+ "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
var handler = new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.None
};
var client = new HttpClient(handler);
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
var response = await client.GetAsync(url);
var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
Console.WriteLine($"Hash header: {hashHeader}");
string text = null;
using (var md5 = MD5.Create())
{
using (var cryptoStream = new CryptoStream(await response.Content.ReadAsStreamAsync(), md5, CryptoStreamMode.Read))
{
using (var gzipStream = new GZipStream(cryptoStream, CompressionMode.Decompress))
{
using (var streamReader = new StreamReader(gzipStream, Encoding.UTF8))
{
text = streamReader.ReadToEnd();
}
}
Console.WriteLine($"Content: {text}");
var md5HashBase64 = Convert.ToBase64String(md5.Hash);
Console.WriteLine($"MD5 of content: {md5HashBase64}");
}
}
}
}
Output:
Hash header: crc32c=T1s5RQ==,md5=xhF4M6pNFRDQnvaRRNVnkA==
Content: hello world
MD5 of content: xhF4M6pNFRDQnvaRRNVnkA==
V2 of Answer
After reading Jon's response and an updated answer I have the following version. Pretty much the same idea, but I moved the streaming into a special HttpContent
that I inject. Not exactly pretty but the idea is there.
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
class Program
{
static async Task Main()
{
string url = "https://www.googleapis.com/download/storage/v1/b/"
+ "storage-library-test-bucket/o/gzipped-text.txt?alt=media";
var handler = new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.None
};
var client = new HttpClient(new Intercepter(handler));
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
var response = await client.GetAsync(url);
var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
Console.WriteLine($"Hash header: {hashHeader}");
HttpContent content1 = response.Content;
byte[] content = await content1.ReadAsByteArrayAsync();
string text = Encoding.UTF8.GetString(content);
Console.WriteLine($"Content: {text}");
var md5Hash = ((HashingContent)content1).Hash;
var md5HashBase64 = Convert.ToBase64String(md5Hash);
Console.WriteLine($"MD5 of content: {md5HashBase64}");
}
public class Intercepter : DelegatingHandler
{
public Intercepter(HttpMessageHandler innerHandler) : base(innerHandler)
{
}
protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
var response = await base.SendAsync(request, cancellationToken);
response.Content = new HashingContent(await response.Content.ReadAsStreamAsync());
return response;
}
}
public sealed class HashingContent : HttpContent
{
private readonly StreamContent streamContent;
private readonly MD5 mD5;
private readonly CryptoStream cryptoStream;
private readonly GZipStream gZipStream;
public HashingContent(Stream content)
{
mD5 = MD5.Create();
cryptoStream = new CryptoStream(content, mD5, CryptoStreamMode.Read);
gZipStream = new GZipStream(cryptoStream, CompressionMode.Decompress);
streamContent = new StreamContent(gZipStream);
}
protected override Task SerializeToStreamAsync(Stream stream, TransportContext context) => streamContent.CopyToAsync(stream, context);
protected override bool TryComputeLength(out long length)
{
length = 0;
return false;
}
protected override Task<Stream> CreateContentReadStreamAsync() => streamContent.ReadAsStreamAsync();
protected override void Dispose(bool disposing)
{
try
{
if (disposing)
{
streamContent.Dispose();
gZipStream.Dispose();
cryptoStream.Dispose();
mD5.Dispose();
}
}
finally
{
base.Dispose(disposing);
}
}
public byte[] Hash => mD5.Hash;
}
}
I managed to get the headerhash correct by:
SendAsync
base.SendAsync
this issue is, as you said "before decompression" is not really respected here
The idea is to get this if
working as you would like
https://github.com/dotnet/corefx/blob/master/src/System.Net.Http.WinHttpHandler/src/System/Net/Http/WinHttpResponseParser.cs#L80-L91
it matches
class Program
{
const string url = "https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";
static async Task Main()
{
//await HashResponseContent(CreateHandler(DecompressionMethods.None));
//await HashResponseContent(CreateHandler(DecompressionMethods.GZip));
await HashResponseContent(new MyHandler());
Console.ReadLine();
}
private static HttpClientHandler CreateHandler(DecompressionMethods decompressionMethods)
{
return new HttpClientHandler { AutomaticDecompression = decompressionMethods };
}
public static async Task HashResponseContent(HttpClientHandler handler)
{
//Console.WriteLine($"Using AutomaticDecompression : '{handler.AutomaticDecompression}'");
//Console.WriteLine($"Using SupportsAutomaticDecompression : '{handler.SupportsAutomaticDecompression}'");
//Console.WriteLine($"Using Properties : '{string.Join('\n', handler.Properties.Keys.ToArray())}'");
var client = new HttpClient(handler);
var response = await client.GetAsync(url);
byte[] content = await response.Content.ReadAsByteArrayAsync();
string text = Encoding.UTF8.GetString(content);
Console.WriteLine($"Content: {text}");
var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
Console.WriteLine($"Hash header: {hashHeader}");
byteArrayToMd5(content);
Console.WriteLine($"=====================================================================");
}
public static string byteArrayToMd5(byte[] content)
{
using (var md5 = MD5.Create())
{
var md5Hash = md5.ComputeHash(content);
return Convert.ToBase64String(md5Hash);
}
}
public static byte[] Compress(byte[] contentToGzip)
{
using (MemoryStream resultStream = new MemoryStream())
{
using (MemoryStream contentStreamToGzip = new MemoryStream(contentToGzip))
{
using (GZipStream compressionStream = new GZipStream(resultStream, CompressionMode.Compress))
{
contentStreamToGzip.CopyTo(compressionStream);
}
}
return resultStream.ToArray();
}
}
}
public class MyHandler : HttpClientHandler
{
protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
var response = await base.SendAsync(request, cancellationToken);
var responseContent = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
Program.byteArrayToMd5(responseContent);
var compressedResponse = Program.Compress(responseContent);
var compressedResponseMd5 = Program.byteArrayToMd5(compressedResponse);
Console.WriteLine($"recompressed response to md5 : {compressedResponseMd5}");
return response;
}
}
What about disabling automatic decompression, manually adding the Accept-Encoding
header(s) and then decompressing after hash verification?
private static async Task Test2()
{
var url = @"https://www.googleapis.com/download/storage/v1/b/storage-library-test-bucket/o/gzipped-text.txt?alt=media";
var handler = new HttpClientHandler
{
AutomaticDecompression = DecompressionMethods.None
};
var client = new HttpClient(handler);
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip");
var response = await client.GetAsync(url);
var raw = await response.Content.ReadAsByteArrayAsync();
var hashHeader = response.Headers.GetValues("X-Goog-Hash").FirstOrDefault();
Debug.WriteLine($"Hash header: {hashHeader}");
bool match = false;
using (var md5 = MD5.Create())
{
var md5Hash = md5.ComputeHash(raw);
var md5HashBase64 = Convert.ToBase64String(md5Hash);
match = hashHeader.EndsWith(md5HashBase64);
Debug.WriteLine($"MD5 of content: {md5HashBase64}");
}
if (match)
{
var memInput = new MemoryStream(raw);
var gz = new GZipStream(memInput, CompressionMode.Decompress);
var memOutput = new MemoryStream();
gz.CopyTo(memOutput);
var text = Encoding.UTF8.GetString(memOutput.ToArray());
Console.WriteLine($"Content: {text}");
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With