
Using GZipStream to compress empty input results in an invalid gz file in C#

I am using the C# GZipStream class to compress some input data. The problem occurs when that input is empty: GZipStream writes nothing, so I end up with a 0-byte file. When I try to unzip the resulting .gz file with 7zip, it reports that the format is invalid. With non-empty input, everything works fine. How can I create a valid .gz file that uncompresses into a 0-byte file?

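using System.IO;
using System.IO.Compression;

var file = new FileStream("foo.txt.gz", FileMode.Create, FileAccess.ReadWrite);
var gzip = new GZipStream(file, CompressionMode.Compress);
var writer = new StreamWriter(gzip);

// 'input' is a collection of strings; it may be empty.
foreach (string line in input) {
    writer.Write(line);
}

// Closing the writer also closes the underlying gzip and file streams.
writer.Close();
gzip.Close();
file.Close();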

In the code above, if my 'input' array is empty, I end up writing a file called foo.txt.gz with 0 bytes, and 7zip says the file is invalid. But if I have a non-empty array, I get a valid file. Please tell me how I can modify my code to resolve the issue such that I get a valid .gz file even when the input is empty. Thanks!


EDIT: This may be a bug in .NET. If you notice the same issue and agree that it is a bug, please vote on: https://connect.microsoft.com/VisualStudio/feedback/details/888912/gzipstream-creates-invalid-gz-files-when-input-is-empty

Gadzair asked Jun 03 '14 21:06


1 Answer

Unfortunately, this looks like a bug with the implementation of GZipStream in the .NET library.

According to the MSDN documentation (http://msdn.microsoft.com/en-ca/library/as1ff51s.aspx), the output should "appear as a valid, empty compressed file". But when I tested your code and some variations of it, I also got a completely empty file.

As a comparison, if I create an empty gzip file using Cygwin (echo -n | gzip -9 > empty.gz), I get a 20 byte file.

I suppose you could work around it by detecting when your input is empty and manually writing out an empty GZIP file. You could either refer to the GZIP file documentation (Wikipedia would be a good place to start) to create the file manually, or hard-code the 20 bytes required for an empty file in your program (with this solution, the internal timestamp and some other flags might be wrong, but that might not affect you in practice).
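As a minimal sketch of the hard-coded workaround, the helper below writes a fixed 20-byte empty GZIP member whenever GZipStream produced no output. The names GzipWriter and EmptyGzip are made up for illustration and are not part of .NET. The bytes follow the layout in RFC 1952 with the timestamp set to 0 and the OS field set to "unknown", which is my own choice, so they will not match byte-for-byte what gzip -9 emits. I have not run this against every .NET version, so treat it as a starting point.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration; not part of the .NET library.
static class GzipWriter
{
    // A minimal gzip member that decompresses to zero bytes (RFC 1952 layout).
    static readonly byte[] EmptyGzip =
    {
        0x1f, 0x8b,             // ID1, ID2: gzip magic number
        0x08,                   // CM: deflate
        0x00,                   // FLG: no optional header fields
        0x00, 0x00, 0x00, 0x00, // MTIME: 0 means no timestamp available
        0x00,                   // XFL: no extra flags (assumption; gzip -9 would set 0x02)
        0xff,                   // OS: 255 means unknown (assumption)
        0x03, 0x00,             // deflate data: a single final, empty fixed-Huffman block
        0x00, 0x00, 0x00, 0x00, // CRC-32 of zero bytes of data
        0x00, 0x00, 0x00, 0x00  // ISIZE: uncompressed length is 0
    };

    public static void Write(string path, IEnumerable<string> input)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            // The third argument (leaveOpen = true) keeps 'file' open after the
            // GZipStream is disposed, so the output length can be checked.
            using (var gzip = new GZipStream(file, CompressionMode.Compress, true))
            using (var writer = new StreamWriter(gzip))
            {
                foreach (string line in input)
                {
                    writer.Write(line);
                }
            }

            // If GZipStream wrote nothing at all, emit the empty member instead.
            if (file.Length == 0)
            {
                file.Write(EmptyGzip, 0, EmptyGzip.Length);
            }
        }
    }
}
```

Calling GzipWriter.Write("foo.txt.gz", input) would then replace the code in the question. Checking the output length rather than the input has the side benefit that the fallback only kicks in when GZipStream really wrote nothing, so it stays harmless on a runtime where the bug does not occur.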

Alternatively, use a 3rd-party compression library like SharpZipLib (http://icsharpcode.github.io/SharpZipLib/) or DotNetZip (http://dotnetzip.codeplex.com/) that implements GZIP and use their implementation instead of GZipStream.

Steven answered Nov 10 '22 04:11