
GZipStream effectiveness

Tags:

c#

gzipstream

I am trying to save a big UInt16 array to a file. positionCnt is about 50000 and stationCnt is about 2500. Saved directly, without GZipStream, the file is about 250 MB, which an external zip program can compress to 19 MB. With the following code the file is 507 MB. What am I doing wrong?

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BinaryWriter fs = new BinaryWriter(cmp);
fs.Write((Int32)(positionCnt * stationCnt));
for (int p = 0; p < positionCnt; p++)
{
    for (int s = 0; s < stationCnt; s++)
    {
       fs.Write(BoundData[p, s]);
    }
}
fs.Close();
danatel asked Sep 28 '11 19:09


2 Answers

I'm not sure which version of .NET you're running on. In earlier versions, GZipStream used a window size equal to the size of the buffer you wrote from, so in your case it would try to compress each 2-byte value individually. I think they changed that in .NET 4.0, but I haven't verified that.

In any case, what you want to do is create a buffered stream ahead of the GZipStream:

// Create file stream with 64 KB buffer
FileStream fs = new FileStream(filename, FileMode.Create, FileAccess.Write, FileShare.None, 65536);
GZipStream cmp = new GZipStream(fs, CompressionMode.Compress);
...

GZipStream cmp = new GZipStream(File.Open(cacheFileName, FileMode.Create), CompressionMode.Compress);
BufferedStream buffStrm = new BufferedStream(cmp, 65536);
BinaryWriter fs = new BinaryWriter(buffStrm);

This way, the GZipStream gets data in 64 KB chunks and can do a much better job of compressing.

Buffers larger than 64 KB won't give you any better compression.
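
For reference, here is a fuller sketch of the same idea, with using blocks so that every stream is flushed and closed in order. CacheWriter and Save are hypothetical names, and the ushort[,] parameter stands in for BoundData from the question; treat it as an illustration rather than a drop-in replacement.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class CacheWriter
{
    // Hypothetical helper: writes a UInt16 grid to a gzip file,
    // passing the data through a 64 KB buffer before it reaches GZipStream.
    public static void Save(string path, ushort[,] data)
    {
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);

        using (FileStream file = File.Create(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        using (BufferedStream buffered = new BufferedStream(gzip, 65536))
        using (BinaryWriter writer = new BinaryWriter(buffered))
        {
            // Same layout as the question: element count first, then the values row by row.
            writer.Write(rows * cols);
            for (int p = 0; p < rows; p++)
            {
                for (int s = 0; s < cols; s++)
                {
                    // Small writes land in the buffer; GZipStream only sees 64 KB chunks.
                    writer.Write(data[p, s]);
                }
            }
        } // Disposing the writer flushes the buffer into the GZipStream, then closes the file.
    }
}
```

It would be called as CacheWriter.Save(cacheFileName, BoundData). The order of the wrappers is the point: the BufferedStream has to sit between the BinaryWriter and the GZipStream, because that is where the small writes need to be collected.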

Jim Mischel answered Nov 03 '22 20:11


For some reason that is not apparent to me from a quick read of the GZip implementation in .NET, the result is sensitive to the amount of data written at once. I benchmarked your code against a few styles of writing to the GZipStream and found that the most efficient version wrote the data in long strides.

The trade-off in this case is memory, as you need to convert the short[,] to a byte[] sized for the stride length you'd like:

using (var writer = new GZipStream(File.Create("compressed.gz"),
                                   CompressionMode.Compress))
{
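    // One row of shorts as raw bytes (2 bytes per value)
    var bytes = new byte[data.GetLength(1) * 2];
    for (int ii = 0; ii < data.GetLength(0); ++ii)
    {
        // Buffer.BlockCopy offsets are in bytes, so row ii starts at bytes.Length * ii
        Buffer.BlockCopy(data, bytes.Length * ii, bytes, 0, bytes.Length);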
        writer.Write(bytes, 0, bytes.Length);
    }

    // Random data written to every other 4 shorts
    // 250,000,000 uncompressed.dat
    // 165,516,035 compressed.gz (1 row strides)
    // 411,033,852 compressed2.gz (your version)
}
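
If you want to check how much the write size matters on your own runtime, a small in-memory comparison along these lines should do. This is only a sketch: GZipStrideDemo and CompressedSize are hypothetical names, the test data is whatever you pass in, and newer .NET versions may show little or no difference between the two styles.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GZipStrideDemo
{
    // Hypothetical helper: compresses the grid into memory and returns the compressed size.
    // rowStrides = true writes one row per Write call, false writes one value per call.
    public static long CompressedSize(ushort[,] data, bool rowStrides)
    {
        using (MemoryStream target = new MemoryStream())
        {
            // leaveOpen: true keeps 'target' readable after the GZipStream is disposed.
            using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress, true))
            {
                if (rowStrides)
                {
                    byte[] row = new byte[data.GetLength(1) * sizeof(ushort)];
                    for (int r = 0; r < data.GetLength(0); r++)
                    {
                        // BlockCopy offsets are in bytes, so row r starts at r * row.Length.
                        Buffer.BlockCopy(data, r * row.Length, row, 0, row.Length);
                        gzip.Write(row, 0, row.Length);
                    }
                }
                else
                {
                    byte[] pair = new byte[2];
                    foreach (ushort v in data) // multidimensional arrays enumerate row by row
                    {
                        // Little-endian byte order, as BinaryWriter would emit.
                        pair[0] = (byte)v;
                        pair[1] = (byte)(v >> 8);
                        gzip.Write(pair, 0, 2);
                    }
                }
            } // Disposing the GZipStream writes the final compressed block and trailer.
            return target.Length;
        }
    }
}
```

Calling CompressedSize(data, false) and CompressedSize(data, true) on the same array and printing both numbers gives a quick comparison without touching the disk.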
user7116 answered Nov 03 '22 20:11