I have many large gzip files (approximately 10 MB - 200 MB) that I downloaded from FTP and need to decompress.
So I googled around for a gzip decompression solution and ended up with this:
static byte[] Decompress(byte[] gzip)
{
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
It works well for any file below 50 MB, but once the input is larger than 50 MB I get a System.OutOfMemoryException. The last position and length of the MemoryStream before the exception is 134217728. I don't think it is related to my physical memory; I understand that I can't have an object larger than 2 GB since I'm running a 32-bit process.
I also need to process the data after decompressing the files. I'm not sure if a MemoryStream is the best approach here, but I don't really like writing to a file and then reading the file back in.
My questions: how can I decompress these files without the out-of-memory exception, and is a MemoryStream even the right approach for data I need to process afterwards?
MemoryStream's memory-allocation strategy is not friendly to huge amounts of data.
Since the contract of MemoryStream is to keep a single contiguous array as its underlying storage, it has to reallocate that array repeatedly as a large stream grows (roughly log2(size_of_stream) times). Each reallocation copies the entire contents and briefly holds both the old and the new buffer, so the peak address-space demand is well above the actual size of the data.
As a result, pushing a large (100 MB+) stream through a MemoryStream is likely to cause an out-of-memory exception on x86 systems. In addition, the most common pattern for returning the data is to call ToArray(), as you do, which requires roughly the same amount of space again on top of the last internal buffer used by the MemoryStream.
Approaches to solve this (all of which come up below): process the decompressed data in chunks instead of materializing it all at once, spill to a temporary file, or move to a 64-bit process or memory-mapped files.
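If the downstream processing can work on sequential chunks, the first of those approaches means never building the full byte[] at all. A minimal sketch of that idea, assuming the gzip data sits in a file on disk; the processChunk callback is a placeholder for whatever processing you need, not something from the question:

// Sketch only. Requires: using System; using System.IO; using System.IO.Compression;
static void DecompressInChunks(string gzipPath, Action<byte[], int> processChunk)
{
    using (FileStream file = File.OpenRead(gzipPath))
    using (GZipStream stream = new GZipStream(file, CompressionMode.Decompress))
    {
        byte[] buffer = new byte[4096];
        int count;
        while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            processChunk(buffer, count); // only ~4 KB of decompressed data is alive at any time
        }
    }
}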
You can try a test like the following to get a feel for how much you can write to a MemoryStream before getting an OutOfMemoryException:
const int bufferSize = 4096;
byte[] buffer = new byte[bufferSize];
int fileSize = 1000 * 1024 * 1024;
int total = 0;
try
{
    using (MemoryStream memory = new MemoryStream())
    {
        while (total < fileSize)
        {
            memory.Write(buffer, 0, bufferSize);
            total += bufferSize;
        }
    }
    MessageBox.Show("No errors");
}
catch (OutOfMemoryException)
{
    MessageBox.Show("OutOfMemory around size : " + (total / (1024m * 1024.0m)) + "MB");
}
You may have to unzip to a temporary physical file first, then re-read it in small chunks and process as you go (a sketch follows after the side point below).
Side point: interestingly, on a Windows XP PC, the above code gives "OutOfMemory around size 256MB" when the code targets .NET 2.0, and "OutOfMemory around size 512MB" on .NET 4.
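A rough sketch of that temp-file route, assuming the gzip data is already on disk; the path handling and the chunk-processing loop are illustrative only:

// Requires: using System.IO; using System.IO.Compression;
static void DecompressViaTempFile(string gzipPath)
{
    string tempPath = Path.GetTempFileName();
    try
    {
        // Pass 1: decompress straight to a temporary file on disk.
        using (FileStream source = File.OpenRead(gzipPath))
        using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
        using (FileStream target = File.Create(tempPath))
        {
            gzip.CopyTo(target); // CopyTo is .NET 4+; on .NET 2.0 use a Read/Write loop
        }

        // Pass 2: re-read the decompressed file in small chunks.
        byte[] buffer = new byte[4096];
        using (FileStream reader = File.OpenRead(tempPath))
        {
            int count;
            while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // process buffer[0..count) here
            }
        }
    }
    finally
    {
        File.Delete(tempPath);
    }
}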
Do you happen to be processing files in multiple threads? That would consume a large amount of your address space. OutOfMemory errors usually aren't related to physical memory, and so MemoryStream can run out far earlier than you'd expect. Check this discussion http://social.msdn.microsoft.com/Forums/en-AU/csharpgeneral/thread/1af59645-cdef-46a9-9eb1-616661babf90. If you switched to a 64-bit process, you'd probably be more than OK for the file sizes you're dealing with.
In your current situation, though, you could work with memory-mapped files to get around the address-space limits. If you're using .NET 4.0, it provides a native wrapper for the Windows functions: http://msdn.microsoft.com/en-us/library/dd267535.aspx.
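A hedged sketch of that idea: decompress into a page-file-backed memory-mapped file, then read it back through a view stream, so the decompressed data never has to live in one contiguous managed array. The capacity parameter is an assumption here; it must be at least the decompressed size:

// Requires .NET 4.0+: using System.IO; using System.IO.Compression; using System.IO.MemoryMappedFiles;
static void DecompressToMemoryMapped(string gzipPath, long capacity)
{
    using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew(null, capacity))
    {
        // Write the decompressed bytes into the mapped region.
        using (MemoryMappedViewStream writer = mmf.CreateViewStream())
        using (FileStream source = File.OpenRead(gzipPath))
        using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
        {
            gzip.CopyTo(writer);
        }

        // Read it back in chunks through another view; note the view spans the full
        // capacity, so track the real decompressed length if it matters to your processing.
        using (MemoryMappedViewStream reader = mmf.CreateViewStream())
        {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // process buffer[0..count) here
            }
        }
    }
}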