Why does C# memory stream reserve so much memory?

Our software is decompressing certain byte data through a GZipStream, which reads data from a MemoryStream. These data are decompressed in blocks of 4KB and written into another MemoryStream.

We've noticed that the process allocates much more memory than the size of the decompressed data.

Example: A compressed byte array of 2,425,536 bytes gets decompressed to 23,050,718 bytes. The memory profiler we use shows that the method MemoryStream.set_Capacity(Int32 value) allocated 67,104,936 bytes. That's a factor of 2.9 between reserved and actually written memory.

Note: MemoryStream.set_Capacity is called from MemoryStream.EnsureCapacity which is itself called from MemoryStream.Write in our function.

Why does the MemoryStream reserve so much capacity, even though it only appends blocks of 4KB?

Here is the code snippet which decompresses data:

    private byte[] Decompress(byte[] data)
    {
        using (MemoryStream compressedStream = new MemoryStream(data))
        using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
        using (MemoryStream resultStream = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int iCount = 0;

            while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                resultStream.Write(buffer, 0, iCount);
            }
            return resultStream.ToArray();
        }
    }

Note: If relevant, this is the system configuration:

Windows XP 32bit,
.NET 3.5
Compiled with Visual Studio 2008

579

asked Jul 08 '14 15:07

Tim Meyer

1 Answer

Because that is how MemoryStream expands its capacity. This is the relevant framework source:

    public override void Write(byte[] buffer, int offset, int count) {

        //... Removed Error checking for example

        int i = _position + count;
        // Check for overflow
        if (i < 0)
            throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));

        if (i > _length) {
            bool mustZero = _position > _length;
            if (i > _capacity) {
                bool allocatedNewArray = EnsureCapacity(i);
                if (allocatedNewArray)
                    mustZero = false;
            }
            if (mustZero)
                Array.Clear(_buffer, _length, i - _length);
            _length = i;
        }

        //...

    }

    private bool EnsureCapacity(int value) {
        // Check for overflow
        if (value < 0)
            throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));
        if (value > _capacity) {
            int newCapacity = value;
            if (newCapacity < 256)
                newCapacity = 256;
            if (newCapacity < _capacity * 2)
                newCapacity = _capacity * 2;
            Capacity = newCapacity;
            return true;
        }
        return false;
    }

    public virtual int Capacity
    {
        //...

        set {

            //...

            // MemoryStream has this invariant: _origin > 0 => !expandable (see ctors)
            if (_expandable && value != _capacity) {
                if (value > 0) {
                    byte[] newBuffer = new byte[value];
                    if (_length > 0) Buffer.InternalBlockCopy(_buffer, 0, newBuffer, 0, _length);
                    _buffer = newBuffer;
                }
                else {
                    _buffer = null;
                }
                _capacity = value;
            }
        }
    }

So every time a write would exceed the current capacity, the capacity is at least doubled. Each resize allocates a new array and copies the existing contents into it with Buffer.InternalBlockCopy, which is slow for large arrays. If the stream grew by only a small amount on each Write call, it would have to reallocate and copy very frequently, and performance would drop significantly.
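As a rough check of the numbers in the question, the doubling rule can be replayed without touching a real stream. The sketch below is only a simulation: `GrowthSimulation` and `SimulateDoubling` are hypothetical names, and it assumes every Read returns a full 4,096-byte block (except the last), which the question does not guarantee.

```csharp
using System;

public static class GrowthSimulation
{
    // Replays the EnsureCapacity rule: when a write does not fit,
    // the new capacity is max(required, 256, 2 * current capacity).
    // Returns the summed size of every buffer allocated along the way.
    public static long SimulateDoubling(long totalBytes, int blockSize,
                                        out long finalCapacity, out int allocations)
    {
        long capacity = 0;
        long length = 0;
        long allocated = 0;
        allocations = 0;

        while (length < totalBytes)
        {
            long count = Math.Min(blockSize, totalBytes - length);
            long required = length + count;

            if (required > capacity)
            {
                long newCapacity = Math.Max(required, Math.Max(256, capacity * 2));
                allocated += newCapacity; // a new array of this size is created
                allocations++;
                capacity = newCapacity;
            }
            length = required;
        }

        finalCapacity = capacity;
        return allocated;
    }

    public static void Main()
    {
        long finalCapacity;
        int allocations;
        // 23,050,718 decompressed bytes written in 4 KB blocks, as in the question.
        long total = SimulateDoubling(23050718, 4096, out finalCapacity, out allocations);

        Console.WriteLine("buffers allocated:  " + allocations);
        Console.WriteLine("final capacity:     " + finalCapacity);
        Console.WriteLine("sum of all buffers: " + total);
    }
}
```

Under those assumptions the simulation reports 14 buffers, a final capacity of 33,554,432 bytes, and 67,104,768 bytes allocated in total. That is within 168 bytes of the 67,104,936 bytes the profiler attributed to set_Capacity, which suggests the profiler figure is the cumulative size of every intermediate buffer rather than the final capacity. The small remainder would be consistent with per-array object overhead, though that part is a guess.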

There are a couple of things you could do to reduce the memory usage: set the initial capacity to at least the size of your compressed array, and then grow the capacity yourself by a factor smaller than 2.0.

    const double ResizeFactor = 1.25;

    private byte[] Decompress(byte[] data)
    {
        using (MemoryStream compressedStream = new MemoryStream(data))
        using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
        // Set the initial size to the compressed size + 25%.
        // The cast is required because the constructor takes an int capacity.
        using (MemoryStream resultStream = new MemoryStream((int)(data.Length * ResizeFactor)))
        {
            byte[] buffer = new byte[4096];
            int iCount = 0;

            while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Resize to 125% instead of 200%. If that is still too small,
                // Write falls back to its own growth rule.
                if (resultStream.Capacity < resultStream.Length + iCount)
                    resultStream.Capacity = (int)(resultStream.Capacity * ResizeFactor);

                resultStream.Write(buffer, 0, iCount);
            }
            return resultStream.ToArray();
        }
    }
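To get a feel for the trade-off, the manual growth loop can be replayed with different factors. This is a simulation only, not a measurement: `GrowthFactorComparison` and `Simulate` are hypothetical names, the sizes are taken from the question, and each Read is assumed to return a full 4,096-byte block.

```csharp
using System;

public static class GrowthFactorComparison
{
    // Simulates a stream pre-sized to initialCapacity that is grown manually by
    // `factor` before each write that would not fit, with the built-in rule
    // max(required, 256, 2 * capacity) as a fallback.
    // Returns the summed size of every buffer allocated.
    public static long Simulate(long totalBytes, int blockSize, long initialCapacity,
                                double factor, out long finalCapacity, out int allocations)
    {
        long capacity = initialCapacity;
        long length = 0;
        long allocated = initialCapacity;
        allocations = initialCapacity > 0 ? 1 : 0;

        while (length < totalBytes)
        {
            long count = Math.Min(blockSize, totalBytes - length);
            long required = length + count;

            // Manual growth step.
            if (capacity < required)
            {
                long grown = (long)(capacity * factor);
                if (grown != capacity) // setting the same capacity allocates nothing
                {
                    capacity = grown;
                    allocated += grown;
                    allocations++;
                }
            }

            // Fallback that Write applies if the buffer is still too small.
            if (capacity < required)
            {
                capacity = Math.Max(required, Math.Max(256, capacity * 2));
                allocated += capacity;
                allocations++;
            }

            length = required;
        }

        finalCapacity = capacity;
        return allocated;
    }

    public static void Main()
    {
        double[] factors = { 1.25, 2.0 };
        foreach (double factor in factors)
        {
            long finalCapacity;
            int allocations;
            long initial = (long)(2425536 * factor); // compressed size from the question
            long total = Simulate(23050718, 4096, initial, factor,
                                  out finalCapacity, out allocations);

            Console.WriteLine("factor " + factor + ": " + allocations + " buffers, final capacity "
                + finalCapacity + ", cumulative " + total + " bytes");
        }
    }
}
```

In general, a smaller factor leaves less unused space in the final buffer but reallocates and copies more often, so the cumulative figure a profiler reports can go up rather than down. Which of the two matters more depends on whether peak memory or allocation churn is the actual problem.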

If you wanted to, you could use a fancier algorithm, such as resizing based on the current compression ratio:

    const double MinResizeFactor = 1.05;

    private byte[] Decompress(byte[] data)
    {
        using (MemoryStream compressedStream = new MemoryStream(data))
        using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
        // Set the initial size to the compressed size + the minimum resize factor.
        // The cast is required because the constructor takes an int capacity.
        using (MemoryStream resultStream = new MemoryStream((int)(data.Length * MinResizeFactor)))
        {
            byte[] buffer = new byte[4096];
            int iCount = 0;

            while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (resultStream.Capacity < resultStream.Length + iCount)
                {
                    // The +1 is to prevent divide by 0 errors, it may not be necessary in practice.
                    double sizeRatio = ((double)resultStream.Position + iCount) / (compressedStream.Position + 1);

                    // Resize to minimum resize factor of the current capacity or the
                    // compressed stream length times the compression ratio + min resize
                    // factor, whichever is larger.
                    resultStream.Capacity = (int)Math.Max(resultStream.Capacity * MinResizeFactor,
                                                          (sizeRatio + (MinResizeFactor - 1)) * compressedStream.Length);
                }

                resultStream.Write(buffer, 0, iCount);
            }
            return resultStream.ToArray();
        }
    }
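Whichever variant is used, it may be worth confirming that it still round-trips data correctly and seeing how much slack the result stream ends with. The harness below is a minimal sketch: `RoundTripCheck`, `Compress`, and `SameBytes` are hypothetical helpers, the test data is made up, and `Decompress` is the baseline loop from the question with an extra out parameter for the final capacity. Any of the variants above could be substituted for it.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class RoundTripCheck
{
    // Hypothetical helper: gzip-compresses a buffer so there is something to decompress.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // The GZipStream must be closed before reading the output,
            // otherwise the gzip trailer may not have been written yet.
            using (GZipStream zip = new GZipStream(output, CompressionMode.Compress))
            {
                zip.Write(data, 0, data.Length);
            }
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    // Baseline loop from the question; also reports the result stream's final capacity.
    public static byte[] Decompress(byte[] data, out int finalCapacity)
    {
        using (MemoryStream compressedStream = new MemoryStream(data))
        using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
        using (MemoryStream resultStream = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int count;
            while ((count = zipStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                resultStream.Write(buffer, 0, count);
            }
            finalCapacity = resultStream.Capacity;
            return resultStream.ToArray();
        }
    }

    public static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Made-up, highly compressible test data.
        byte[] original = new byte[1000000];
        for (int i = 0; i < original.Length; i++)
        {
            original[i] = (byte)(i % 251);
        }

        int finalCapacity;
        byte[] compressed = Compress(original);
        byte[] restored = Decompress(compressed, out finalCapacity);

        Console.WriteLine("round trip ok:  " + SameBytes(original, restored));
        Console.WriteLine("decompressed:   " + restored.Length + " bytes");
        Console.WriteLine("final capacity: " + finalCapacity + " bytes");
    }
}
```

The gap between the decompressed length and the final capacity is the unused space left in the stream's internal buffer. Note that ToArray then allocates one more array of exactly the decompressed length.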

149

answered Sep 22 '22 04:09

Scott Chamberlain


Tags: memory-management, c#, memory, memorystream, gzipstream
