
MemoryStream and Large Object Heap

I have to transfer large files between computers over unreliable connections using WCF.

Because I want to be able to resume a transfer and I don't want WCF to limit my file size, I am chunking the files into 1 MB pieces. These chunks are transported as streams, which works quite nicely so far.

My steps are:

  1. open filestream
  2. read chunk from file into byte[] and create memorystream
  3. transfer chunk
  4. back to 2. until the whole file is sent

My problem is in step 2. I assume that when I create a MemoryStream from a byte array, the array will end up on the Large Object Heap (LOH) and ultimately cause an OutOfMemoryException. I could not actually reproduce this error, so maybe my assumption is wrong.

Now, I don't want to send the byte[] in the message, as WCF will tell me the array size is too big. I can change the maximum allowed array size and/or the size of my chunks, but I hope there is another solution.

My actual question(s):

  • Will my current solution create objects on the LOH, and will that cause me problems?
  • Is there a better way to solve this?

Btw.: On the receiving side I simply read smaller chunks from the arriving stream and write them directly into the file, so no large byte arrays are involved.
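For illustration, the receiving side described above could look roughly like the sketch below. The class name ChunkReceiver, the method AppendTo, and the 80 KB buffer size are assumptions made for this example, not part of the original code; the MemoryStream instances in Main only stand in for the WCF stream and the destination file.

```csharp
using System;
using System.IO;

static class ChunkReceiver
{
    // 80 KB (81,920 bytes) stays below the 85,000-byte large-object threshold.
    private const int CopyBufferSize = 80 * 1024;

    // Copies everything readable from 'incoming' into 'target'
    // using one small reusable buffer, and returns the number of bytes copied.
    public static long AppendTo(Stream incoming, Stream target)
    {
        byte[] buffer = new byte[CopyBufferSize];
        long total = 0;
        int read;
        while ((read = incoming.Read(buffer, 0, buffer.Length)) > 0)
        {
            target.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    static void Main()
    {
        // Stand-ins for the arriving WCF stream and the destination FileStream.
        using (var incoming = new MemoryStream(new byte[200000]))
        using (var target = new MemoryStream())
        {
            long copied = AppendTo(incoming, target);
            Console.WriteLine(copied);
        }
    }
}
```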

Edit:

current solution:

for (int i = resumeChunk; i < chunks; i++)
{
 byte[] buffer = new byte[chunkSize];
 fileStream.Position = i * chunkSize;
 int actualLength = fileStream.Read(buffer, 0, (int)chunkSize);
 Array.Resize(ref buffer, actualLength);
 using (MemoryStream stream = new MemoryStream(buffer)) 
 {
  UploadFile(stream);
 }
}
flayn asked May 12 '10 13:05


2 Answers

I hope this is okay. It's my first answer on StackOverflow.

Yes, absolutely. If your chunk size is 85,000 bytes or more, the array will be allocated on the Large Object Heap. You will probably not run out of memory very quickly, because you are allocating and deallocating contiguous areas of memory that are all the same size, so when memory fills up the runtime can fit a new chunk into an old, reclaimed memory area.
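One rough way to check where an array ends up is GC.GetGeneration: the runtime reports large-object-heap allocations as belonging to the oldest generation (GC.MaxGeneration), whereas a freshly allocated small array normally starts in generation 0. The sketch below is only an illustration; the class and method names are invented for it.

```csharp
using System;

static class LohCheck
{
    // True when the runtime reports the object as being in the oldest generation,
    // which is how large-object-heap allocations are reported.
    public static bool IsReportedAsOldestGeneration(object obj)
    {
        return GC.GetGeneration(obj) == GC.MaxGeneration;
    }

    static void Main()
    {
        byte[] small = new byte[80 * 1024];    // below the large-object threshold
        byte[] chunk = new byte[1024 * 1024];  // a 1 MB chunk as in the question

        Console.WriteLine(IsReportedAsOldestGeneration(small));
        Console.WriteLine(IsReportedAsOldestGeneration(chunk));
    }
}
```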

I would be a little worried about the Array.Resize call, as it creates another array whenever the length changes (see http://msdn.microsoft.com/en-us/library/1ffy6686(VS.80).aspx). No resize is needed when actualLength == chunkSize, which is the case for all but the last chunk. Array.Resize is documented to do nothing when the new size equals the current length, so the guard below mainly makes that intent explicit:

if (actualLength != chunkSize) Array.Resize(ref buffer, actualLength);

If actualLength differs from chunkSize but is still 85,000 bytes or more, the new array will also be allocated on the Large Object Heap, potentially causing it to fragment and possibly causing apparent memory leaks. I believe it would still take a long time to actually run out of memory, as the leak would be quite slow.
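To see how Array.Resize behaves in the two cases discussed above, a small check like the following can help. It is a sketch with made-up names (ResizeCheck, ResizeAllocates); it compares the array reference before and after the call. As far as the documentation goes, resizing to the same length leaves the array untouched, while shrinking allocates a new array and copies the data.

```csharp
using System;

static class ResizeCheck
{
    // Returns true if Array.Resize replaced the array instance with a new one.
    public static bool ResizeAllocates(int originalLength, int newLength)
    {
        byte[] buffer = new byte[originalLength];
        byte[] before = buffer;
        Array.Resize(ref buffer, newLength);
        return !ReferenceEquals(before, buffer);
    }

    static void Main()
    {
        // Full chunk: requested length equals the current length.
        Console.WriteLine(ResizeAllocates(1024, 1024));
        // Last chunk: fewer bytes were read than the buffer holds.
        Console.WriteLine(ResizeAllocates(1024, 100));
    }
}
```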

I think a better implementation would be to use some kind of buffer pool to provide the arrays. You could roll your own (it would be too complicated) but WCF does provide one for you. I have rewritten your code slightly to take advantage of that:

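// BufferManager lives in System.ServiceModel.Channels.
// The pool is capped at ten chunks' worth of memory; no single buffer exceeds chunkSize.
BufferManager bm = BufferManager.CreateBufferManager(chunkSize * 10, (int)chunkSize);

for (int i = resumeChunk; i < chunks; i++)
{
    // TakeBuffer returns a buffer of at least the requested size, so track actualLength.
    byte[] buffer = bm.TakeBuffer((int)chunkSize);
    try
    {
        fileStream.Position = i * chunkSize;
        int actualLength = fileStream.Read(buffer, 0, (int)chunkSize);
        if (actualLength == 0) break;
        // No Array.Resize: the buffer is reused, and only the first actualLength bytes are valid.
        using (MemoryStream stream = new MemoryStream(buffer))
        {
            UploadFile(stream, actualLength);
        }
    }
    finally
    {
        // Hand the buffer back to the pool even if the upload throws.
        bm.ReturnBuffer(buffer);
    }
}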

This assumes that the implementation of UploadFile can be rewritten to take an int for the number of bytes to write.
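If changing UploadFile is not an option, one possible alternative (not part of the code above, just a sketch) is to wrap only the valid part of the buffer, using the MemoryStream constructor that takes an offset and a count. The stream then ends after the bytes that were actually read, even if the pooled buffer is larger. The names BoundedChunk and Wrap are invented for this example.

```csharp
using System;
using System.IO;

static class BoundedChunk
{
    // Exposes only the first 'validBytes' bytes of a (possibly larger, pooled) buffer
    // as a read-only stream; no data is copied.
    public static MemoryStream Wrap(byte[] buffer, int validBytes)
    {
        return new MemoryStream(buffer, 0, validBytes, false);
    }

    static void Main()
    {
        byte[] pooled = new byte[1024 * 1024]; // stand-in for a pooled buffer
        int actualLength = 300000;             // stand-in for the count returned by Read

        using (MemoryStream stream = Wrap(pooled, actualLength))
        {
            // The stream's length matches the valid bytes, not the buffer size.
            Console.WriteLine(stream.Length);
        }
    }
}
```

With this approach the buffer should still only be returned to the pool after the upload has finished reading the stream, because the stream reads directly from the pooled array.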

I hope this helps

joe

Joe Simmonds answered Sep 16 '22 11:09


See also RecyclableMemoryStream. From this article:

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

  • Eliminate Large Object Heap allocations by using pooled buffers
  • Incur far fewer gen 2 GCs, and spend far less time paused due to GC
  • Avoid memory leaks by having a bounded pool size
  • Avoid memory fragmentation
  • Provide excellent debuggability
  • Provide metrics for performance tracking
Manushin Igor answered Sep 17 '22 11:09