MemoryStream usage leads to out of memory exception

I'm facing issues when using MemoryStream multiple times.

Example:

For Each XImage As XImage In pdfDocument.Pages(pageCount).Resources.Images
   Dim imageStream As New MemoryStream()
   XImage.Save(imageStream, System.Drawing.Imaging.ImageFormat.Jpeg)

   ' some further processing

   imageStream.Close()
   imageStream.Dispose()    
Next

This piece of code cycles through the images on one page of a PDF file. The file may have up to about 500 pages with, let's say, 5 images on each page, so it leads to thousands of iterations. The issue is that the MemoryStream is not freed, and this leads to Out of Memory exceptions. An XImage is usually around 250 kB.

I'm using the Aspose.PDF library here to work with the PDF (XImage is a class from this library), but that does not matter. I tried to make a simple example where I just create a new MemoryStream and save a dummy bitmap to it. It leads to the same issue.

I also tried to use FileStream rather than MemoryStream but it behaves the same.

Any help appreciated.

Thanks

Jiri

Jiri Matejka asked Jun 25 '13 20:06


1 Answer

The memory from the stream is freed. I promise you. Really, it is.

What is not freed is the address space in your application formerly occupied by that memory. There's plenty of RAM available to your computer, but your specific application crashes because it can't find a place within its address space to allocate any more.

The reason you hit the limit is that the MemoryStream reallocates its buffer as it grows. It uses a byte[] internally to hold its data, and with the parameterless constructor that array starts out empty. As you write to the stream, whenever you exceed the size of the array the stream uses a doubling algorithm to allocate a new, larger array, and the data is copied from the old array to the new one. After this, the old array can and will be collected, but once arrays reach about 85,000 bytes they live on the large object heap, which is not compacted by default (think: defragged). The result is holes in your program's virtual address space that are no longer big enough for your next MemoryStream buffer. One MemoryStream might go through several arrays, leaving several holes whose total address space is potentially much larger than the source data.
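To make the growth behavior visible, here is a minimal sketch that writes 250 KB (the typical image size from the question) into a default-constructed MemoryStream and records every distinct value of its Capacity property. The module name, the ObserveCapacities helper, and the 4096-byte chunk size are assumptions made for illustration; only MemoryStream and its Capacity property come from the framework.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module CapacityGrowthDemo
    ' Writes totalBytes into a default-constructed MemoryStream in chunkSize pieces
    ' and returns each distinct Capacity value observed along the way.
    ' (Hypothetical helper, not part of any library.)
    Function ObserveCapacities(totalBytes As Integer, chunkSize As Integer) As List(Of Integer)
        Dim seen As New List(Of Integer)()
        Dim chunk(chunkSize - 1) As Byte

        Using ms As New MemoryStream()
            Dim written As Integer = 0
            While written < totalBytes
                Dim count As Integer = Math.Min(chunkSize, totalBytes - written)
                ms.Write(chunk, 0, count)
                written += count

                ' A change in Capacity means a new internal array was allocated
                If seen.Count = 0 OrElse seen(seen.Count - 1) <> ms.Capacity Then
                    seen.Add(ms.Capacity)
                End If
            End While
        End Using

        Return seen
    End Function

    Sub Main()
        ' Print every buffer size the stream went through for one 250 KB "image"
        For Each observedCapacity As Integer In ObserveCapacities(250 * 1024, 4096)
            Console.WriteLine(observedCapacity)
        Next
    End Sub
End Module
```

Each line printed corresponds to one internal array that was allocated and later abandoned, so a single image save leaves several dead arrays of increasing size behind, and the largest ones are the ones that fragment the address space.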

AFAIK, there was no way at the time of writing to force the garbage collector to compact your memory address space (.NET Framework 4.5.1 and later expose GCSettings.LargeObjectHeapCompactionMode for this). The solution therefore is to allocate one big block that can handle your largest image, and then reuse that same block over and over, so you don't end up with holes in your address space that can't be reused.

For this code, that means creating the MemoryStream outside of the loop and passing an integer to the constructor so that it is initialized to a reasonable number of bytes. You'll find this also gives you a nice performance boost, because your application no longer spends time copying data from one byte array to another. That makes this the better option even if you could compact your address space:

Using imageStream As New MemoryStream(307200) ' start at 300K; gives you some breathing room for larger images
    For Each XImage As XImage In pdfDocument.Pages(pageCount).Resources.Images

        ' reset the stream, but keep using the same memory
        imageStream.Seek(0, SeekOrigin.Begin)
        imageStream.SetLength(0)

        XImage.Save(imageStream, System.Drawing.Imaging.ImageFormat.Jpeg)

        ' some further processing

    Next
End Using
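
To check that this reset-and-reuse pattern really keeps a single buffer, the following sketch runs the same loop against dummy data and verifies that Capacity never changes. SaveDummyImage is a hypothetical stand-in for XImage.Save, and the module name, the CapacityStaysFixed helper, and the iteration count are likewise assumptions for illustration. It also reads the data in place through GetBuffer, since calling ToArray during the "further processing" step would allocate a fresh copy on every iteration.

```vbnet
Imports System
Imports System.IO

Module ReuseStreamDemo
    ' Hypothetical stand-in for XImage.Save: writes the given bytes to the stream.
    Sub SaveDummyImage(target As Stream, imageBytes As Byte())
        target.Write(imageBytes, 0, imageBytes.Length)
    End Sub

    ' Runs the reset-and-reuse loop; returns True if Capacity never changed.
    Function CapacityStaysFixed(iterations As Integer, imageSize As Integer, initialCapacity As Integer) As Boolean
        ' One dummy payload, allocated once so the demo itself does not churn large arrays
        Dim imageBytes(imageSize - 1) As Byte

        Using imageStream As New MemoryStream(initialCapacity)
            For i As Integer = 1 To iterations
                ' Reset the stream, but keep using the same memory
                imageStream.Seek(0, SeekOrigin.Begin)
                imageStream.SetLength(0)

                SaveDummyImage(imageStream, imageBytes)

                ' Read in place: GetBuffer returns the internal array without copying.
                ' Only the first Length bytes are valid data.
                Dim rawBuffer As Byte() = imageStream.GetBuffer()
                Dim validBytes As Integer = CInt(imageStream.Length)
                If rawBuffer.Length < validBytes Then Return False

                ' If Capacity changed, the stream had to allocate a new array
                If imageStream.Capacity <> initialCapacity Then Return False
            Next
        End Using

        Return True
    End Function

    Sub Main()
        ' Roughly 500 pages x 5 images of 250 KB each, as in the question
        Console.WriteLine(CapacityStaysFixed(2500, 250 * 1024, 307200))
    End Sub
End Module
```

If an image ever exceeds the initial capacity, the stream still grows as usual, so the starting size should be chosen above the largest image you expect.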

Joel Coehoorn answered Sep 21 '22 12:09