
What (tf) are the secrets behind PDF memory allocation (CGPDFDocumentRef)

For a PDF reader, I want to prepare a document by taking 'screenshots' of each page and saving them to disk. My first approach is:

CGPDFDocumentRef document = CGPDFDocumentCreateWithURL((CFURLRef) someURL);
for (int i = 1; i<=pageCount; i++) 
{
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc]init];      
  CGPDFPageRef page = CGPDFDocumentGetPage(document, i);
  ...//getting + manipulating graphics context etc.
  ...
  CGContextDrawPDFPage(context, page);
  ...
  UIImage *resultingImage = UIGraphicsGetImageFromCurrentImageContext();
  ...//saving the image to disc 
  [pool drain];
}
CGPDFDocumentRelease(document);

This allocates a lot of memory, much of which seems not to be released after the first run of the loop (preparing the first document). Additional runs leave no further unreleased memory:

MEMORY BEFORE:          6 MB
MEMORY DURING 1ST DOC: 40 MB
MEMORY AFTER 1ST  DOC: 25 MB 
MEMORY DURING 2ND DOC: 40 MB
MEMORY AFTER 2ND  DOC: 25 MB
....

Changing the code to

for (int i = 1; i<=pageCount; i++) 
{
  CGPDFDocumentRef document = CGPDFDocumentCreateWithURL((CFURLRef) someURL);
  NSAutoreleasePool *pool = [[NSAutoreleasePool alloc]init];      
  CGPDFPageRef page = CGPDFDocumentGetPage(document, i);
  ...//getting + manipulating graphics context etc.
  ...
  CGContextDrawPDFPage(context, page);
  ...
  UIImage *resultingImage = UIGraphicsGetImageFromCurrentImageContext();
  ...//saving the image to disc 
  CGPDFDocumentRelease(document);
  [pool drain];
}

changes the memory usage to

MEMORY BEFORE:          6 MB
MEMORY DURING 1ST DOC:  9 MB
MEMORY AFTER 1ST  DOC:  7 MB 
MEMORY DURING 2ND DOC:  9 MB
MEMORY AFTER 2ND  DOC:  7 MB
....

but is obviously a step backwards in performance.

When I later start reading a PDF (on a different thread), no more memory is allocated in the first case (it stays at 25 MB), while in the second case memory goes up from 7 MB to 20 MB.

In both cases, when I remove the CGContextDrawPDFPage(context, page); line, memory stays (nearly) constant at 6 MB during and after the preparation of all documents.

Can anybody explain what's going on there?

Kai Huppmann asked Jan 12 '11 12:01



1 Answer

CGPDFDocument caches pretty aggressively, and you have very little control over that apart from, as you've done, releasing the document and reloading it from disk.

The reason you don't see a lot of allocations when you remove the CGContextDrawPDFPage call is that Quartz loads page resources lazily. When you just call CGPDFDocumentGetPage, all that happens is that it loads some basic metadata, like bounding boxes and annotations (very small in memory).

Fonts, images, etc. are only loaded when you actually draw the page - but then they're retained for a relatively long time in an internal cache. This is meant to make rendering faster, because page resources are often shared between multiple pages. Also, it's fairly common to render a page multiple times (e.g. when zooming in). You'll notice that it's significantly faster to render a page the second time.
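
If reopening the document for every page is too slow and keeping it open for the whole run holds too much memory, one possible middle ground is to reopen it once per batch of pages. The following is only a sketch under assumptions: RenderPDFInBatches and BatchLastPage are hypothetical helper names, the batch size is something you would tune by measuring, page rotation is ignored, and whether releasing the document actually frees the cached resources depends on Quartz internals (the numbers in the question suggest it does).

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: last page (1-based, inclusive) of the batch that starts at firstPage.
static size_t BatchLastPage(size_t firstPage, size_t batchSize, size_t pageCount)
{
    size_t last = firstPage + batchSize - 1;
    return (last < pageCount) ? last : pageCount;
}

// Hypothetical helper: renders every page of the PDF at pdfURL to a PNG file in outputDir,
// releasing and reopening the document after every batchSize pages.
static void RenderPDFInBatches(NSURL *pdfURL, NSString *outputDir, size_t batchSize)
{
    // Open once only to learn the page count, then release again.
    CGPDFDocumentRef probe = CGPDFDocumentCreateWithURL((__bridge CFURLRef)pdfURL);
    if (probe == NULL) return;
    size_t pageCount = CGPDFDocumentGetNumberOfPages(probe);
    CGPDFDocumentRelease(probe);

    // Keep the batch size within 1...pageCount.
    if (batchSize == 0) batchSize = 1;
    if (batchSize > pageCount) batchSize = pageCount;

    for (size_t first = 1; first <= pageCount; first += batchSize) {
        // A fresh document per batch; its resources go away with the release below.
        CGPDFDocumentRef document = CGPDFDocumentCreateWithURL((__bridge CFURLRef)pdfURL);
        if (document == NULL) return;
        size_t last = BatchLastPage(first, batchSize, pageCount);

        for (size_t i = first; i <= last; i++) {
            @autoreleasepool {
                CGPDFPageRef page = CGPDFDocumentGetPage(document, i);
                if (page == NULL) continue;
                CGRect box = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);

                UIGraphicsBeginImageContextWithOptions(box.size, YES, 1.0);
                CGContextRef context = UIGraphicsGetCurrentContext();

                // Fill with white so transparent pages do not end up black.
                CGContextSetRGBFillColor(context, 1.0, 1.0, 1.0, 1.0);
                CGContextFillRect(context, CGRectMake(0, 0, box.size.width, box.size.height));

                // PDF coordinates start at the bottom left, UIKit at the top left.
                CGContextTranslateCTM(context, 0, box.size.height);
                CGContextScaleCTM(context, 1.0, -1.0);
                CGContextTranslateCTM(context, -box.origin.x, -box.origin.y);
                CGContextDrawPDFPage(context, page);

                UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
                UIGraphicsEndImageContext();

                NSString *name = [NSString stringWithFormat:@"page-%lu.png", (unsigned long)i];
                NSString *path = [outputDir stringByAppendingPathComponent:name];
                [UIImagePNGRepresentation(image) writeToFile:path atomically:YES];
            }
        }
        CGPDFDocumentRelease(document);
    }
}
```

A call such as RenderPDFInBatches(someURL, outputDir, 10) would reopen the file once every ten pages. A batch size of 1 behaves like the second variant in the question, and a batch size of at least the page count behaves like the first, so the value lets you trade peak memory against the cost of reloading.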

omz answered Oct 02 '22 10:10
