
Reusing a CGContext causing odd performance losses

My class is rendering images offscreen. I thought reusing the CGContext instead of creating the same context again and again for every image would be a good thing. I set a member variable _imageContext so I would only have to create a new context if _imageContext is nil like so:

if(!_imageContext)
    _imageContext = [self contextOfSize:imageSize];

instead of:

CGContextRef imageContext = [self contextOfSize:imageSize];

Of course I do not release the CGContext anymore.

These are the only changes I made, yet reusing the context slowed rendering down from about 10ms to 60ms. Have I missed something? Do I have to clear the context or something before drawing into it again? Or is recreating the context for each image the correct way?

EDIT

I found the weirdest connection.

While searching for the reason the app's memory usage climbs sharply once it starts rendering images, I found that the problem was where I set the rendered image on an NSImageView.

imageView.image = nil;
imageView.image = [[NSImage alloc] initWithCGImage:_imageRef size:size];

It looks like ARC is not releasing the previous NSImage. My first workaround was to draw the new image into the old one.

[imageView.image lockFocus];
[[[NSImage alloc] initWithCGImage:_imageRef size:size] drawInRect:NSMakeRect(0, 0, size.width, size.height) fromRect:NSZeroRect operation:NSCompositeSourceOver fraction:1.0];
[imageView.image unlockFocus];
[imageView setNeedsDisplay];

The memory problem was gone and what happened to the CGContext-reuse problem? Not reusing the context now takes 20ms instead of 10ms - of course drawing into an image takes longer than just setting it. Reusing the context also takes 20ms instead of 60ms. But why? I don't see that there could be any connection, but I can reproduce the old state where reusing takes more time just by setting the NSImageView's image instead of drawing it.

Enie asked Dec 28 '12 15:12



1 Answer

I investigated this, and I observe the same slowdown. Running Instruments set to sample kernel calls as well as userland calls shows the culprit. @RyanArtecona's comment was on the right track. I focused Instruments on the bottommost userland call, CGSColorMaskCopyARGB8888_sse, in two test runs (one reusing contexts, the other making a new one every time), and then inverted the resulting call tree. In the case where the context is not reused, I see that the heaviest kernel trace is:

Running Time    Self            Symbol Name
668.0ms   32.3% 668.0           __bzero
668.0ms   32.3% 0.0              vm_fault
668.0ms   32.3% 0.0               user_trap
668.0ms   32.3% 0.0                CGSColorMaskCopyARGB8888_sse

This is the kernel zeroing out pages of memory that are being faulted in because CGSColorMaskCopyARGB8888_sse accesses them. In other words, the CGContext maps VM pages to back the bitmap context, but the kernel defers the work of supplying those pages until something touches that memory. The actual mapping/fault happens on first access.
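The lazy zero-fill behavior described above can be observed outside Core Graphics with plain POSIX calls. The following is only a rough sketch of the general kernel mechanism, not of what CGContext does internally; the helper names map_lazy_buffer and touch_pages are made up for this illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps `len` bytes of anonymous memory.
   The kernel only reserves address space at this point; physical pages
   are supplied lazily. Returns NULL on failure. */
static unsigned char *map_lazy_buffer(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Hypothetical helper: touches one byte per page so that every page is
   faulted in. Returns how many pages did NOT read back as zero on first
   access (anonymous mappings are zero-filled, so this should be 0). */
static size_t touch_pages(unsigned char *buf, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t nonzero = 0;
    for (size_t off = 0; off < len; off += page) {
        if (buf[off] != 0) {
            nonzero++;          /* first read of this page */
        }
        buf[off] = 0xFF;        /* first write: the page now needs real backing memory */
    }
    return nonzero;
}
```

Timing touch_pages on a fresh mapping versus a second pass over the same buffer should show the first pass paying the fault cost, which is the same kind of cost that appears as __bzero under vm_fault in the trace.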

Now let's look at the heaviest kernel trace when we DO reuse the context:

Running Time            Self            Symbol Name
1327.0ms   35.0%        1327.0          bcopy
1327.0ms   35.0%        0.0              user_trap
1327.0ms   35.0%        0.0               CGSColorMaskCopyARGB8888_sse

This is the kernel copying pages. My money would be on this being the underlying copy-on-write mechanism that delivers the behavior @RyanArtecona was talking about in his comment:

In the Apple docs for CGBitmapContextCreateImage, it says the actual bit-copying operation doesn't happen until more drawing is done on the original context.

In the contrived case I used to test, the non-reuse case took 3392ms to execute and the reuse case took 4693ms, a gap of 1301ms. Considering just the single heaviest trace from each case, the kernel trace indicates that we spend 668.0ms zero-filling new pages on first access, versus 1327.0ms writing into the copy-on-write pages on the first write after the image gets a reference to those pages. That is a difference of 659ms, which alone accounts for ~50% of the gap between the two cases.

So, to distill it down a little, the non-reused context is faster because when you create the context it knows the pages are empty, and there's no one else with a reference to those pages to force them to be copied when you write to them. When you reuse the context, the pages are referenced by someone else (the image you created) and must be copied on the first write, so as to preserve the state of the image when the state of the context changes.
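Copy-on-write itself is easy to see with a portable analogy: after fork(), parent and child share the same physical pages until one of them writes. This is a sketch under that assumption only; Core Graphics shares pages between the context and the image through its own mechanism, not fork(), and the function name cow_preserves_original is invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the parent's buffer is unchanged after
   the child overwrote its copy-on-write view, 0 if not, -1 on error. */
static int cow_preserves_original(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    unsigned char *buf = p;
    memset(buf, 0xAB, len);     /* plays the role of the "image" contents */

    pid_t pid = fork();         /* both processes now reference the same pages */
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child plays the role of the reused "context": its first write to
           each shared page makes the kernel copy that page first. */
        memset(buf, 0xCD, len);
        _exit(buf[0] == 0xCD ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, len);
        return -1;
    }
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* The parent's view must still hold the original bytes. */
    int unchanged = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xAB) {
            unchanged = 0;
            break;
        }
    }
    munmap(buf, len);
    return (child_ok && unchanged) ? 1 : 0;
}
```

The page copies triggered by the child's memset are the same kind of work as the bcopy time in the reuse trace: the writer pays for preserving the other party's view of the data.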

You could further explore what's going on here by looking at the virtual memory map of the process as you step through in the debugger. vmmap is a helpful tool for that.

Practically speaking, you should probably just create a new CGContext every time.

ipmcc answered Sep 25 '22 01:09
