I work on an iPad application whose sync process uses web services and Core Data in a tight loop. To reduce the memory footprint, following Apple's recommendation, I periodically allocate and drain an NSAutoreleasePool. This works well, and the current application has no memory issues. However, I plan to move to ARC, where NSAutoreleasePool is no longer available, and I would like to maintain the same kind of performance. I created a few examples and timed them, and I am wondering what the best approach is, using ARC, to achieve the same kind of performance while keeping the code readable.
For testing purposes I came up with 3 scenarios, each of which creates a string from every number between 1 and 10,000,000. I ran each example 3 times to determine how long it took, as a 64-bit Mac application built with the Apple LLVM 3.0 compiler (without gdb, at -O0) and Xcode 4.2. I also ran each example through Instruments to see roughly what the memory peak was.
Each of the examples below is contained within the following code block:
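#import <Foundation/Foundation.h>

int main (int argc, const char * argv[])
{
    @autoreleasepool {
        NSDate *now = [NSDate date];
        //Code Example ...
        // timeIntervalSinceNow is negative for a date in the past, so negate it.
        NSTimeInterval interval = -[now timeIntervalSinceNow];
        printf("Duration: %f\n", interval);
    }
    return 0;
}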
NSAutoreleasePool Batch [Original Pre-ARC] (Peak Memory: ~116 KB)
static const NSUInteger BATCH_SIZE = 1500;
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
    [text class];
    if((count + 1) % BATCH_SIZE == 0)
    {
        [pool drain];
        pool = [[NSAutoreleasePool alloc] init];
    }
}
[pool drain];
Run Times:
10.928158
10.912849
11.084716
Outer @autoreleasepool (Peak Memory: ~382 MB)
@autoreleasepool {
    for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run Times:
11.489350
11.310462
11.344662
Inner @autoreleasepool (Peak Memory: ~61.2 KB)
for(uint32_t count = 0; count < MAX_ALLOCATIONS; count++)
{
    @autoreleasepool {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
    }
}
Run Times:
14.031112
14.284014
14.099625
@autoreleasepool w/ goto (Peak Memory: ~115 KB)
static const NSUInteger BATCH_SIZE = 1500;
uint32_t count = 0;
next_batch:
@autoreleasepool {
    for(;count < MAX_ALLOCATIONS; count++)
    {
        NSString *text = [NSString stringWithFormat:@"%u", count + 1U];
        [text class];
        if((count + 1) % BATCH_SIZE == 0)
        {
            count++; //Increment count manually
            goto next_batch;
        }
    }
}
Run Times:
10.908756
10.960189
11.018382
The goto statement offered the closest performance, but it uses a goto. Any thoughts?
Update:
Note: The goto statement is a normal exit for an @autoreleasepool, as stated in the documentation, and will not leak memory.
On entry, an autorelease pool is pushed. On normal exit (break, return, goto, fall-through, and so on) the autorelease pool is popped. For compatibility with existing code, if exit is due to an exception, the autorelease pool is not popped.
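To make the quoted rule concrete, here is a minimal sketch of leaving an @autoreleasepool block early with return, one of the exits the documentation lists as normal. The function name FirstIndexWithSuffix and its parameters are hypothetical, chosen only for illustration, and the sketch assumes ARC and Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): returns the first index below
// `limit` whose decimal representation ends with `suffix`, or NSNotFound.
static NSUInteger FirstIndexWithSuffix(NSUInteger limit, NSString *suffix)
{
    for (NSUInteger index = 0; index < limit; index++) {
        @autoreleasepool {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)index];
            if ([text hasSuffix:suffix]) {
                // return is a normal exit, so the pool is popped on the way out.
                return index;
            }
        }
    }
    return NSNotFound;
}

int main(int argc, const char * argv[])
{
    @autoreleasepool {
        NSUInteger found = FirstIndexWithSuffix(1000, @"42");
        printf("First match: %lu\n", (unsigned long)found);
    }
    return 0;
}
```

The function returns a scalar rather than an object, so nothing created inside the inner pool needs to outlive it.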
The following should achieve the same thing as the goto version without the goto:
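static const NSUInteger BATCH_SIZE = 1500;
for (NSUInteger count = 0; count < MAX_ALLOCATIONS;)
{
    @autoreleasepool
    {
        for (NSUInteger j = 0; j < BATCH_SIZE && count < MAX_ALLOCATIONS; j++, count++)
        {
            // NSUInteger is 64 bits wide on 64-bit targets, so cast and use %lu.
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(count + 1)];
            [text class];
        }
    }
}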
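If the same batching is needed in several places, the nested-loop pattern above could be wrapped in a small helper to keep call sites readable. The following is only a sketch under assumptions: RunInBatches, its parameters, and the block signature are invented for illustration and are not part of Foundation, and the code assumes ARC. The extra block call per iteration may cost something, which the timings in this thread do not cover.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): calls `body` once for every
// index in [0, total), giving each batch of `batchSize` calls its own pool.
static void RunInBatches(NSUInteger total, NSUInteger batchSize,
                         void (^body)(NSUInteger index))
{
    if (batchSize == 0) {
        batchSize = 1; // Guard against a loop that never advances.
    }
    NSUInteger start = 0;
    while (start < total) {
        // Clamp the final batch so it never runs past `total`.
        NSUInteger end = start + MIN(batchSize, total - start);
        @autoreleasepool {
            for (NSUInteger index = start; index < end; index++) {
                body(index);
            }
        } // Pool is popped here, releasing this batch's temporaries.
        start = end;
    }
}

int main(int argc, const char * argv[])
{
    @autoreleasepool {
        // 10000 and 1500 are arbitrary values for this sketch.
        RunInBatches(10000, 1500, ^(NSUInteger index) {
            NSString *text = [NSString stringWithFormat:@"%lu", (unsigned long)(index + 1)];
            [text class];
        });
    }
    return 0;
}
```

Whether the helper matches the plain nested loops in speed would have to be measured with optimizations enabled.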
Note that ARC enables significant optimizations which are not enabled at -O0. If you're going to measure performance under ARC, you must test with optimizations enabled. Otherwise, you'll be measuring your hand-tuned retain/release placement against ARC's "naive mode".
Run your tests again with optimizations and see what happens.
Update: I was curious, so I ran it myself. These are the runtime results in Release mode (-Os), with 7,000,000 allocations.
arc-perf[43645:f803] outer: 8.1259
arc-perf[43645:f803] outer: 8.2089
arc-perf[43645:f803] outer: 9.1104
arc-perf[43645:f803] inner: 8.4817
arc-perf[43645:f803] inner: 8.3687
arc-perf[43645:f803] inner: 8.5470
arc-perf[43645:f803] withGoto: 7.6133
arc-perf[43645:f803] withGoto: 7.7465
arc-perf[43645:f803] withGoto: 7.7007
arc-perf[43645:f803] non-ARC: 7.3443
arc-perf[43645:f803] non-ARC: 7.3188
arc-perf[43645:f803] non-ARC: 7.3098
And the memory peaks (only run with 100,000 allocations, because Instruments was taking forever):
Outer: 2.55 MB
Inner: 723 KB
withGoto: ~747 KB
Non-ARC: ~748 KB
These results surprise me a little. Well, the memory peak results don't; they're exactly what you'd expect. But the run time difference between inner and withGoto, even with optimizations enabled, is higher than what I would anticipate.
Of course, this is somewhat of a pathological micro-test, which is very unlikely to model the real-world performance of any application. The takeaway here is that ARC may indeed add some amount of overhead, but you should always measure your actual application before making assumptions.
(Also, I tested @ipmcc's answer using nested for loops; it behaved almost exactly like the goto version.)