Core Data memory usage while importing large dataset

I have been stuck for about two weeks on a nasty Core Data problem. I have read lots of blog posts, articles and SO questions/answers, but I still can't solve it.

I ran lots of tests and was able to reduce the larger problem to a smaller one. This is going to be a long explanation, so bear with me!

Problem - datamodel

I have the following data model:

Object A has a one-to-many relationship with object B, which in turn has a one-to-many relationship with object C. Following Core Data recommendations, I created inverse relationships, so each instance of B points to its parent A and each instance of C points to its parent B.

A <->> B <->> C

Problem - MOC setup

To keep the UI responsive, I created a three-level managedObjectContext structure.

  1. Parent MOC - Runs on its own private queue using NSPrivateQueueConcurrencyType and is tied to the persistentStoreCoordinator
  2. MainQueue MOC - Runs on the main thread using NSMainQueueConcurrencyType and has MOC 1 as its parent
  3. For each parsing operation I create a third MOC, which also has its own private queue and has the MainQueue MOC (2) as its parent
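
A minimal sketch of that three-level stack, for readers who want to see it in code. The function name `HWMakeImportContext` and the out-parameters are hypothetical (they are not in the original project), and `coordinator` is assumed to be an already configured NSPersistentStoreCoordinator:

```objective-c
#import <CoreData/CoreData.h>

// Hypothetical helper, not taken from the original project.
// Builds: writer (MOC 1) <- main (MOC 2) <- import (MOC 3).
static NSManagedObjectContext *HWMakeImportContext(NSPersistentStoreCoordinator *coordinator,
                                                   NSManagedObjectContext **outMain,
                                                   NSManagedObjectContext **outWriter)
{
    // MOC 1: private queue, talks to the persistent store coordinator.
    NSManagedObjectContext *writerContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    writerContext.persistentStoreCoordinator = coordinator;

    // MOC 2: main queue, used by the UI, child of MOC 1.
    NSManagedObjectContext *mainContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
    mainContext.parentContext = writerContext;

    // MOC 3: private queue, one per parsing operation, child of MOC 2.
    NSManagedObjectContext *importContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    importContext.parentContext = mainContext;

    if (outMain) { *outMain = mainContext; }
    if (outWriter) { *outWriter = writerContext; }
    return importContext;
}
```

In this layout a save on MOC 3 only pushes changes up to MOC 2; nothing reaches the store until MOC 1 itself saves.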

My main data controller observes the NSManagedObjectContextDidSaveNotification of MOC 2, so every time MOC 2 saves, a performBlock: is triggered on MOC 1, which performs a save operation (asynchronously, because of performBlock:).
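
The observer described here could look roughly like the sketch below. `HWObserveMainContextSaves` is a hypothetical name, and the sketch assumes the main context's `parentContext` is the private-queue context attached to the store:

```objective-c
#import <CoreData/CoreData.h>

// Hypothetical helper: whenever mainContext (MOC 2) saves, save its parent (MOC 1)
// asynchronously on the parent's own queue. Keep the returned token so the
// observer can be removed later with -[NSNotificationCenter removeObserver:].
static id<NSObject> HWObserveMainContextSaves(NSManagedObjectContext *mainContext)
{
    NSManagedObjectContext *writerContext = mainContext.parentContext;

    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSManagedObjectContextDidSaveNotification
                    object:mainContext
                     queue:nil
                usingBlock:^(NSNotification *note) {
                    // performBlock: returns immediately; the save runs later
                    // on the writer context's private queue.
                    [writerContext performBlock:^{
                        NSError *error = nil;
                        if (![writerContext save:&error]) {
                            NSLog(@"Error while saving writer context: %@", error);
                        }
                    }];
                }];
}
```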

Problem - Parsing

To parse a large JSON file into my Core Data structure I wrote a recursive parser. This parser starts by creating a new MOC (3). It then takes the data for object A and parses its properties. Then the parser reads the JSON relations for B and creates the corresponding objects, which are filled with data. These new objects are added to A by calling addBObject: on A. Because the parser is recursive, parsing B means parsing C, and here too new objects are created and attached to B. All of this happens in a performBlock: on MOC 3.

  • Parse (creates 'A'-objects and starts parsing B)
    • Parsing A (creates 'B'-objects, attaches them to A and starts parsing C)
      • Parsing B (creates 'C'-objects, attaches them to B)
        • Parsing C (just stores data in a C-object)

After each parsing operation I save MOC 3 and dispatch a save of the main MOC (2) to the main thread. Because of the NSManagedObjectContextDidSaveNotification, MOC 1 then saves automatically and asynchronously.

        if (parsed){
            NSError *error = nil;
            if (![managedObjectContext save:&error])
                NSLog(@"Error while saving parsed data: %@", error);
        }else{
            // something went wrong, discard changes
            [managedObjectContext reset];
        }

        dispatch_async(dispatch_get_main_queue(), ^{                
            // save mainQueueManagedObjectContext
            [[HWOverallDataController sharedOverallDataController] saveMainThreadManagedObjectContext];
        });

To reduce my memory footprint, and because I do not need the parsed data for now, I call:

[a.managedObjectContext refreshObject:a mergeChanges:NO];

for each A I just parsed.

Because I need to parse about 10 A's, each of which has about 10 B's, each of which has about 10 C's, a lot of managed objects are generated.

Problem - Instruments

Everything works fine. The only thing is: when I turn on the Allocations tool I see unreleased A's, B's and C's. I don't get any useful information from their retain counts or anything else. And because my actual problem involves a more complex data model, the living objects become a serious memory problem. Can someone figure out what I'm doing wrong? Calling refreshObject: on the other managedObjectContexts with the correct managed object does not work either. Only a hard reset seems to work, but then I lose my pointers to living objects used by the UI.

Other solutions I tried

  • I tried creating unidirectional relations instead of bidirectional ones. This creates a lot of other problems, which cause Core Data inconsistencies and weird behavior (such as dangling objects and Core Data generating 1-n relations instead of n-n relations, because the inverse relation is not known).

  • I tried refreshing each changed or inserted object whenever I receive an NSManagedObjectContextDidSaveNotification for any context.

Both of these 'solutions' (which don't work, by the way) also seem a bit hacky. This should not be the way to go. Surely there is a way to get this working without raising the memory footprint while keeping the UI smooth?

- CodeDemo

http://cl.ly/133p073h2I0j

- Further Investigation

After refreshing every object ever used in the main context (which is tedious work) after a main save, the objects shrink to 48 bytes each. This indicates that the objects are all faulted, but that a pointer to each of them is still held in memory. With about 40,000 faulted objects, that still leaves about 1.92 MB (40,000 × 48 bytes) in memory, which is never released until the persistentManagedObjectContext is reset. And that is something we don't want to do, because we would lose every reference to any managed object.

Robin van Dijke asked Oct 29 '12 20:10


2 Answers

Robin,

I have a similar problem which I solved differently than you have. In your case, you have a third, IMO, redundant MOC, the parent MOC. In my case, I let the two MOCs communicate, in an old school fashion, through the persistent store coordinator via the DidSave notifications. The new block oriented APIs make this much simpler and robust. This lets me reset the child MOCs. While you gain a performance advantage from your third MOC, it isn't that great of an advantage over the SQLite row cache which I exploit. Your path consumes more memory. Finally, I can, by tracking the DidSave notifications, trim items as they are created.

BTW, you are also probably suffering from a massive increase in the size of your MALLOC_TINY and MALLOC_SMALL VM regions. My trailing trimming algorithm lets the allocators reuse space sooner and, hence, retards the growth of these problematic regions. These regions are, in my experience, due to their large resident memory footprint a major cause for my app, Retweever, being killed. I suspect your app suffers the same fate.

When the memory warnings come, I call the below snippet:

[self.backgroundMOC performBlock: ^{ [self.backgroundMOC reset]; }];

[self.moc save];

[self.moc.registeredObjects trimObjects];

-[NSArray(DDGArray) trimObjects] just goes through an array and refreshes each object, thus trimming it.

In summary, Core Data appears to implement a copy-on-write algorithm for items that appear in many MOCs. Hence, you have things retained in unexpected ways. I focus upon breaking these connections after import to minimize my memory footprint. My system, due to the SQLite row cache, appears to perform acceptably well.
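
The DDGArray category itself is not shown in this answer. The following is only a guess at what such a trimming helper might look like; the category name `HWTrimming` and the method `hw_trimObjects` are hypothetical:

```objective-c
#import <CoreData/CoreData.h>

// Hypothetical stand-in for a "trimObjects" helper; not the answerer's code.
@interface NSArray (HWTrimming)
- (void)hw_trimObjects;
@end

@implementation NSArray (HWTrimming)

- (void)hw_trimObjects
{
    for (id object in self) {
        // Skip anything that is not a managed object.
        if (![object isKindOfClass:[NSManagedObject class]]) {
            continue;
        }
        NSManagedObject *managedObject = object;
        // mergeChanges:NO turns the object back into a fault and
        // discards any unsaved changes, so save first if they matter.
        [managedObject.managedObjectContext refreshObject:managedObject
                                             mergeChanges:NO];
    }
}

@end
```

Since registeredObjects is an NSSet, a call with this sketch would be `[[moc.registeredObjects allObjects] hw_trimObjects]`, made on the queue that owns `moc`.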

Andrew

adonoho answered Oct 19 '22 04:10


For every NSManagedObjectContext that you keep around for a specific purpose, you are going to accumulate instances of NSManagedObject.

An NSManagedObjectContext is just a piece of scratch note paper that you can instantiate at will, save if you wish to keep changes in the NSPersistentStore, and then discard afterward.

For the parsing operations (layer 3), try creating a MOC for the operation, do your parsing, save the MOC and then discard it afterwards.

It feels like you have at least one layer of MOCs too many being held by strong references.

Basically, ask this question for each of the MOCs: "Why am I keeping this object and its associated children alive?"
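
As a rough sketch of that create, parse, save, discard cycle: the function name `HWImportWithThrowawayContext` and the `parseBlock` parameter below are hypothetical, and `parentContext` is assumed to be whichever MOC the import should save into:

```objective-c
#import <CoreData/CoreData.h>

// Hypothetical helper: run one parsing operation in a short-lived child context.
static void HWImportWithThrowawayContext(NSManagedObjectContext *parentContext,
                                         void (^parseBlock)(NSManagedObjectContext *context))
{
    if (parseBlock == nil) {
        return;
    }

    // A fresh private-queue context that exists only for this operation.
    NSManagedObjectContext *importContext =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    importContext.parentContext = parentContext;

    [importContext performBlock:^{
        // The caller creates and fills its managed objects here.
        parseBlock(importContext);

        NSError *error = nil;
        if ([importContext hasChanges] && ![importContext save:&error]) {
            NSLog(@"Error while saving parsed data: %@", error);
        }
        // No strong reference to importContext is kept outside this block,
        // so the context can be deallocated once the block has run.
    }];
}
```

The point of the design is that nothing outside the block owns the context, so the objects it registered are not kept alive by it after the import finishes. The parent context still has to be saved separately to push the data further down.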

Warren Burton answered Oct 19 '22 03:10