Core Data Multithreading Import (Duplicate Objects)

I have an NSOperationQueue that imports objects I get from a web API into Core Data. Each operation has its own private child managedObjectContext of my app's main managedObjectContext. Each operation takes the object to be imported and checks whether it already exists; if it does, the operation updates the existing object, and if it doesn't, the operation creates a new one. The changes on the private child contexts are then propagated up to the main managed object context.

This setup has worked very well for me, but there is a duplicates issue.

When the same object is being imported by two different concurrent operations, I get duplicate objects with exactly the same data. (Both operations check whether the object exists, and to each of them it appears not to exist yet.) The reason I'll have two copies of the same object importing at around the same time is that I'll often be processing a "new" API call as well as a "get" API call. Because of the concurrent, asynchronous nature of my setup, it's hard to guarantee that duplicate objects will never be imported at the same time.

So my question is: what is the best way to solve this particular issue? I thought about limiting the import queue to a maximum of one concurrent operation (I don't like this because of performance). Similarly, I've considered requiring a save after every import operation and trying to handle merging of contexts. I've also considered grooming the data afterwards to occasionally clean up duplicates. And finally, I've considered just handling the duplicates in all fetch requests. But none of these solutions seem great to me, and perhaps there is an easy solution I've overlooked.

asked Aug 14 '13 at 21:08 by hatunike


3 Answers

So the problem is:

  • contexts are a scratchpad — unless and until you save, changes you make in them are not pushed to the persistent store;
  • you want one context to be aware of changes made on another that hasn't yet been pushed.

To me it doesn't sound like merging between contexts is going to work: contexts are not thread safe, so for a merge to occur nothing else can be ongoing on the thread/queue of the other context. You're therefore never going to be able to eliminate the risk that a new object is inserted while another context is partway through its insertion process.

Additional observations:

  • SQLite is not thread safe in any practical sense;
  • hence all trips to the persistent store are serialised regardless of how you issue them.

Bearing in mind the problem and the SQLite limitations, in my app we've adopted a framework whereby the web calls are naturally concurrent (as per NSURLConnection), the subsequent parsing of the results (JSON parsing plus some fishing into the result) also occurs concurrently, and then the find-or-create step is channeled into a serial queue.

Very little processing time is lost by the serialisation because the SQLite trips would be serialised anyway, and they're the overwhelming majority of the serialised stuff.
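A minimal sketch of that pipeline, assuming records arrive as JSON objects with an "id" key. The in-memory `store` dictionary is a hypothetical stand-in for the Core Data fetch-then-insert, and `ImportPayloads`/`FindOrCreate` are illustrative names. Parsing runs on a concurrent NSOperationQueue, while every lookup-then-insert goes through one serial dispatch queue, so two copies of the same record can never both see "not found":

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the persistent store: remote ID -> record.
// It is only ever touched from findOrCreateQueue, so it needs no lock.
static NSMutableDictionary *store;
static dispatch_queue_t findOrCreateQueue;

// The find-or-create step. Must only run on the serial findOrCreateQueue.
static void FindOrCreate(NSDictionary *parsed) {
    id<NSCopying> remoteID = parsed[@"id"];
    if (!remoteID) return;
    NSMutableDictionary *existing = store[remoteID];
    if (existing) {
        [existing addEntriesFromDictionary:parsed];   // found: update in place
    } else {
        store[remoteID] = [parsed mutableCopy];       // not found: create
    }
}

// Parses payloads concurrently, funnels find-or-create onto the serial queue,
// and returns the number of distinct records in the store afterwards.
static NSUInteger ImportPayloads(NSArray<NSData *> *payloads) {
    store = [NSMutableDictionary dictionary];
    findOrCreateQueue = dispatch_queue_create("import.find-or-create", DISPATCH_QUEUE_SERIAL);
    NSOperationQueue *parseQueue = [[NSOperationQueue alloc] init]; // concurrent by default

    for (NSData *payload in payloads) {
        [parseQueue addOperationWithBlock:^{
            id parsed = [NSJSONSerialization JSONObjectWithData:payload options:0 error:NULL];
            if (![parsed isKindOfClass:[NSDictionary class]]) return;
            dispatch_async(findOrCreateQueue, ^{ FindOrCreate(parsed); });
        }];
    }
    [parseQueue waitUntilAllOperationsAreFinished];

    // A sync block on a serial queue runs only after everything queued before it.
    __block NSUInteger count = 0;
    dispatch_sync(findOrCreateQueue, ^{ count = store.count; });
    return count;
}
```

Importing `a, b, a, b, a`, where `a` and `b` carry IDs "42" and "7", leaves exactly two records. In a real app the serial stage could be a single private-queue NSManagedObjectContext, since blocks submitted with its performBlock: already run one at a time.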

answered Nov 15 '22 at 20:11 by Tommy


Start by creating dependencies between your operations: an operation won't start until every operation it depends on has finished.

Check out http://developer.apple.com/library/mac/documentation/Cocoa/Reference/NSOperation_class/Reference/Reference.html#//apple_ref/occ/instm/NSOperation/addDependency:
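A minimal sketch of what a dependency buys you; the operation names are hypothetical. `getImport` is handed to the queue before `newImport`, yet it still runs only after `newImport` finishes, because `addDependency:` delays the start of an operation until its dependencies are finished. This only helps when you know in advance which two operations touch the same object.

```objc
#import <Foundation/Foundation.h>

// Runs a hypothetical "new" import and "get" import for the same object and
// returns the order in which they actually executed.
static NSArray<NSString *> *RunDependentImports(void) {
    NSMutableArray<NSString *> *log = [NSMutableArray array];
    NSLock *logLock = [[NSLock alloc] init];          // operations may run on different threads
    NSOperationQueue *queue = [[NSOperationQueue alloc] init]; // concurrent queue

    NSBlockOperation *newImport = [NSBlockOperation blockOperationWithBlock:^{
        [NSThread sleepForTimeInterval:0.05];         // simulate slow import work
        [logLock lock]; [log addObject:@"new"]; [logLock unlock];
    }];
    NSBlockOperation *getImport = [NSBlockOperation blockOperationWithBlock:^{
        [logLock lock]; [log addObject:@"get"]; [logLock unlock];
    }];
    [getImport addDependency:newImport];              // getImport waits for newImport to finish

    // The dependent operation is added first: ordering comes from the dependency.
    [queue addOperations:@[getImport, newImport] waitUntilFinished:YES];
    return [log copy];
}
```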

Each operation should call save when it finishes. Next, I would try the Find-Or-Create methodology suggested here:

https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/CoreData/Articles/cdImporting.html

It'll solve your duplicates problem, and will probably let you do fewer fetches (which are expensive and slow, and therefore drain the battery quickly).
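The efficient variant in Apple's importing guide does find-or-create with one fetch per batch instead of one per object: sort the incoming IDs, fetch the existing objects whose ID is IN that list (sorted the same way), then walk both lists together. A minimal sketch, assuming a hypothetical `Item` entity with a string `remoteID` attribute, built in code against an in-memory store so it runs without a model file. The walk also assumes the fetch sorts exactly like `compare:`, which holds for the in-memory store here.

```objc
#import <Foundation/Foundation.h>
#import <CoreData/CoreData.h>

// Builds a context over an in-memory store with one hypothetical entity,
// "Item", that has a single required string attribute, "remoteID".
static NSManagedObjectContext *MakeInMemoryContext(void) {
    NSAttributeDescription *remoteID = [[NSAttributeDescription alloc] init];
    remoteID.name = @"remoteID";
    remoteID.attributeType = NSStringAttributeType;
    remoteID.optional = NO;

    NSEntityDescription *item = [[NSEntityDescription alloc] init];
    item.name = @"Item";
    item.properties = @[remoteID];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[item];

    NSPersistentStoreCoordinator *psc =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    [psc addPersistentStoreWithType:NSInMemoryStoreType configuration:nil URL:nil options:nil error:NULL];

    NSManagedObjectContext *moc =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    moc.persistentStoreCoordinator = psc;
    return moc;
}

// Find-or-create with a single fetch. Must be called on moc's queue
// (e.g. inside performBlockAndWait:). Returns how many objects were created.
static NSUInteger FindOrCreateItems(NSManagedObjectContext *moc, NSArray<NSString *> *incomingIDs) {
    // 1. De-duplicate and sort the incoming IDs.
    NSArray<NSString *> *sortedIDs =
        [[[NSSet setWithArray:incomingIDs] allObjects] sortedArrayUsingSelector:@selector(compare:)];

    // 2. Fetch only the matching existing objects, sorted the same way.
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Item"];
    request.predicate = [NSPredicate predicateWithFormat:@"remoteID IN %@", sortedIDs];
    request.sortDescriptors = @[[NSSortDescriptor sortDescriptorWithKey:@"remoteID" ascending:YES]];
    NSArray<NSManagedObject *> *existing = [moc executeFetchRequest:request error:NULL];

    // 3. Walk both sorted lists in step: a match means update, a gap means create.
    NSUInteger created = 0, e = 0;
    for (NSString *remoteID in sortedIDs) {
        NSManagedObject *object = nil;
        if (e < existing.count && [[existing[e] valueForKey:@"remoteID"] isEqualToString:remoteID]) {
            object = existing[e++];
        } else {
            object = [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                                   inManagedObjectContext:moc];
            [object setValue:remoteID forKey:@"remoteID"];
            created++;
        }
        // Copy the remaining imported fields onto `object` here.
    }
    return created;
}
```

Calling it with `b, a, a` and then `a, c` on the same context creates two objects and then one, because the second fetch also sees the unsaved "a" in that context. On its own this does not stop two separate contexts from both creating the same ID; it still has to run serially (or against already-saved data) to remove the race.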

You could also create a global child context to handle all of your imports, then merge the whole huge thing at the end, but it really comes down to how big the data set is and your memory considerations.

answered Nov 15 '22 at 19:11 by Jeremy Massel


I've been struggling with the same issue for a while now. The discussion on this question so far has given me a few ideas, which I will share now.

Please note that this is essentially untested since in my case I only see this duplicate issue very rarely during testing and there's no obvious way for me to reproduce it easily.

I have the same Core Data stack setup: a master MOC on a private queue, which has a child on the main queue that is used as the app's main context. Finally, bulk import operations (find-or-create) are passed off to a third MOC using a background queue. Once the operation is complete, saves are propagated up to the PSC.

I've moved my whole Core Data stack from the AppDelegate into a separate class (AppModel) that gives the app access to the aggregate root object of the domain (the Player) and also provides a helper method for performing background operations on the model (performBlock:onSuccess:onError:).

Luckily for me, all the major CoreData operations are funnelled through this method so if I can ensure that these operations are run serially then the duplicate problem should be solved.

- (void) performBlock: (void(^)(Player *player, NSManagedObjectContext *managedObjectContext)) operation onSuccess: (void(^)(void)) successCallback onError:(void(^)(id error)) errorCallback
{
    //Add this operation to the NSOperationQueue to ensure that
    //duplicate records are not created in a multi-threaded environment
    [self.operationQueue addOperationWithBlock:^{

        NSManagedObjectContext *managedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        [managedObjectContext setUndoManager:nil];
        [managedObjectContext setParentContext:self.mainManagedObjectContext];

        [managedObjectContext performBlockAndWait:^{

            //Retrieve a copy of the Player object attached to the new context
            id player = [managedObjectContext objectWithID:[self.player objectID]];
            //Execute the block operation
            operation(player, managedObjectContext);

            NSError *error = nil;
            if (![managedObjectContext save:&error])
            {
                //Call the error handler on the main queue
                dispatch_async(dispatch_get_main_queue(), ^{
                    NSLog(@"%@", error);
                    if (errorCallback) errorCallback(error);
                });
                return;
            }

            //Save the parent MOC (mainManagedObjectContext) - WILL BLOCK MAIN THREAD BRIEFLY
            //The result is captured so that a failed parent save skips the success handler
            __block BOOL parentSaved = NO;
            __block NSError *parentError = nil;
            [managedObjectContext.parentContext performBlockAndWait:^{
                NSError *saveError = nil;
                parentSaved = [managedObjectContext.parentContext save:&saveError];
                parentError = saveError;
            }];
            if (!parentSaved)
            {
                //Call the error handler on the main queue
                dispatch_async(dispatch_get_main_queue(), ^{
                    NSLog(@"%@", parentError);
                    if (errorCallback) errorCallback(parentError);
                });
                return;
            }

            //Attempt to clear any retain cycles created during operation
            [managedObjectContext reset];

            //Call the success handler
            dispatch_async(dispatch_get_main_queue(), ^{
                if (successCallback) successCallback();
            });
        }];
    }];
}

What I've added here that I hope is going to resolve the issue for me is wrapping the whole thing in addOperationWithBlock. My operation queue is simply configured as follows:

single.operationQueue = [[NSOperationQueue alloc] init];
[single.operationQueue setMaxConcurrentOperationCount:1];

In my API class, I might perform an import operation as follows:

- (void) importUpdates: (id) methodResult onSuccess: (void (^)(void)) successCallback onError: (void (^)(id error)) errorCallback
{
    [_model performBlock:^(Player *player, NSManagedObjectContext *managedObjectContext) {
        //Perform bulk import for data in methodResult using the provided managedObjectContext
    } onSuccess:^{
        //Call the success handler
        dispatch_async(dispatch_get_main_queue(), ^{
            if (successCallback) successCallback();
        });
    } onError:errorCallback];
}

Now with the NSOperationQueue in place it should no longer be possible for more than one batch operation to take place at the same time.
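A quick way to sanity-check that claim, as a hypothetical harness (`MaxObservedConcurrency` is an illustrative name and the sleep stands in for a batch import): with `maxConcurrentOperationCount` set to 1, no two blocks are ever observed running at the same time.

```objc
#import <Foundation/Foundation.h>

// Submits `jobs` blocks to an NSOperationQueue limited to one concurrent
// operation and returns the largest number seen running at the same moment.
static NSInteger MaxObservedConcurrency(NSInteger jobs) {
    NSOperationQueue *queue = [[NSOperationQueue alloc] init];
    [queue setMaxConcurrentOperationCount:1];

    NSLock *lock = [[NSLock alloc] init];
    __block NSInteger running = 0, maxRunning = 0;
    for (NSInteger i = 0; i < jobs; i++) {
        [queue addOperationWithBlock:^{
            [lock lock];
            running++;
            if (running > maxRunning) maxRunning = running;
            [lock unlock];

            [NSThread sleepForTimeInterval:0.01]; // stand-in for a batch import

            [lock lock];
            running--;
            [lock unlock];
        }];
    }
    [queue waitUntilAllOperationsAreFinished];
    return maxRunning;
}
```

Only work submitted to this particular queue is serialised; a Core Data write that bypasses performBlock:onSuccess:onError: can still race with it.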

answered Nov 15 '22 at 19:11 by djskinner