
Core data bulk insert suddenly slows to 1/10th the speed

I am bulk inserting into Core Data. I have a Person object, and it has a relationship called "otherPeople" that is an NSSet of people. When bulk inserting data from a download, things are fine until about 10,000 people have been read in, at which point the insert speed slows to a crawl. I am saving and resetting my NSManagedObjectContext every 500 inserts.

If I comment out the part that inserts the "otherPerson" relationships, the bulk insert is speedy through the entire download. parseJSON is called in batches of 500 JSONKit dictionaries.

Any ideas what might be causing this? Possible solutions?

Code:

- (NSArray*) getPeople:(NSArray*)ids
{
    NSFetchRequest* request = [[[NSFetchRequest alloc] init] autorelease];
    NSEntityDescription* entityDescription = [NSEntityDescription entityForName:@"Person" inManagedObjectContext:context];
    [request setEntity:entityDescription];
    [request setFetchBatchSize:ids.count];

    //Filter by array of ids
    NSPredicate* predicate = [NSPredicate predicateWithFormat:@"externalId IN %@", ids];
    [request setPredicate:predicate];

    NSError* _error = nil;
    NSArray* people = [context executeFetchRequest:request error:&_error];

    return people;
}

- (void) parseJSON:(NSArray*)people
{
    NSAutoreleasePool* pool = [[NSAutoreleasePool alloc] init];
    NSMutableArray* idsToFetch = [NSMutableArray arrayWithCapacity:CHUNK_SIZE * 3];
    NSMutableDictionary* existingPeople = [NSMutableDictionary dictionaryWithCapacity:CHUNK_SIZE * 3];

    // populate the existing people dictionary first, that way we know who is already in the context without having to do a fetch for each person in the array (externalId IS indexed)
    for (NSDictionary* personDictionary in people)
    {
        // uses JSON kit to parse out all the external ids...
        [PersonJSON addExternalIdsToArray:idsToFetch fromDictionary:personDictionary];
    }

    // see above code for getPeople implementation...
    NSArray* existingPeopleArray = [self getPeople:idsToFetch];
    for (Person* p in existingPeopleArray)
    {
        [existingPeople setObject:p forKey:p.externalId];
    }

    for (NSDictionary* personDictionary in people)
    {
        NSString* externalId = [personDictionary objectForKey:@"PersonId"];
        Person* person = [existingPeople objectForKey:externalId];

        if (person == nil)
        {
            // the person was not in the context, make a new person in the context
            person = [[self newPerson] autorelease];
            person.externalId = externalId;
            [existingPeople setObject:person forKey:person.externalId];
        }

        // use JSON kit to populate the core data object...
        [PersonJSON populatePerson:person withDictionary:personDictionary inContext:[self context]];

        // these are just objects that contain an externalId, showing that the link hasn't been setup yet
        for (UnresolvedOtherPerson* other in person.unresolvedOtherPeople)
        {
            Person* relatedPerson = [existingPeople objectForKey:other.externalId];

            if (relatedPerson == nil)
            {
                relatedPerson = [[self newPerson] autorelease];
                relatedPerson.externalId = other.externalId;
                [existingPeople setObject:relatedPerson forKey:relatedPerson.externalId];
            }

            // add link - if I comment out this line, everything runs very fast
            // if I don't comment out, things slow down gradually and then exponentially
            [person addOtherPersonsObject:relatedPerson];
        }

        self.downloaded++;
    }

    [pool drain];
}

jjxtra asked Oct 25 '22 02:10


1 Answer

Adding an object to a relationship causes the relationship on both sides to fire. So if you have A <<->> B and you add a freshly created A object to a B object that already has relationships with 100,000 A objects, Core Data will fetch those 100,000 objects from the store to fulfill the relationship before adding the new one.

Because you are clearing the managed object context every so often, all 100,000 objects that Core Data loaded to fulfill the relationship have to be reloaded all over again, which makes the process extremely slow.

One way to work around this problem is a two-step import process. First, load all the objects into the db without establishing any relationships, but keep track of which relationships need to be added. Once that fast import is done, go back to the db and add the relationships, clearing the context in a way that avoids Core Data having to reload the relationships too often. As a concrete example, if you need to import 1 million As that need to be associated with 100 Bs, first import all the As. Then, for each of the hundred Bs, load the relationship once, add all the As to it, clear the context, and move on to the next B. The key is to prevent the context from resetting the records that it just painfully loaded.
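
The second pass of that two-step import might look roughly like the sketch below. It is only an illustration under assumptions, not the asker's code: the `Person` entity and `externalId` attribute come from the question, the relationship key `otherPersons` is inferred from the `addOtherPersonsObject:` accessor, and `FetchPeople`, `LinkPeople`, and the `pendingLinks` dictionary (owner externalId mapped to an array of related externalIds, recorded during the first pass) are hypothetical names.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: fetches the Person objects whose externalId is in `ids`.
static NSArray *FetchPeople(NSManagedObjectContext *context, NSArray *ids)
{
    NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Person"];
    request.predicate = [NSPredicate predicateWithFormat:@"externalId IN %@", ids];

    NSError *error = nil;
    NSArray *result = [context executeFetchRequest:request error:&error];
    if (result == nil) {
        NSLog(@"Fetch failed: %@", error);
        return @[];
    }
    return result;
}

// Pass 2: run after every Person has been inserted and saved WITHOUT links.
// pendingLinks (an assumed bookkeeping structure) maps an owner's externalId
// to an NSArray of the externalIds it should be linked to.
static void LinkPeople(NSManagedObjectContext *context, NSDictionary *pendingLinks)
{
    for (NSString *ownerId in pendingLinks) {
        @autoreleasepool {
            NSManagedObject *owner = [FetchPeople(context, @[ownerId]) firstObject];
            if (owner == nil) {
                continue; // owner was never imported, nothing to link
            }
            NSArray *related = FetchPeople(context, pendingLinks[ownerId]);

            // Touch this owner's relationship once and add all related objects together.
            [[owner mutableSetValueForKey:@"otherPersons"] addObjectsFromArray:related];

            NSError *error = nil;
            if (![context save:&error]) {
                NSLog(@"Save failed for %@: %@", ownerId, error);
            }
            // Clear the context only after this owner's links are complete.
            [context reset];
        }
    }
}
```

The point of this arrangement is that the context is cleared between owners rather than in the middle of building one owner's relationship, so a large relationship is loaded at most once per owner.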

Another workaround: instead of resetting the whole context at regular intervals, refresh only the objects you want to get rid of.
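
A minimal sketch of that idea, assuming you keep your own array of the objects you are finished with (`finishedObjects` and the function name `SaveAndRelease` are hypothetical): save first, then turn just those objects back into faults with `refreshObject:mergeChanges:`.

```objc
#import <CoreData/CoreData.h>

// Saves pending changes, then re-faults only the listed objects
// instead of calling -reset on the whole context.
static BOOL SaveAndRelease(NSManagedObjectContext *context, NSArray *finishedObjects)
{
    NSError *error = nil;
    if (![context save:&error]) {
        NSLog(@"Save failed: %@", error);
        return NO;
    }

    for (NSManagedObject *object in finishedObjects) {
        // mergeChanges:NO discards the object's cached property values and
        // turns it back into a fault. Unsaved changes would be lost, which is
        // why the save happens first.
        [context refreshObject:object mergeChanges:NO];
    }
    return YES;
}
```

Objects that are not in the list, such as one whose large relationship has already been loaded, stay in memory and do not need to be fetched again for the next batch.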

Oh, one more thing: you could also consider a one-way relationship in Core Data and use an explicit fetch to get the other side of the relationship.

EDIT:

I think I found a workaround. You need to call the primitive accessors, so something like:

        [self.primitiveTags addObject:tag];

Preliminary tests seem to show that this does not force the other side of the relationship to fire.

Tony answered Oct 30 '22 15:10