
Is this a bug I should submit to Apple, or is this expected behavior?

When using CoreData, the following multi-column index predicate is very slow - it takes almost 2 seconds for 26,000 records.

Please note both columns are indexed, and I am purposefully doing the query with >= and <, instead of beginswith, to make it fast:

NSPredicate *predicate = [NSPredicate predicateWithFormat:
  @"airportNameUppercase >= %@ AND airportNameUppercase < %@ \
        OR cityUppercase >= %@ AND cityUppercase < %@",
    upperText, upperTextIncremented,
    upperText, upperTextIncremented];

However, if I run two separate fetchRequests, one for each column, and then I merge the results, then each fetchRequest takes just 1-2 hundredths of a second, and merging the lists (which are sorted) takes about 1/10th of a second.

Is this a bug in how CoreData handles multiple indices, or is this expected behavior? The following is my full, optimized code, which works very fast:

NSFetchRequest *fetchRequest = [[[NSFetchRequest alloc] init]autorelease];
[fetchRequest setFetchBatchSize:15]; 

// looking up a list of Airports
NSEntityDescription *entity = [NSEntityDescription entityForName:@"Airport" 
                                          inManagedObjectContext:context];
[fetchRequest setEntity:entity];    

// sort by uppercase name
NSSortDescriptor *nameSortDescriptor = [[[NSSortDescriptor alloc] 
           initWithKey:@"airportNameUppercase" 
             ascending:YES 
              selector:@selector(compare:)] autorelease];
NSArray *sortDescriptors = [[[NSArray alloc] initWithObjects:nameSortDescriptor, nil]autorelease];
[fetchRequest setSortDescriptors:sortDescriptors];

// use >= and < to do a prefix search that ignores locale and unicode,
// because it's very fast (assumes text is not empty)
NSString *upperText = [text uppercaseString];
// index with upperText's own length: uppercasing can change the length
// of a string (for example, "ß" uppercases to "SS")
unichar c = [upperText characterAtIndex:[upperText length]-1];
c++;
NSString *modName = [[upperText substringToIndex:[upperText length]-1]
                         stringByAppendingString:[NSString stringWithCharacters:&c length:1]];

// for the first fetch, we look up names and codes
// we'll merge these results with the next fetch for city name
// because looking up by name and city at the same time is slow
NSPredicate *predicate = [NSPredicate predicateWithFormat:
   @"airportNameUppercase >= %@ AND airportNameUppercase < %@ \
                        OR iata == %@ \
                        OR icao ==  %@",
     upperText, modName,
     upperText,
     upperText];
[fetchRequest setPredicate:predicate];

NSArray *nameArray = [context executeFetchRequest:fetchRequest error:nil];

// now that we looked up all airports with names beginning with the prefix
// look up airports with cities beginning with the prefix, so we can merge the lists
predicate = [NSPredicate predicateWithFormat:
  @"cityUppercase >= %@ AND cityUppercase < %@",
    upperText, modName];
[fetchRequest setPredicate:predicate];
NSArray *cityArray = [context executeFetchRequest:fetchRequest error:nil];

// now we merge the arrays
NSMutableArray *combinedArray = [NSMutableArray arrayWithCapacity:[cityArray count]+[nameArray count]];
int cityIndex = 0;
int nameIndex = 0;
while(   cityIndex < [cityArray count] 
      || nameIndex < [nameArray count]) {

  if (cityIndex >= [cityArray count]) {
    [combinedArray addObject:[nameArray objectAtIndex:nameIndex]];
    nameIndex++;
  } else if (nameIndex >= [nameArray count]) {
    [combinedArray addObject:[cityArray objectAtIndex:cityIndex]];
    cityIndex++;
  } else {
    // compare the names with compare: (the same selector the sort descriptor uses);
    // < and > on NSString pointers compare memory addresses, not string contents
    NSComparisonResult order = [[[cityArray objectAtIndex:cityIndex] airportNameUppercase]
                         compare:[[nameArray objectAtIndex:nameIndex] airportNameUppercase]];
    if (order == NSOrderedSame) {
      // same name in both lists: keep one copy and advance both
      [combinedArray addObject:[cityArray objectAtIndex:cityIndex]];
      cityIndex++;
      nameIndex++;
    } else if (order == NSOrderedAscending) {
      [combinedArray addObject:[cityArray objectAtIndex:cityIndex]];
      cityIndex++;
    } else {
      [combinedArray addObject:[nameArray objectAtIndex:nameIndex]];
      nameIndex++;
    }
  }

}

self.airportList = combinedArray;
Andrew Johnson asked Feb 25 '23 23:02


2 Answers

CoreData has no affordance for the creation or use of multi-column indices. This means that when you execute the query corresponding to your multi-property predicate, CoreData can only use one index to make the selection. As a result, it uses the index for one of the property tests, but then SQLite can't use an index to gather matches for the second property, and therefore has to do it all in memory instead of using its on-disk index structure.

That second phase of the select ends up being slow because it has to gather all the results into memory from the disk, then make the comparison and drop results in-memory. So you end up doing potentially more I/O than if you could use a multi-column index.

This is why, if you will be disqualifying a lot of potential results in each column of your predicate, you'll see much faster results by doing what you're doing and making two separate fetches and merging in-memory than you would if you made one fetch.
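As a rough illustration of the merge step, here is a minimal sketch in plain C (which also compiles as Objective-C). It assumes each fetch has already produced an ascending-sorted list of keys, represented here as C strings instead of managed objects. The function name `merge_sorted_unique` and its parameters are hypothetical and are not part of CoreData.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge two ascending-sorted arrays of C strings into `out`, keeping a single
 * copy of any key that appears in both inputs. `out` must have room for
 * na + nb entries. Returns the number of entries written.
 * (Hypothetical helper for illustration only.) */
size_t merge_sorted_unique(const char *const *a, size_t na,
                           const char *const *b, size_t nb,
                           const char **out)
{
    assert(out != NULL);

    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        int order = strcmp(a[i], b[j]);
        if (order == 0) {
            /* same key in both lists: keep one copy, advance both */
            out[n++] = a[i];
            i++;
            j++;
        } else if (order < 0) {
            out[n++] = a[i++];
        } else {
            out[n++] = b[j++];
        }
    }
    /* at most one of these loops runs: copy whatever is left over */
    while (i < na) out[n++] = a[i++];
    while (j < nb) out[n++] = b[j++];
    return n;
}
```

The merge is a single linear pass over both lists, which is why it stays cheap next to the two indexed fetches.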

To answer your question, this behavior isn't unexpected by Apple; it's just an effect of a design decision to not support multi-column indices in CoreData. But you should file a bug at https://feedbackassistant.apple.com/ requesting support of multi-column indices if you'd like to see that feature in the future.

In the meantime, if you really want to get max database performance on iOS, you could consider using SQLite directly instead of CoreData.

Ryan answered May 14 '23 00:05


When in doubt, you should file a bug.

There isn't currently any API to instruct Core Data to create a compound index. If a compound index were to exist, it would be used without issue.

Non-indexed columns are not processed entirely in memory. They result in a table scan, which isn't the same thing as loading the entire file (well, unless your file only has 1 table). Table scans on strings tend to be very slow.

SQLite itself is limited in the number of indices it will use per query. Basically just 1, give or take some circumstances.

You should use the [n] flag for this query to do a binary search against normalized text. There is a sample project on ADC called 'DerivedProperty'. It will show how to normalize text so you can use binary collations as opposed to the default ICU integration for fancy localized Unicode aware text comparisons.
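To make the idea of a binary comparison against normalized text concrete, here is a small sketch in plain C (which also compiles as Objective-C). It is not taken from the 'DerivedProperty' sample: it assumes ASCII-only text, uses uppercasing as a stand-in for real Unicode normalization, and all three function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Uppercase an ASCII string in place. A stand-in for real normalization,
 * which would also have to handle accents and other Unicode details. */
void normalize_ascii(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Build an exclusive upper bound for a prefix search: the prefix with its
 * last byte incremented, so every string starting with `prefix` sorts below
 * it under byte-wise comparison. Returns 0 on success, or -1 if the prefix
 * is empty, its last byte cannot be incremented, or `upper` is too small. */
int prefix_upper_bound(const char *prefix, char *upper, size_t cap)
{
    assert(prefix != NULL && upper != NULL);
    size_t len = strlen(prefix);
    if (len == 0 || len + 1 > cap)
        return -1;
    if ((unsigned char)prefix[len - 1] == 0xFF)
        return -1;
    memcpy(upper, prefix, len + 1);   /* copies the terminating NUL too */
    upper[len - 1] = (char)((unsigned char)upper[len - 1] + 1);
    return 0;
}

/* Byte-wise range test: lower <= value < upper. */
int in_prefix_range(const char *value, const char *lower, const char *upper)
{
    return strcmp(value, lower) >= 0 && strcmp(value, upper) < 0;
}
```

With the text normalized once at save time, a prefix search reduces to two plain byte comparisons per row, which is the kind of test an index on the normalized column can serve.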

There's a much longer discussion about fast string searching in Core Data at https://devforums.apple.com/message/363871

Ben answered May 14 '23 01:05