How to optimize Core Data query for full text search

Tags:

Can I optimize a Core Data query when searching for matching words in a text? (This question also pertains to the wisdom of custom SQL versus Core Data on an iPhone.)

I'm working on a new (iPhone) app that is a handheld reference tool for a scientific database. The main interface is a standard searchable table view and I want as-you-type response as the user types new words. Words matches must be prefixes of words in the text. The text is composed of 100,000s of words.

In my prototype I coded SQL directly. I created a separate "words" table containing every word in the text fields of the main entity. I indexed words and performed searches along the lines of

SELECT id, * FROM textTable 
  JOIN (SELECT DISTINCT textTableId FROM words 
         WHERE word BETWEEN 'foo' AND 'fooz' ) 
    ON id=textTableId
 LIMIT 50

This runs very fast. Using an IN would probably work just as well, i.e.

SELECT * FROM textTable
 WHERE id IN (SELECT textTableId FROM words 
               WHERE word BETWEEN 'foo' AND 'fooz' ) 
 LIMIT 50

The LIMIT is crucial and allows me to display results quickly. I notify the user that there are too many to display if the limit is reached. This is kludgy.

I've spent the last several days pondering the advantages of moving to Core Data, but I worry about the lack of control in the schema, indexing, and querying for an important query.

Theoretically an NSPredicate of textField MATCHES '.*\bfoo.*' would just work, but I'm sure it will be slow. This sort of text search seems so common that I wonder what is the usual attack? Would you create a words entity as I did above and use a predicate of "word BEGINSWITH 'foo'"? Will that work as fast as my prototype? Will Core Data automatically create the right indexes? I can't find any explicit means of advising the persistent store about indexes.

I see some nice advantages of Core Data in my iPhone app. The faulting and other memory considerations allow for efficient database retrievals for tableview queries without setting arbitrary limits. The object graph management allows me to easily traverse entities without writing lots of SQL. Migration features will be nice in the future. On the other hand, in a limited resource environment (iPhone) I worry that an automatically generated database will be bloated with metadata, unnecessary inverse relationships, inefficient attribute datatypes, etc.

Should I dive in or proceed with caution?

448

asked Nov 21 '09 03:11

dk.

2 Answers

I made a workaround solution. I think it's similar to this post. I added the amalgamation source code to my Core Data project, then created a full-text search class that was not a managed object subclass. In the FTS class I #import "sqlite3.h" (the source file) instead of the sqlite framework. The FTS class saves to a different .sqlite file than the Core Data persistent store.

When I import my data, the Core Data object stores the rowid of the related FTS object as an integer attribute. I have a static dataset, so I don't worry about referential integrity, but the code to maintain integrity should be trivial.

To perform FTS, I MATCH query the FTS class, returning a set of rowids. In my managed object class, I query for the corresponding objects with [NSPredicate predicateWithFormat:@"rowid IN %@", rowids]. I avoid traversing any many-to-many relationships this way.

The performance improvement is dramatic. My dataset is 142287 rows, comprising 194MB (Core Data) and 92MB (FTS with stopwords removed). Depending on the search term frequency, my searches went from several seconds to 0.1 seconds for infrequent terms (<100 hits) and 0.2 seconds for frequent terms (>2000 hits).

I'm sure there are myriad problems with my approach (code bloat, possible namespace collisions, loss of some Core Data features), but it seems to be working.

answered Oct 07 '22 22:10

jluckyiv

To follow up on this question, I've found that querying is dog slow using Core Data. I've scratched my head on this one for many hours.

As in the SQL example in my question, there are two entities: textTable and words where words contains each word, it is indexed, and there is a many-to-many relationship between textTable and words. I populated the database with a mere 4000 words and 360 textTable objects. Suppose the textTable relationship to the words object is called searchWords, then I can use a predicate on the textTable entity that looks like

predicate = [NSPredicate predicateWithFormat:@"ANY searchWords.word BEGINSWITH %@", query];

(I can add conjunctions of this predicate for multiple query terms.)

On the iPhone this query takes multiple seconds. The response for my hand-coded SQL using a larger test set was instant.

But this isn't even the end of it. There are limitations to NSPredicate that make rather simple queries slow and complex. For example, imagine in the above example that you want to filter using a scope button. Suppose the words entity contains all words in all text fields, but the scope would limit it to words from specific fields. Thus, words might have a "source" attribute (e.g. header and message body of email).

Naturally, then, a full text would ignore the source attribute, as in the example above, but a filtered query would limit the search to a particular source value. This seemingly simple change requires a SUBQUERY. For example, this doesn't work:

ANY searchWords.word BEGINSWITH "foo" AND ANY searchWords.source = 3

because the entities that are true for the two expressions can be different. Instead, you have to do something like:

SUBQUERY(searchWords, $x, $x.word BEGINSWITH "foo" AND $x.source = 3).@count > 0

I've found that these subqueries are, perhaps not surprisingly, slower than predicates using "ANY".

At this point I'm very curious how Cocoa programmers efficiently use Core Data for full text search because I'm discouraged by both the speed of predicate evaluation and expressibility of NSPredicates. I've run up against a wall.

answered Oct 08 '22 00:10

dk.

Related questions
                            
                                Reposition legal label ( MKAttributionLabel )
                            
                                MoSync experiences
                            
                                UITableView delegate method called twice
                            
                                How to install previously-archived apps from xcode organizer to my iphone
                            
                                Difference between declaring static variable inside and outside the implementation context in Objective C? [duplicate]
                            
                                Can we define a private IBOutlet?
                            
                                iOS: How to achieve behavior like Android's startActivityForResult
                            
                                Wired connection from iPhone to Raspberry Pi
                            
                                Disappearing UITextField text
                            
                                How to update each custom tableview cell data at the same time?
                            
                                IPhone - UIView addSubview Gap at top
                            
                                Animating a shape with CoreAnimation
                            
                                Disabling data detectors for a specific HTML element in a UIWebView
                            
                                Common LISP on iPhone/iOS
                            
                                How to display 2 lines of text for subtitle of MKAnnotation and change the image for the button on the right?
                            
                                UIStoryboardPopoverSegue opening multiple windows on button touch
                            
                                If I'm happy with OS X running on VMware, do I really have to buy a Mac to be able to develop for App Store? [closed]
                            
                                How to stretch a UIImageView without stretching the center?
                            
                                UIActivityViewController is not working in iOS 11
                            
                                iPhone Device Debugging

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to optimize Core Data query for full text search

Tags:

sql

cocoa-touch

cocoa

iphone

core-data

dk.

People also ask

2 Answers

jluckyiv

dk.

Recent Activity

Donate For Us