Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Approximate searching against a list of strings

So yes, I read about how edit distance can be used between strings to decide how "close" are two string to each other. This algorithm, implemented as a Dynamic problem takes O(mn) time, where m and n are the lengths of the text and pattern respectively. So if I have to match a string against 5000 odd other strings, it's gonna take a LOT of time, which on my application is simply not acceptable. Is there are faster solution that can be implemented? I don't mind trading storage space for time.

I have seen an application called "Swype" on Android, which does something similar. It searches your query against it's own database and suggests results. How does that work so quickly?

Note: Please don't suggest frameworks like Lucene, because I cannot run then on J2ME.

like image 343
Gooner Avatar asked Jun 27 '11 05:06

Gooner


1 Answers

splix's answer is good. As another option (for very large string sets), you may want to consider using an n-gram representation:

http://en.wikipedia.org/wiki/N-gram

These are used for approximate pattern matching in a lot of database packages since they are fast and easy to implement using conventional indexing methodologies.

like image 199
Mikola Avatar answered Oct 03 '22 04:10

Mikola