
Android & fuzzy matching, n-gram, and Levenshtein distance

I am building an Android app which takes a string input and returns a ranked list of books using the Google API.

I am looking for a way to compare the open-ended string that the user enters with the first item in the list, to see whether what they entered is 'likely' to be one specific book. I have plenty of information about each book (title, author, description, etc.), so I can search in any of those fields.

An example is:

'eyre affair fforde', 'fforde eyre affair', 'the eyre affair'
----> 
'Likely' to be 'The Eyre Affair by Jasper Fforde'

What would be the best way to go about this? I have looked at Levenshtein distance, but I don't think it would work with such open-ended input. N-grams or fuzzy matching seem like a better way to go.

Any other ideas?

Carrie Hall asked Feb 26 '23 00:02



1 Answer

I would go with one of these:

SimMetrics, an open source, extensible library of similarity and distance metrics (e.g. Levenshtein distance, L2 distance, cosine similarity, Jaccard similarity).

Commons Lang LevenshteinDistance

Or, to tolerate mishearings or spelling mistakes, use a phonetic algorithm such as Soundex or Metaphone.
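
If pulling in a library is not an option, the same idea can be sketched in plain Java. This is only a minimal sketch under stated assumptions, not how SimMetrics or Commons Lang work internally: the class `BookMatchSketch`, its `score` method, and the `maxEdits` threshold are hypothetical names chosen for illustration. It splits both strings into lowercase words and reports the fraction of query words that have some candidate word within `maxEdits` Levenshtein edits, so word order does not matter.

```java
import java.util.Locale;

public class BookMatchSketch {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Lowercase the text and split it on anything that is not a letter or digit.
    static String[] tokens(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT)
                          .replaceAll("[^\\p{L}\\p{N}]+", " ")
                          .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Fraction of query words that have a candidate word within maxEdits edits.
    // Assumption: maxEdits is a tuning knob picked for this sketch.
    static double score(String query, String candidate, int maxEdits) {
        String[] queryWords = tokens(query);
        String[] candidateWords = tokens(candidate);
        if (queryWords.length == 0) {
            return 0.0;
        }
        int matched = 0;
        for (String q : queryWords) {
            for (String c : candidateWords) {
                if (levenshtein(q, c) <= maxEdits) {
                    matched++;
                    break; // count each query word at most once
                }
            }
        }
        return (double) matched / queryWords.length;
    }

    public static void main(String[] args) {
        // Candidate string built from the title and author fields.
        String book = "The Eyre Affair by Jasper Fforde";
        System.out.println(score("eyre affair fforde", book, 1));
        System.out.println(score("fforde eyre affair", book, 1));
        System.out.println(score("the eyre affair", book, 1));
    }
}
```

A score close to 1.0 would suggest the input is 'likely' that book. Both the cut-off and `maxEdits` would need tuning against real queries, since very short words can match each other too easily with even one allowed edit.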

Chris answered Mar 07 '23 01:03
