
Lightweight fuzzy search library

Tags:

fuzzy-search

Can you suggest a lightweight fuzzy text search library?

I want users to be able to find the correct data even when their search terms contain typos.

I could use a full-text search engine like Lucene, but I think it's overkill.

Edit:
To make the question clearer, here is the main scenario for the library:
I have a large list of strings. I want to be able to search this list (something like MSVS' IntelliSense), but it should also be possible to filter the list by a string that is not present in it but is close enough to some string that is.
Example:

  • Red
  • Green
  • Blue

When I type 'Gren' or 'Geen' in a text box, I want to see 'Green' in the result set.
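
For illustration, the behaviour described above could be sketched with plain edit distance (Levenshtein). This is only a minimal sketch under assumptions, not any particular library's API: the names `edit_distance`, `fuzzy_filter`, and the `max_distance` threshold are hypothetical, and a linear scan like this may be too slow for a very large list.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_filter(query, candidates, max_distance=1):
    """Return candidates within max_distance edits of query,
    closest first (case-insensitive). max_distance is an assumed default."""
    q = query.lower()
    scored = [(edit_distance(q, c.lower()), c) for c in candidates]
    return [c for d, c in sorted(scored) if d <= max_distance]


colors = ["Red", "Green", "Blue"]
print(fuzzy_filter("Gren", colors))
print(fuzzy_filter("Geen", colors))
```

Both 'Gren' and 'Geen' are one edit away from 'Green', so a threshold of 1 is enough for this example; longer strings would probably need a threshold that scales with length.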

The main language of the indexed data will be English.

I think Lucene is too heavy for that task.

Update:

I found one product matching my requirements. It's ShuffleText.
Do you know any alternatives?

aku asked Sep 03 '08 15:09




2 Answers

Lucene is very scalable, which means it's good for small applications too. You can create an in-memory index very quickly if that's all you need.

For fuzzy searching, you really need to decide which algorithm you'd like to use. For information retrieval, I have used an n-gram technique with Lucene successfully. But that's a special indexing technique, not a "library" in itself.
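
To give a rough idea of what an n-gram technique looks like, here is a small standalone sketch in plain Python. It does not use Lucene and is not the setup described in this answer; the function names, the trigram size, the space padding, and the `min_score` cutoff are all assumptions chosen for illustration. Each string is split into overlapping character trigrams, an inverted index maps trigrams to strings, and candidates are ranked by Jaccard similarity of their trigram sets.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Set of character n-grams of text, lowercased and padded with
    spaces so that word boundaries produce their own n-grams."""
    pad = " " * (n - 1)
    padded = pad + text.lower() + pad
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(strings, n=3):
    """Inverted index: n-gram -> set of strings containing it."""
    index = defaultdict(set)
    for s in strings:
        for gram in ngrams(s, n):
            index[gram].add(s)
    return index


def search(index, query, n=3, min_score=0.3):
    """Return indexed strings sharing n-grams with query, best first.
    Score is Jaccard similarity; min_score is an assumed cutoff."""
    query_grams = ngrams(query, n)
    shared = defaultdict(int)
    for gram in query_grams:
        for s in index.get(gram, ()):
            shared[s] += 1
    results = []
    for s, count in shared.items():
        score = count / len(query_grams | ngrams(s, n))
        if score >= min_score:
            results.append((score, s))
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in results]


index = build_index(["Red", "Green", "Blue"])
print(search(index, "Gren"))
print(search(index, "Geen"))
```

Only strings that share at least one trigram with the query are ever scored, which is what makes this kind of index attractive for larger lists compared with computing a distance against every entry.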

Without knowing more about your application, it won't be easy to recommend a suitable library. How much data are you searching? What format is the data? How often is the data updated?

erickson answered Oct 05 '22 14:10


I'm not sure how well Lucene is suited to fuzzy searching; a custom library might be a better choice. For example, this search is written in Java and works pretty fast, but it is custom-made for that task: http://www.softcorporation.com/products/people/

Vadim answered Oct 05 '22 15:10