Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thinking sphinx fuzzy search?

I am implementing sphinx search in my rails application.
I want to search with fuzzy on. It should search for spelling mistakes e.g if is enter search query charact*a*ristics, it should search for charact*e*ristics.

How should I implement this

like image 406
Pravin Avatar asked May 19 '11 09:05

Pravin


3 Answers

Sphinx doesn't naturally allow for spelling mistakes - it doesn't care if the words are spelled correctly or not, it just indexes them and matches them.

There's two options around this - either use thinking-sphinx-raspell to catch spelling errors by users when they search, and offer them the choice to search again with an improved query (much like Google does); or maybe use the soundex or metaphone morphologies so words are indexed in a way that accounts for how they sound. Search on this page for stemming, you'll find the relevant section. Also have a read of Sphinx's documentation on the matter as well.

I've no idea how reliable either option would be - personally, I'd opt for #1.

like image 102
pat Avatar answered Oct 11 '22 09:10

pat


By default, Sphinx does not pay any attention to wildcard searching using an asterisk character. You can turn it on, though:

development:
  enable_star: true
  # ... repeat for other environments

See http://pat.github.io/thinking-sphinx/advanced_config.html Wildcard/Star Syntax section.

like image 31
Anton Avatar answered Oct 11 '22 09:10

Anton


Yes, Sphinx generaly always uses the extended match modes.

There are the following matching modes available:

SPH_MATCH_ALL, matches all query words (default mode);
SPH_MATCH_ANY, matches any of the query words;
SPH_MATCH_PHRASE, matches query as a phrase, requiring perfect match;
SPH_MATCH_BOOLEAN, matches query as a boolean expression (see Section 5.2, “Boolean query syntax”);
SPH_MATCH_EXTENDED, matches query as an expression in Sphinx internal query language (see Section 5.3, “Extended query syntax”);
SPH_MATCH_EXTENDED2, an alias for SPH_MATCH_EXTENDED;
SPH_MATCH_FULLSCAN, matches query, forcibly using the "full scan" mode as below. NB, any query terms will be ignored, such that filters, filter-ranges and grouping will still be applied, but no text-matching.

SPH_MATCH_EXTENDED2 was used during 0.9.8 and 0.9.9 development cycle, when the internal matching engine was being rewritten (for the sake of additional functionality and better performance). By 0.9.9-release, the older version was removed, and SPH_MATCH_EXTENDED and SPH_MATCH_EXTENDED2 are now just aliases.

enable_star

Enables star-syntax (or wildcard syntax) when searching through prefix/infix indexes. >Optional, default is is 0 (do not use wildcard syntax), for compatibility with 0.9.7. >Known values are 0 and 1.

For example, assume that the index was built with infixes and that enable_star is 1. Searching should work as follows:

"abcdef" query will match only those documents that contain the exact "abcdef" word in them.
"abc*" query will match those documents that contain any words starting with "abc" (including the documents which contain the exact "abc" word only);
"*cde*" query will match those documents that contain any words which have "cde" characters in any part of the word (including the documents which contain the exact "cde" word only).
"*def" query will match those documents that contain any words ending with "def" (including the documents that contain the exact "def" word only).

Example:

enable_star = 1

like image 45
Pankhuri Avatar answered Oct 11 '22 10:10

Pankhuri