
Fuzziness settings in ElasticSearch

I need a way for my search engine to handle small typos in search strings and still return the right results.

According to the ElasticSearch docs, there are three values that are relevant to fuzzy matching in text queries: fuzziness, max_expansions, and prefix_length.

Unfortunately, there is not much detail available on exactly what these parameters do or what sane values for them are. I do know that fuzziness is supposed to be a float between 0 and 1.0, and that the other two are integers.

Can anyone recommend reasonable "starting point" values for these parameters? I'm sure I will have to tune by trial and error, but I'm just looking for ballpark values to correctly handle typos and misspellings.

Clay Wardell asked Aug 30 '12 17:08


1 Answer

According to the Fuzzy Query doc, default values are 0.5 for min_similarity (which looks like your fuzziness option), "unbounded" for max_expansions and 0 for prefix_length.

This answer should help you understand the min_similarity option. 0.5 seems to be a good start.

prefix_length and max_expansions affect performance: you can develop with the default values, but be aware that they will not scale (the Lucene developers were even considering a default value of 2 for prefix_length). I would recommend running benchmarks to find the right values for your specific case.
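
As a rough starting point, here is a minimal sketch in Python that builds a fuzzy query body using the three parameters discussed above. It only constructs the JSON and does not talk to a cluster. The helper name `build_fuzzy_query`, the field name `title`, the misspelled search term, and the `max_expansions` value of 50 are all assumptions made for illustration; the parameter names follow the Fuzzy Query doc referenced above (`min_similarity`, `prefix_length`, `max_expansions`) and may differ in other ElasticSearch versions, so check the docs for the version you run.

```python
import json


def build_fuzzy_query(field, value, min_similarity=0.5,
                      prefix_length=0, max_expansions=None):
    """Build a fuzzy query body as a plain dict (hypothetical helper).

    Defaults mirror the ones quoted in the answer: 0.5 for
    min_similarity, 0 for prefix_length, and no max_expansions
    key at all, which leaves expansions unbounded.
    """
    params = {
        "value": value,
        "min_similarity": min_similarity,
        "prefix_length": prefix_length,
    }
    # Only send max_expansions when a bound was explicitly requested.
    if max_expansions is not None:
        params["max_expansions"] = max_expansions
    return {"query": {"fuzzy": {field: params}}}


# Illustrative values only: prefix_length=2 is the default the Lucene
# developers were considering; max_expansions=50 is an arbitrary bound
# to benchmark against, not a recommendation.
body = build_fuzzy_query("title", "elastcsearch",
                         prefix_length=2, max_expansions=50)
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a search request. Varying `prefix_length` and `max_expansions` in a helper like this makes it easier to benchmark several combinations against your own data.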

A21z answered Sep 24 '22 04:09