I'm already familiar with Elasticsearch's spell-checking and can build a simple spell-checker using the suggest API. However, there is a class of misspellings called "real-word" errors. A real-word error occurs when a typo turns the intended word into another word that is present in the indexed data, so a lexical spell-checker fails to correct it because, lexically, the word IS correct.
For instance, consider the query "How to bell my laptop?". The user meant "sell" by "bell", but "bell" is present in the indexed vocabulary, so the spell-checker leaves it as is.
The idea for finding and correcting real-word errors is to use the frequency of n-grams in the indexed data. If the frequency of the current n-gram is very low while a very similar n-gram has a high frequency in the indexed data, chances are we have a real-word misspelling.
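To make the n-gram idea concrete, here is a minimal Python sketch that runs outside Elasticsearch. It is not how Elasticsearch implements it. The tiny corpus, the function names, the edit-distance limit of 1, and the `ratio` threshold are all assumptions made for illustration. It counts bigrams, then replaces a word only when a similar vocabulary word fits the surrounding words much better.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_counts(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def context_score(tokens, i, word, bigrams):
    """Sum the counts of the bigrams that `word` would form at position i."""
    score = 0
    if i > 0:
        score += bigrams[(tokens[i - 1], word)]
    if i + 1 < len(tokens):
        score += bigrams[(word, tokens[i + 1])]
    return score


def correct_real_word_errors(query, unigrams, bigrams, ratio=3):
    """Swap a token for a similar word whose bigram context is far more frequent.

    `ratio` is an arbitrary illustrative threshold, not an Elasticsearch setting.
    """
    tokens = query.lower().split()
    result = list(tokens)
    for i, tok in enumerate(tokens):
        current = context_score(tokens, i, tok, bigrams)
        best_word, best_score = tok, current
        for cand in unigrams:
            if cand != tok and levenshtein(tok, cand) <= 1:
                score = context_score(tokens, i, cand, bigrams)
                if score > best_score:
                    best_word, best_score = cand, score
        # +1 smoothing so an unseen context (count 0) can still be compared
        if best_word != tok and best_score >= ratio * (current + 1):
            result[i] = best_word
    return " ".join(result)


# Hypothetical stand-in for the indexed data
corpus = [
    "how to sell my laptop",
    "where to sell my phone",
    "best way to sell my laptop online",
    "how to sell my car",
    "sell my old laptop",
    "how to ring a bell",
]
unigrams, bigrams = build_counts(corpus)
print(correct_real_word_errors("how to bell my laptop", unigrams, bigrams))
```

In this toy corpus "bell" is a valid word, but the bigrams "to bell" and "bell my" never occur while "to sell" and "sell my" are frequent, so only that token is replaced. A query such as "how to ring a bell" is left unchanged because "a bell" has been seen and "a sell" has not.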
Is there a way to implement such a spell-checker using the Elasticsearch API?
After searching for a while, I found out that this can be implemented using the phrase suggester.
POST v2_201911/_search
{
  "suggest": {
    "text": "how to bell my laptop",
    "simple_phrase": {
      "phrase": {
        "field": "content",
        "gram_size": 2,
        "real_word_error_likelihood": 0.95,
        "direct_generator": [
          {
            "field": "content",
            "suggest_mode": "always",
            "prefix_length": 0,
            "min_word_length": 1
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
According to the documentation:
real_word_error_likelihood:
The likelihood of a term being misspelled even if the term exists in the dictionary. The default is 0.95, meaning 5% of the real words are misspelled.
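For experimenting with different real_word_error_likelihood values, a small Python sketch like the following could build the same request body and pull the top suggestion out of a response. The helper names `build_phrase_suggest_body` and `top_suggestion` are hypothetical, and the sample response is a hand-written illustration of the suggest response shape, not real output from the index above. Sending the request (via an HTTP client or the Elasticsearch client library) is intentionally left out.

```python
import json


def build_phrase_suggest_body(text, field="content", likelihood=0.95, gram_size=2):
    """Build a phrase-suggester request body mirroring the request above."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "gram_size": gram_size,
                    "real_word_error_likelihood": likelihood,
                    "direct_generator": [
                        {
                            "field": field,
                            "suggest_mode": "always",
                            "prefix_length": 0,
                            "min_word_length": 1,
                        }
                    ],
                    "highlight": {"pre_tag": "<em>", "post_tag": "</em>"},
                }
            },
        }
    }


def top_suggestion(response, name="simple_phrase"):
    """Return the text of the first option for the named suggester, or None."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options:
            return options[0]["text"]
    return None


body = build_phrase_suggest_body("how to bell my laptop", likelihood=0.5)
print(json.dumps(body, indent=2))

# Hand-written example of the response shape (illustrative values only)
sample_response = {
    "suggest": {
        "simple_phrase": [
            {
                "text": "how to bell my laptop",
                "offset": 0,
                "length": 21,
                "options": [
                    {
                        "text": "how to sell my laptop",
                        "highlighted": "how to <em>sell</em> my laptop",
                        "score": 0.1,
                    }
                ],
            }
        ]
    }
}
print(top_suggestion(sample_response))
```

The printed body can be sent as the payload of `POST v2_201911/_search`. Comparing the suggestions returned for a few likelihood values on real queries is a practical way to choose a setting for a given index.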