I create an index like this:
curl --location --request PUT 'http://127.0.0.1:9200/test/' \
--header 'Content-Type: application/json' \
--data-raw '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "properties" : {
      "word" : { "type" : "text" }
    }
  }
}'
Then I create a document:
curl --location --request POST 'http://127.0.0.1:9200/test/_doc/' \
--header 'Content-Type: application/json' \
--data-raw '{ "word":"organic" }'
And finally, search with an intentionally misspelled word:
curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word"
      }
    }
  }
}'
The word 'organic' has lost its first letter, and ES never returns suggestion options for this kind of misspelling (it works fine for other misspellings such as 'orgnic', 'oragnc' and 'organi'). What am I missing?
This is happening because of the prefix_length parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html. It defaults to 1, i.e. at least the first letter of the term has to match before a term is considered as a suggestion. You can set prefix_length to 0, but this has performance implications. Only your hardware, your setup and your dataset can show you what those will be in practice, so try it :). However, be careful: the Elasticsearch and Lucene devs set the default to 1 for a reason.
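As a rough illustration of that prefix rule, the check can be modelled as comparing the first prefix_length characters of the input and of a candidate term. This is a simplified sketch, not Lucene's actual implementation, and the helper name shares_prefix is made up for this example.

```shell
# Hypothetical helper: succeeds when the first N characters of the input
# and of the candidate term are identical (N plays the role of prefix_length).
shares_prefix() {
  input="$1"; candidate="$2"; n="$3"
  # printf "%.Ns" truncates a string to its first N characters (N=0 gives "").
  [ "$(printf "%.${n}s" "$input")" = "$(printf "%.${n}s" "$candidate")" ]
}

# First letters differ (r vs o), so the term is never considered.
shares_prefix rganic organic 1 && echo "candidate" || echo "rejected"   # prints "rejected"
# First letters match (o), so the usual edit-distance check can go ahead.
shares_prefix orgnic organic 1 && echo "candidate" || echo "rejected"   # prints "candidate"
# With a prefix length of 0 no leading characters are required to match.
shares_prefix rganic organic 0 && echo "candidate" || echo "rejected"   # prints "candidate"
```

This only models the prefix filter; whether a surviving candidate is actually suggested still depends on edit distance and the other suggester options.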
Here's a query which for me returns the suggestion result you're after on Elasticsearch 7.4.0 after I perform your setup steps.
curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word",
        "prefix_length": 0
      }
    }
  }
}'
You need to use candidate generators with the phrase suggester. See this passage from the book Elasticsearch in Action, page 444:
Having multiple generators and filters lets you do some neat tricks. For instance, if typos are likely to happen both at the beginning and end of words, you can use multiple generators to avoid expensive suggestions with low prefix lengths by using the reverse token filter, as shown in figure F.4. You’ll implement what’s shown in figure F.4 in listing F.4:
■ First, you’ll need an analyzer that includes the reverse token filter.
■ Then you’ll index the correct product description in two fields: one analyzed with the standard analyzer and one with the reverse analyzer.
From the Elasticsearch docs:
The following example shows a phrase suggest call with two generators: the first one is using a field containing ordinary indexed terms, and the second one uses a field that uses terms indexed with a reverse filter (tokens are index in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The pre_filter and post_filter options accept ordinary analyzer names.
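To see why a reversed field helps with a typo in the first letter, here is a small sketch that uses the standard rev utility as a stand-in for the reverse token filter. That substitution is an assumption for illustration only; Elasticsearch applies the filter to each token at analysis time. The helper name reverse_word is hypothetical.

```shell
# Stand-in for the reverse token filter: reverse the characters of one word.
reverse_word() {
  printf '%s\n' "$1" | rev
}

reverse_word organic   # prints "cinagro"
reverse_word rganic    # prints "cinagr"

# After reversal the missing letter sits at the end, so both strings start
# with the same character and a non-zero prefix requirement is satisfied.
typo_rev=$(reverse_word rganic)
term_rev=$(reverse_word organic)
[ "$(printf '%.1s' "$typo_rev")" = "$(printf '%.1s' "$term_rev")" ] && echo "prefix matches"

# Reversing the candidate again (the role of post_filter) restores the word.
reverse_word "$term_rev"   # prints "organic"
```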
So you can achieve this by adding a second generator on a field analyzed with the reverse token filter, and passing that analyzer as the generator's pre_filter and post_filter.
And as you can see, the docs say:
This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions.
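A minimal sketch of that setup, adapted to the index from the question, might look like the following. The index name test_reverse, the analyzer name reverse, the sub-field word.reverse and the shell function names are all assumptions chosen for this example, and the requests have not been run against a live cluster, so treat them as a starting point.

```shell
# Assumed: a local node as in the question and a fresh index "test_reverse".
ES='http://127.0.0.1:9200'

# Custom "reverse" analyzer (standard tokenizer, then lowercase and reverse
# token filters); "word" is indexed normally and again in "word.reverse".
index_body='{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "reverse": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "reverse"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "word": {
        "type": "text",
        "fields": {
          "reverse": { "type": "text", "analyzer": "reverse" }
        }
      }
    }
  }
}'

# Phrase suggester with two direct generators: one on the plain field and one
# on the reversed field, with input and candidates passed through "reverse".
suggest_body='{
  "suggest": {
    "001": {
      "text": "rganic",
      "phrase": {
        "field": "word",
        "direct_generator": [
          { "field": "word", "suggest_mode": "always" },
          {
            "field": "word.reverse",
            "suggest_mode": "always",
            "pre_filter": "reverse",
            "post_filter": "reverse"
          }
        ]
      }
    }
  }
}'

create_index() {
  curl --request PUT "$ES/test_reverse" \
    --header 'Content-Type: application/json' --data-raw "$index_body"
}
index_doc() {
  curl --request POST "$ES/test_reverse/_doc/" \
    --header 'Content-Type: application/json' --data-raw '{ "word":"organic" }'
}
suggest() {
  curl --request POST "$ES/test_reverse/_search" \
    --header 'Content-Type: application/json' --data-raw "$suggest_body"
}
```

Calling create_index, index_doc and suggest in that order sends the three requests. The second generator keeps the default prefix length, which is the point of the trick: the expensive prefix_length 0 lookup is avoided because the typo ends up at the end of the reversed token.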
Check this figure from the Elasticsearch in Action book; I believe it makes the idea clearer.
[Figure: a screenshot from the book showing how Elasticsearch arrives at the correct phrase]
For more information, refer to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html#:~:text=The%20phrase%20suggester%20uses%20candidate,individual%20term%20in%20the%20text.
If I explained the full idea, this would be a very long answer, but I have given you the key, and you can do your own research on using the phrase suggester with multiple generators.