
Elasticsearch spell check suggestions even if first letter missed

I create an index like this:

curl --location --request PUT 'http://127.0.0.1:9200/test/' \
--header 'Content-Type: application/json' \
--data-raw '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "properties" : {
            "word" : { "type" : "text" }
        }
    }
}'

Then I create a document:

curl --location --request POST 'http://127.0.0.1:9200/test/_doc/' \
--header 'Content-Type: application/json' \
--data-raw '{ "word":"organic" }'

And finally, search with an intentionally misspelled word:

curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word"
      }
    }
  }
}'

The word 'organic' has lost its first letter, and ES never returns suggestion options for such a misspelling (it works absolutely fine for other misspellings: 'orgnic', 'oragnc' and 'organi'). What am I missing?

Maksim asked Dec 30 '19 07:12




2 Answers

This happens because of the prefix_length parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html . It defaults to 1, i.e. the first character of a candidate term has to match the first character of the input. Since 'rganic' starts with 'r' and 'organic' starts with 'o', 'organic' is never considered as a candidate. You can set prefix_length to 0, but this has performance implications, because far more terms have to be checked. Only your hardware, your setup and your dataset can show you what those implications are in practice, so try it. Be careful, though: the Elasticsearch and Lucene developers set the default to 1 for a reason.

Here's a query which for me returns the suggestion result you're after on Elasticsearch 7.4.0 after I perform your setup steps.

curl --location --request POST 'http://127.0.0.1:9200/test/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "suggest": {
    "001" : {
      "text" : "rganic",
      "term" : {
        "field" : "word",
        "prefix_length": 0
      }
    }
  }
}'
Emanuil Tolev answered Oct 21 '22 07:10


You need to use candidate generators with the phrase suggester. See this passage from the book Elasticsearch in Action, page 444:

Having multiple generators and filters lets you do some neat tricks. For instance, if typos are likely to happen both at the beginning and end of words, you can use multiple generators to avoid expensive suggestions with low prefix lengths by using the reverse token filter, as shown in figure F.4. You’ll implement what’s shown in figure F.4 in listing F.4:

■ First, you’ll need an analyzer that includes the reverse token filter.

■ Then you’ll index the correct product description in two fields: one analyzed with the standard analyzer and one with the reverse analyzer.
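The two steps quoted above could be sketched roughly as follows for the question's single-field setup. This is a minimal sketch and not the book's listing F.4: the index name test_reverse, the custom analyzer name reverse and the subfield word.reverse are all assumptions chosen for illustration. A new index name is used because the mapping of the existing test index would otherwise need to be changed.

```shell
# Hypothetical index name (assumption); the question's "test" index already exists.
INDEX_URL='http://127.0.0.1:9200/test_reverse'

# "reverse" below is a custom analyzer name chosen for this sketch. It chains the
# built-in lowercase and reverse token filters, so "organic" is indexed as
# "cinagro" in the word.reverse subfield, while word keeps the standard analysis.
INDEX_BODY='{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "reverse": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "reverse"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "word": {
        "type": "text",
        "fields": {
          "reverse": { "type": "text", "analyzer": "reverse" }
        }
      }
    }
  }
}'

# Create the index with both analyses of the same field.
curl --silent --show-error --max-time 5 --location --request PUT "$INDEX_URL" \
  --header 'Content-Type: application/json' \
  --data-raw "$INDEX_BODY" \
  || echo 'request failed: is Elasticsearch listening on 127.0.0.1:9200?'

# Index the same document as in the question; both fields are filled from "word".
curl --silent --show-error --max-time 5 --location --request POST "$INDEX_URL/_doc/" \
  --header 'Content-Type: application/json' \
  --data-raw '{ "word":"organic" }' \
  || echo 'request failed: is Elasticsearch listening on 127.0.0.1:9200?'
```

With the term stored reversed, a missing first letter becomes a missing last letter, so the default prefix requirement can still be met on the reversed field.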

From the Elasticsearch docs:

The following example shows a phrase suggest call with two generators: the first one is using a field containing ordinary indexed terms, and the second one uses a field that uses terms indexed with a reverse filter (tokens are index in reverse order). This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions. The pre_filter and post_filter options accept ordinary analyzer names.

So you can achieve this by adding a second generator on a field indexed with a reverse analyzer, and setting that generator's pre_filter and post_filter to the same reverse analyzer.

And as you can see they said:

This is used to overcome the limitation of the direct generators to require a constant prefix to provide high-performance suggestions.
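As a rough illustration of what the docs describe, a phrase suggest request with two generators might look like the sketch below. It assumes an index (here hypothetically named test_reverse) whose word field has a word.reverse subfield analyzed by a custom analyzer named reverse; those names are assumptions, not something from the question's original setup. I have not verified the exact output, so treat it as a starting point.

```shell
# Hypothetical index name (assumption); it must define the "reverse" analyzer
# and the word.reverse subfield for this request to work.
SEARCH_URL='http://127.0.0.1:9200/test_reverse/_search'

# Two direct generators:
#  1. the ordinary field, subject to the usual prefix requirement;
#  2. the reversed field, where pre_filter reverses the input ("rganic" becomes
#     "cinagr") before candidates are looked up, and post_filter reverses each
#     candidate back ("cinagro" becomes "organic") before it is scored.
SUGGEST_BODY='{
  "suggest": {
    "text": "rganic",
    "001": {
      "phrase": {
        "field": "word",
        "direct_generator": [
          {
            "field": "word",
            "suggest_mode": "always"
          },
          {
            "field": "word.reverse",
            "suggest_mode": "always",
            "pre_filter": "reverse",
            "post_filter": "reverse"
          }
        ]
      }
    }
  }
}'

curl --silent --show-error --max-time 5 --location --request POST "$SEARCH_URL" \
  --header 'Content-Type: application/json' \
  --data-raw "$SUGGEST_BODY" \
  || echo 'request failed: is Elasticsearch listening on 127.0.0.1:9200?'
```

The second generator keeps the default prefix length, because on the reversed terms the typo sits at the end of the word rather than at the start.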

Check this figure from the Elasticsearch in Action book; I believe it makes the idea clearer.

[Figure: a screenshot from the book showing how Elasticsearch arrives at the correct phrase]

For more information refer to the docs https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html#:~:text=The%20phrase%20suggester%20uses%20candidate,individual%20term%20in%20the%20text.

Explaining the full idea would make this a very long answer, but this gives you the key: you can now do your own research on using the phrase suggester with multiple generators.

Talal Humaidi answered Oct 21 '22 08:10