
Fuzzy search in ElasticSearch doesn't work with spaces

I'm using the fuzzy search option in ElasticSearch. It's pretty cool.

But I came across an issue when searching for values that contain spaces. For example, say I have two values:

"Pizza"
"Pineapple Pizza"

and I search for Pizza using this query:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

The values returned are:

"Pizza"
"Pineapple Pizza"

This is expected. But if I search for the value "Pineapple Pizza" instead:

        client.search({
            index: 'food_index',
            body: {
                query: {
                    fuzzy: {
                        name: {
                            value: "Pineapple Pizza",
                            transpositions: true,
                        }
                    },
                }
            }
        })

Nothing is returned; the result is empty.

Why is that? It should be an exact match. I'm contemplating replacing the spaces in all names with underscores, so "Pineapple Pizza" would become "Pineapple_Pizza" (this workaround works for me). But I'm asking this question in the hope of finding a better alternative. What am I doing wrong here?

JD333 asked Oct 26 '19 21:10


1 Answer

Fuzzy queries are term-level queries, which means the search text is not analyzed before it is matched against the index. In your case the standard analyzer is applied to the name field at index time, so "Pineapple Pizza" is split into two lowercase tokens, pineapple and pizza. The fuzzy query then tries to match the whole string "Pineapple Pizza" against a single similar term in the index, and no such term exists because the value was broken into two words.
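To make the difference concrete, here is a rough simulation in plain JavaScript. It is not Elasticsearch code: `analyze`, `editDistance`, `autoMaxEdits`, `fuzzyTermQuery` and `fuzzyMatchQuery` are hypothetical helpers, the analyzer is reduced to "lowercase and split on non-alphanumerics", and plain Levenshtein distance stands in for the real fuzzy matching. It only illustrates why analyzing the query text first changes the result.

```javascript
// Simplified stand-in for the standard analyzer (assumption: lowercase + split
// on anything that is not a-z or 0-9; the real analyzer is more elaborate).
function analyze(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Plain Levenshtein distance, computed with a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // value of d[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // value of d[i-1][j]
      row[j] = Math.min(
        above + 1,                               // deletion
        row[j - 1] + 1,                          // insertion
        diag + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
      diag = above;
    }
  }
  return row[b.length];
}

// Edits allowed by fuzziness AUTO: 0 for length 0-2, 1 for 3-5, 2 for longer terms.
function autoMaxEdits(term) {
  if (term.length <= 2) return 0;
  if (term.length <= 5) return 1;
  return 2;
}

// Term-level behaviour: the search value is compared as-is with each indexed term.
function fuzzyTermQuery(docs, value) {
  const maxEdits = autoMaxEdits(value);
  return docs.filter((doc) =>
    analyze(doc).some((term) => editDistance(term, value) <= maxEdits)
  );
}

// Match-query behaviour: the search text is analyzed first, then each token is
// matched fuzzily; with the default "or", one matching token is enough.
function fuzzyMatchQuery(docs, text) {
  const tokens = analyze(text);
  return docs.filter((doc) => {
    const terms = analyze(doc);
    return tokens.some((token) =>
      terms.some((term) => editDistance(term, token) <= autoMaxEdits(token))
    );
  });
}

const names = ["Pizza", "Pineapple Pizza"];
console.log(fuzzyTermQuery(names, "Pizza"));            // [ 'Pizza', 'Pineapple Pizza' ]
console.log(fuzzyTermQuery(names, "Pineapple Pizza"));  // []
console.log(fuzzyMatchQuery(names, "Pineapple Pizza")); // [ 'Pizza', 'Pineapple Pizza' ]
```

In this sketch "Pizza" is one edit away from the indexed term pizza, so both documents match, while the unanalyzed string "Pineapple Pizza" is far from every single term and matches nothing.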

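You need to use a match query with fuzziness set, so that the query string is analyzed before matching. (The examples below use a field called item from my test index; in your case the field is name.)

{
  "query": {
    "match": {
      "item": {
        "query": "Pineappl piz",
        "fuzziness": "auto"
      }
    }
  }
}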

Response:

[
  {
    "_index" : "index27",
    "_type" : "_doc",
    "_id" : "p9qQDG4BLLIhDvFGnTMX",
    "_score" : 0.53372335,
    "_source" : {
      "item" : "Pineapple Pizza"
    }
  }
]
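Translated to the JavaScript client call from the question, the same match query might look like the sketch below. `buildFuzzyMatchRequest` is a hypothetical helper, and the index `food_index` and field `name` are taken from the question, not from my test index.

```javascript
// Hypothetical helper: builds the search request for the question's index,
// replacing the term-level fuzzy query with an analyzed match query.
function buildFuzzyMatchRequest(text) {
  return {
    index: 'food_index',
    body: {
      query: {
        match: {
          name: {
            query: text,        // analyzed into tokens before matching
            fuzziness: 'AUTO',  // each token may differ by a few edits
          },
        },
      },
    },
  };
}

// Usage, assuming `client` is the already configured client from the question:
// client.search(buildFuzzyMatchRequest('Pineapple Pizza'))
```

Keeping the request in a small function makes it easy to reuse the same query shape for every search term.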

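You can also run the fuzzy query against a keyword field, which stores the entire text as a single term in the index (item.keyword here assumes the default dynamic mapping, which adds a keyword sub-field):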

{
  "query": {
    "fuzzy": {
      "item.keyword": {
        "value":"Pineapple pizz"
      }
    }
  }
}

EDIT1:

{
  "query": {
    "match": {
      "item": {
        "query": "Pineapple pizza",
        "operator": "and",
        "fuzziness": "auto"
      }
    }
  }
}

"operator": "and" means that all tokens in the query must be present in the document. The default is "or": a document matches if any one of the tokens is present. There are other options in between, where you define how many tokens must match, for example as a percentage via minimum_should_match.

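As a sketch of those options, the helper below builds the match query body with either `operator` or `minimum_should_match`. `buildMatchQuery` is a hypothetical helper, and the four-word search text and the `75%` threshold are only illustrative values.

```javascript
// Hypothetical helper: a fuzzy match query body with extra match options merged in.
function buildMatchQuery(field, text, options = {}) {
  return {
    query: {
      match: {
        [field]: { query: text, fuzziness: 'AUTO', ...options },
      },
    },
  };
}

// Every token must match (fuzzily) in the document.
const allTokens = buildMatchQuery('item', 'Pineapple pizza', { operator: 'and' });

// At least 75% of the tokens must match: 3 of the 4 tokens in this illustrative text.
const mostTokens = buildMatchQuery('item', 'Pineapple pizza extra cheese', {
  minimum_should_match: '75%',
});

console.log(JSON.stringify(allTokens, null, 2));
console.log(JSON.stringify(mostTokens, null, 2));
```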
jaspreet chahal answered Sep 22 '22 18:09