
Elasticsearch match string to field with fuzziness

I'm trying to match a whole string against a whole field, with fuzziness as the only tolerance.

For example, with these documents:

{ title: "replace oilfilter" }, { title: "replace motoroil" }

The following queries should match only the first document:

"Replace oilfilter", "Replace oilsfilter", "Replaze oilfilter"

The following queries should NOT match any document:

"replace", "oilfilter", "motoroil"

What I got so far is the following:

index

I'm using the keyword analyzer so that the (potentially multi-word) phrase is treated as a single term. This way a search for "replace" does not match any document, but a search for the exact term "Replace oilfilter" does.

    "mappings": {
        "blacklist": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "keyword"
                }
            }
        }
    }
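To illustrate why the keyword analyzer keeps "replace" from matching, here is a rough Python sketch of the tokens each approach produces. Both functions are hypothetical stand-ins written for this example, not Elasticsearch code; the second is only a simplified approximation of a word-splitting analyzer (lowercase, split on whitespace).

```python
def keyword_analyzer(text):
    # The keyword analyzer emits the whole input as one unchanged token.
    return [text]


def simplified_word_analyzer(text):
    # Assumption: a crude approximation of a word-based analyzer,
    # lowercasing the input and splitting it on whitespace.
    return text.lower().split()


title = "replace oilfilter"

# With one token per field value, the single word is not an indexed term.
print("replace" in keyword_analyzer(title))
# With one token per word, the single word is an indexed term.
print("replace" in simplified_word_analyzer(title))
```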

search

I've tried multiple queries to search the documents. I got close with the following query:

    "query": {
        "query_string": {
            "default_field": "title",
            "fuzziness": "3",
            "query": query
        }
    }

results

This query gives the following results:

> "Replace oilfilter" (exact words)
< doc: { title: "replace oilfilter" }, score: 0.5753..
< doc: { title: "replace motoroil" }, score: 0.2876..

> "Replace iolfilter" (typo)
< doc: { title: "replace oilfilter" }, score: 0.2876..

> "oilfilter" (other term)
< doc: { title: "replace oilfilter" }, score: 0.2876..

problem

The results aren't bad, but I need the scores to be more accurate. The second query, which contains only a simple typo, should score much higher than both the second result of the first query and the only result of the third query.

What I'm trying to achieve is a match of the whole query against the whole field in the document, which is why I'm using the keyword analyzer. On top of that, I only want to apply some fuzziness.
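The intended behaviour, using the example documents and queries from the top of the question, can be pinned down outside Elasticsearch with a small Python sketch. It compares the whole query to the whole title using an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions. The function names, the limit of 2 edits, and the lowercasing step are assumptions of this sketch, not something the mapping above does.

```python
def edit_distance(a, b):
    """Optimal-string-alignment distance: insert, delete, substitute,
    and adjacent transposition each cost one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


TITLES = ["replace oilfilter", "replace motoroil"]


def matching_titles(query, max_edits=2):
    # Assumption: case is ignored by lowercasing both sides first.
    q = query.lower()
    return [t for t in TITLES if edit_distance(q, t.lower()) <= max_edits]


for query in ["Replace oilfilter", "Replaze oilfilter", "replace", "oilfilter"]:
    print(query, "->", matching_titles(query))
```

Under these assumptions the typo queries stay within a couple of edits of the first title, while the single-word queries are far from both titles, which is the behaviour I'm after.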

Hope someone can shed some light on this issue.

Thanks!

Tim Baas asked Dec 24 '22 12:12


1 Answer

The following search should achieve what you want:

    {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": "replace oilfliter",
                        "fuzziness": "3",
                        "fields": [
                            "title"
                        ],
                        "minimum_should_match": "75%",
                        "type": "most_fields"
                    }
                }
            }
        }
    }

You can increase minimum_should_match to 100% if you want to require a match on all of the query terms, no matter how long the query string is.
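If you build the request from code, a minimal Python sketch could look like the following. The helper name `build_fuzzy_title_query` is hypothetical, and the sketch only constructs the request body; how it is sent to the cluster is left out. One deliberate difference from the query above: the default fuzziness here is 2, on the assumption (as far as I know from the Elasticsearch documentation) that 0, 1, 2, and AUTO are the documented values, so a setting of "3" would not allow more than two edits.

```python
import json


def build_fuzzy_title_query(text, fuzziness=2, minimum_should_match="75%"):
    """Return the search request body as a plain dict (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        # Assumption: 2 is taken to be the largest documented
                        # edit distance; "AUTO" is the other common choice.
                        "fuzziness": fuzziness,
                        "fields": ["title"],
                        "minimum_should_match": minimum_should_match,
                        "type": "most_fields",
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Require every query term to match, as described above.
    body = build_fuzzy_title_query("replace oilfliter", minimum_should_match="100%")
    print(json.dumps(body, indent=2))
```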

LaserJesus answered Jan 12 '23 04:01