
ElasticSearch multi_match query over multiple fields with Fuzziness

How can I add fuzziness to a multi_match query, so that a search for 'basball' still finds 'baseball' articles? Currently my query looks like this:

POST /newspaper/articles/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "baseball",
                    "type": "phrase",
                    "fields": [
                        "subject^3", 
                        "section^2.5", 
                        "article^2", 
                        "tags^1.5",
                        "notes^1"
                    ]
                }
            }
        }
    }
}

One option I was looking at is something like the following, but I don't know whether it is the best approach. It's important that results stay sorted by score:

   "query" : { 
      "query_string" : { 
         "query" : "subject:basball^3 section:basball^2.5 article:basball^2", 
         "fuzzy_prefix_length" : 1 
      } 
   } 

Suggestions?

Funtriaco Prado asked Apr 14 '15 16:04


1 Answer

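To add fuzziness to a multi_match query, set the fuzziness property on the query, as described in the Elasticsearch docs. Note that fuzziness cannot be combined with the phrase, phrase_prefix, or cross_fields types, so the type below is changed from phrase to best_fields (the default):

{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "baseball",
                    "type": "best_fields",
                    "fields": [
                        "subject^3",
                        "section^2.5",
                        "article^2",
                        "tags^1.5",
                        "notes^1"
                    ],
                    "fuzziness": "AUTO",
                    "prefix_length": 2
                }
            }
        }
    }
}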

Note that the docs describe prefix_length as:

The number of initial characters which will not be “fuzzified”. This helps to reduce the number of terms which must be examined. Defaults to 0.
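
To make the interaction between fuzziness and prefix_length concrete, here is a rough Python sketch of the idea. It is not how Elasticsearch implements fuzzy matching internally: the function names are hypothetical, it uses plain Levenshtein distance (Elasticsearch also counts a transposition of two adjacent characters as one edit by default), and it assumes the documented AUTO thresholds of 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters, and 2 edits for longer terms.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def auto_max_edits(term: str) -> int:
    """Assumed AUTO rule: edits allowed depend on the term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(query_term: str, indexed_term: str, prefix_length: int = 0) -> bool:
    """Hypothetical helper: would indexed_term be a fuzzy candidate?"""
    # The first prefix_length characters are not "fuzzified": they must match exactly.
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return edit_distance(query_term, indexed_term) <= auto_max_edits(query_term)


if __name__ == "__main__":
    print(edit_distance("basball", "baseball"))
    print(fuzzy_matches("basball", "baseball", prefix_length=2))
    print(fuzzy_matches("vaseball", "baseball", prefix_length=2))
```

With prefix_length set to 2, 'basball' can still reach 'baseball' because both start with 'ba', whereas a typo in the first two characters (such as 'vaseball') would no longer match.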

For the possible values of fuzziness, see the ES docs.
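
If the search body is assembled in application code, a small builder keeps the boosts and fuzzy settings in one place. The following is only a sketch: build_fuzzy_multi_match is a hypothetical helper, it uses nothing but the standard library, and it leaves out the function_score wrapper and the step of actually sending the request. It uses best_fields on the assumption that fuzziness is not accepted together with the phrase type.

```python
import json


def build_fuzzy_multi_match(text, boosted_fields, fuzziness="AUTO", prefix_length=2):
    """Hypothetical helper: build a fuzzy multi_match search body as a dict."""
    # Turn {"subject": 3} into the "subject^3" boost syntax.
    fields = [f"{name}^{boost}" for name, boost in boosted_fields.items()]
    return {
        "query": {
            "multi_match": {
                "query": text,
                # fuzziness is not supported with "phrase", so use the default type
                "type": "best_fields",
                "fields": fields,
                "fuzziness": fuzziness,
                "prefix_length": prefix_length,
            }
        }
    }


# Field boosts taken from the question.
body = build_fuzzy_multi_match(
    "basball",
    {"subject": 3, "section": 2.5, "article": 2, "tags": 1.5, "notes": 1},
)

if __name__ == "__main__":
    # The printed JSON can be sent as the body of POST /newspaper/articles/_search
    print(json.dumps(body, indent=4))
```

Results are still ordered by _score, so the field boosts keep influencing the ranking when fuzziness is enabled.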

nan-ead answered Sep 20 '22 15:09