
Elastic Search multi_match cross_fields prefix

I have a multi_match query of type cross_fields, which I want to improve with prefix matching.

{
  "index": "companies",
  "size": 25,
  "from": 0,
  "body": {
    "_source": {
      "include": [
        "name",
        "address"
      ]
    },
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "type": "cross_fields",
            "query": "Google",
            "operator": "and",
            "fields": [
              "name",
              "address"
            ]
          }
        }
      }
    }
  }
}

It matches perfectly on queries such as google mountain view. The filtered wrapper is there because I need to add geo filters dynamically.

{
  "id": 1,
  "name": "Google",
  "address": "Mountain View"
} 

Now I want to allow prefix matching, without breaking cross_fields.

Queries such as these should match:

  • goog
  • google mount
  • google mountain vi
  • mountain view goo

If I change multi_match.type to phrase_prefix, the whole query is matched against a single field at a time, so mountain vi matches but google mountain vi does not.

How do I solve this?

Bouke Versteegh asked Feb 21 '15 22:02




1 Answer

Since there are no answers yet and someone might come across this: I had the same problem, and here is a solution.

Use an edgeNGram token filter.

You need to change both the index settings and the mappings.

Here's an example for the settings:

"settings" : {
  "index" : {
    "analysis" : {
      "analyzer" : {
        "ngram_analyzer" : {
          "type" : "custom",
          "stopwords" : "_none_",
          "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
          "tokenizer" : "standard"
        },
        "default" : {
          "type" : "custom",
          "stopwords" : "_none_",
          "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop" ],
          "tokenizer" : "standard"
        }
      },
      "filter" : {
        "no_stop" : {
          "type" : "stop",
          "stopwords" : "_none_"
        },
        "ngram_filter" : {
          "type" : "edgeNGram",
          "min_gram" : "2",
          "max_gram" : "20"
        }
      }
    }
  }
}

Of course, you should adapt the analyzers to your own use case. You might want to leave the default analyzer untouched, or add the ngram filter to it so that you don't have to change the mappings. That second option means that every field in your index gets the ngram filter.
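To make the effect of ngram_filter concrete, here is a rough Python sketch of what an edge n-gram filter with min_gram 2 and max_gram 20 emits for each token. It is only an approximation: it lowercases and splits on whitespace, and it ignores the asciifolding and word_delimiter filters from the settings. The function names edge_ngrams and analyze are made up for this illustration and are not part of Elasticsearch.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading substrings of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    # Tokens shorter than min_gram produce no n-grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Very rough stand-in for ngram_analyzer: lowercase, split, edge n-gram."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


print(analyze("Google"))         # ['go', 'goo', 'goog', 'googl', 'google']
print(analyze("Mountain View"))
```

Every prefix of at least two characters ends up in the index as its own term, which is why a partial word such as goog or vi can match like a full term.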

And for the mapping:

"mappings" : {
  "patient" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "analyzer" : "ngram_analyzer"
      },
      "address" : {
        "type" : "string",
        "analyzer" : "ngram_analyzer"
      }
    }
  }
}

Declare every field you want to autocomplete with the ngram_analyzer (replace the patient mapping type with your own). After reindexing, the queries in your question should work. If you ended up using something else, I'd be happy to hear about it.
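As a sanity check on that claim, the following Python sketch models the matching in a very simplified way. It assumes that indexed fields hold lowercase edge n-grams (min_gram 2, max_gram 20), that query terms are only lowercased and split on whitespace, and that cross_fields with operator and requires every query term to appear in at least one of the fields. Real Elasticsearch analysis and scoring are more involved, and the helper names here are hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Set of leading substrings of token, lengths min_gram..max_gram."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def index_field(text):
    """Terms stored for one field: edge n-grams of each lowercased word."""
    terms = set()
    for word in text.lower().split():
        terms |= edge_ngrams(word)
    return terms


def cross_fields_and(query, doc, fields):
    """True if every query term is found in at least one of the fields."""
    indexed = [index_field(doc[f]) for f in fields]
    return all(
        any(term in terms for terms in indexed)
        for term in query.lower().split()
    )


doc = {"id": 1, "name": "Google", "address": "Mountain View"}
queries = ["goog", "google mount", "google mountain vi",
           "mountain view goo", "google paris"]
for query in queries:
    print(query, "->", cross_fields_and(query, doc, ["name", "address"]))
```

Under these assumptions the four queries from the question match the example document, while google paris does not. A one-character term such as g would not match either, because min_gram is 2.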

Yannick Fonjallaz answered Oct 20 '22 11:10