I have a multi_match query of type cross_fields, which I want to improve with prefix matching:
{
  "index": "companies",
  "size": 25,
  "from": 0,
  "body": {
    "_source": {
      "include": [
        "name",
        "address"
      ]
    },
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "type": "cross_fields",
            "query": "Google",
            "operator": "and",
            "fields": [
              "name",
              "address"
            ]
          }
        }
      }
    }
  }
}
It is matching perfectly on queries such as google mountain view. The filtered wrapper is there because I dynamically need to add geo filters (a sketch of that follows the example document below). Here is an example of a matching document:
{
  "id": 1,
  "name": "Google",
  "address": "Mountain View"
}
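The geo filter slot would be used roughly like this. This is a sketch only; the geo_point field location, the distance, and the coordinates are all hypothetical:
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "type": "cross_fields",
          "query": "Google",
          "operator": "and",
          "fields": [
            "name",
            "address"
          ]
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "50km",
          "location": {
            "lat": 37.42,
            "lon": -122.08
          }
        }
      }
    }
  }
}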
Now I want to allow prefix matching without breaking cross_fields.
Queries such as these should match:
goog
google mount
google mountain vi
mountain view goo
If I change the multi_match.type to phrase_prefix, the whole query text is matched as a single phrase against each field separately, so mountain vi matches (within address) but google mountain vi does not, because no single field contains that whole phrase.
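For reference, the failing variant is the same body with only the type changed:
{
  "query": {
    "multi_match": {
      "type": "phrase_prefix",
      "query": "google mountain vi",
      "operator": "and",
      "fields": [
        "name",
        "address"
      ]
    }
  }
}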
How do I solve this?
As there are no answers yet and someone might see this: I had the same problem, and here is a solution using the edge n-gram (edgeNGram) token filter.
You need to change both the index settings and the mappings.
Here's an example for the settings:
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"ngram_analyzer" : {
"type" : "custom",
"stopwords" : "_none_",
"filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
"tokenizer" : "standard"
},
"default" : {
"type" : "custom",
"stopwords" : "_none_",
"filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop" ],
"tokenizer" : "standard"
}
},
"filter" : {
"no_stop" : {
"type" : "stop",
"stopwords" : "_none_"
},
"ngram_filter" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "20"
}
}
}
}
}
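To see what the ngram_filter produces, you can run the analyzer by hand. Assuming these settings are applied to the companies index and the 1.x-era API (which the filtered query in the question suggests), the call and a trimmed response look like this; the real response also carries offsets and positions:
GET /companies/_analyze?analyzer=ngram_analyzer&text=Google

{
  "tokens": [
    { "token": "go" },
    { "token": "goo" },
    { "token": "goog" },
    { "token": "googl" },
    { "token": "google" }
  ]
}
All of those tokens are indexed, which is exactly why a later search for goog can find the document.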
Of course, you should adapt the analyzers to your own use case. You might want to leave the default analyzer untouched, or add the ngram filter to it so you don't have to change the mappings; the latter means every analyzed field in your index gets the ngram filter.
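That second option is just a matter of appending ngram_filter to the default analyzer's filter chain from the settings above. A sketch (remember this ngrams every analyzed string field in the index):
"default" : {
  "type" : "custom",
  "stopwords" : "_none_",
  "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
  "tokenizer" : "standard"
}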
And for the mapping (the patient type name comes from my own index; substitute your own type):
"mappings" : {
"patient" : {
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "ngram_analyzer"
},
"address" : {
"type" : "string",
"analyzer" : "ngram_analyzer"
}
}
}
}
Declare every field you want to autocomplete with the ngram_analyzer. Then the queries in your question should work. If you used something else, I'd be happy to hear about it.
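To make that concrete: with the mapping above in place, the unchanged cross_fields query from the question should now match on partial terms, for example:
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "type": "cross_fields",
          "query": "google mountain vi",
          "operator": "and",
          "fields": [
            "name",
            "address"
          ]
        }
      }
    }
  }
}
One refinement worth considering: with a single analyzer per field, the query text is ngrammed at search time as well, which loosens matching; setting a separate search_analyzer (without ngram_filter) on those fields keeps the ngrams on the index side only.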