I have a multi_match query of type cross_fields, which I want to improve with prefix matching:
{
  "index": "companies",
  "size": 25,
  "from": 0,
  "body": {
    "_source": {
      "include": [
        "name",
        "address"
      ]
    },
    "query": {
      "filtered": {
        "query": {
          "multi_match": {
            "type": "cross_fields",
            "query": "Google",
            "operator": "and",
            "fields": [
              "name",
              "address"
            ]
          }
        }
      }
    }
  }
}
It is matching perfectly on queries such as google mountain view. The filtered wrapper is there because I dynamically need to add geo filters (a sketch of that follows the example document below). Here is an example of a matching document:
{
  "id": 1,
  "name": "Google",
  "address": "Mountain View"
}
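The geo filter slot would be used roughly like this. This is a sketch only; the geo_point field location, the distance, and the coordinates are all hypothetical:
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "type": "cross_fields",
          "query": "Google",
          "operator": "and",
          "fields": [
            "name",
            "address"
          ]
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "50km",
          "location": {
            "lat": 37.42,
            "lon": -122.08
          }
        }
      }
    }
  }
}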
Now I want to allow prefix matching without breaking cross_fields.
Queries such as these should match:
goog
google mount
google mountain vi
mountain view goo
If I change the multi_match.type to phrase_prefix, the whole query text is matched as a single phrase against each field separately, so mountain vi matches (within address) but google mountain vi does not, because no single field contains that whole phrase.
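For reference, the failing variant is the same body with only the type changed:
{
  "query": {
    "multi_match": {
      "type": "phrase_prefix",
      "query": "google mountain vi",
      "operator": "and",
      "fields": [
        "name",
        "address"
      ]
    }
  }
}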
How do I solve this?
As there are no answers yet and someone might see this: I had the same problem, and here is a solution using the edge n-gram (edgeNGram) token filter.
You need to change both the index settings and the mappings.
Here's an example for the settings:
"settings" : {
"index" : {
"analysis" : {
"analyzer" : {
"ngram_analyzer" : {
"type" : "custom",
"stopwords" : "_none_",
"filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
"tokenizer" : "standard"
},
"default" : {
"type" : "custom",
"stopwords" : "_none_",
"filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop" ],
"tokenizer" : "standard"
}
},
"filter" : {
"no_stop" : {
"type" : "stop",
"stopwords" : "_none_"
},
"ngram_filter" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "20"
}
}
}
}
}
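To see what the ngram_filter produces, you can run the analyzer by hand. Assuming these settings are applied to the companies index and the 1.x-era API (which the filtered query in the question suggests), the call and a trimmed response look like this; the real response also carries offsets and positions:
GET /companies/_analyze?analyzer=ngram_analyzer&text=Google

{
  "tokens": [
    { "token": "go" },
    { "token": "goo" },
    { "token": "goog" },
    { "token": "googl" },
    { "token": "google" }
  ]
}
All of those tokens are indexed, which is exactly why a later search for goog can find the document.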
Of course, you should adapt the analyzers to your own use case. You might want to leave the default analyzer untouched, or add the ngram filter to it so you don't have to change the mappings; the latter means every analyzed field in your index gets the ngram filter.
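That second option is just a matter of appending ngram_filter to the default analyzer's filter chain from the settings above. A sketch (remember this ngrams every analyzed string field in the index):
"default" : {
  "type" : "custom",
  "stopwords" : "_none_",
  "filter" : [ "standard", "lowercase", "asciifolding", "word_delimiter", "no_stop", "ngram_filter" ],
  "tokenizer" : "standard"
}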
And for the mapping (the patient type name comes from my own index; substitute your own type):
"mappings" : {
"patient" : {
"properties" : {
"name" : {
"type" : "string",
"analyzer" : "ngram_analyzer"
},
"address" : {
"type" : "string",
"analyzer" : "ngram_analyzer"
}
}
}
}
Declare every field you want to autocomplete with the ngram_analyzer. Then the queries in your question should work. If you used something else, I'd be happy to hear about it.
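To make that concrete: with the mapping above in place, the unchanged cross_fields query from the question should now match on partial terms, for example:
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "type": "cross_fields",
          "query": "google mountain vi",
          "operator": "and",
          "fields": [
            "name",
            "address"
          ]
        }
      }
    }
  }
}
One refinement worth considering: with a single analyzer per field, the query text is ngrammed at search time as well, which loosens matching; setting a separate search_analyzer (without ngram_filter) on those fields keeps the ngrams on the index side only.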