We are running Elasticsearch and are having some issues when searching for terms that contain a space. A concrete example: there is a person named JM Bruno, but searching for that name returns no results. I vaguely remember that searching for this exact term did return the result at some point, but I can't reproduce that right now.
I tried adding a space, as well as "\ ", to my tokenizer pattern, without much luck. These are the ES settings (we use the Tire gem in a Ruby on Rails application):
module Search
  def self.included base
    base.send :include, Tire::Model::Search
    base.send :include, Tire::Model::Callbacks
    base.class_eval do
      settings analysis: {
        filter: {
          ngram: {
            type: 'nGram',
            max_gram: 12,
            min_gram: 3
          },
          url_stop: {
            type: "stop",
            stopwords: %w[http https]
          }
        },
        tokenizer: {
          url_email_tokenizer: {
            pattern: '[^\w\-\.@]+',
            type: 'pattern'
          }
        },
        analyzer: {
          url_analyzer: {
            tokenizer: "url_email_tokenizer",
            filter: %w[url_stop ngram],
            type: "custom"
          },
          name_analyzer: {
            tokenizer: 'url_email_tokenizer',
            filter: 'ngram',
            type: 'custom'
          }
        }
      }
    end
  end
end
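The url_email_tokenizer pattern can be tried outside Elasticsearch with Ruby's String#split. This is a sketch that assumes Ruby's regex engine treats `[^\w\-\.@]+` the same way Java's does, which holds for plain ASCII text; the email and URL inputs are made-up examples. Because the character class is negated and a space is not in it, a space is already a split point in the current pattern.

```ruby
# Approximation of url_email_tokenizer (type: 'pattern'), not Elasticsearch itself.
# Runs of characters that are NOT word chars, '-', '.', or '@' act as separators.
URL_EMAIL_PATTERN = /[^\w\-\.@]+/

def url_email_tokenize(text)
  # Like the ES pattern tokenizer, drop the empty strings that leading separators produce.
  text.split(URL_EMAIL_PATTERN).reject(&:empty?)
end

p url_email_tokenize("JM Bruno")                       # => ["JM", "Bruno"]
p url_email_tokenize("jm.bruno@example.com")           # => ["jm.bruno@example.com"]
p url_email_tokenize("https://www.example.com/about")  # => ["https", "www.example.com", "about"]
```

An email address survives as one token, while a URL is split at ":" and "/", leaving "https" as a separate token that url_stop removes in url_analyzer.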
We use these tokenizers to search for domain names and email addresses as well.
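Try running the _analyze API with the analyzer you applied to your field. Because name_analyzer is a custom analyzer defined in the index settings, call _analyze on that index (your_index is a placeholder for your index name):
curl -XGET 'localhost:9200/your_index/_analyze?analyzer=name_analyzer' -d 'JM Bruno'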
You will see how Elasticsearch breaks the field content into tokens, and why you cannot find the document with a TermQuery: a TermQuery is not analyzed, so it compares your query text exactly as-is against the terms in the inverted index.
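The effect of the nGram filter on top of the tokenizer can be modeled in a few lines of Ruby. This is an illustrative re-implementation, not Elasticsearch's code: it splits with the same pattern and then emits every substring of 3 to 12 characters (min_gram and max_gram from the settings). Grams are compared as a set because the emission order differs between Elasticsearch versions. It shows that the literal string JM Bruno is never an indexed term, and that the two-character JM is shorter than min_gram and disappears entirely.

```ruby
require "set"

# Illustrative model of name_analyzer (not Elasticsearch's implementation):
# split with the url_email_tokenizer pattern, then apply an nGram filter.
PATTERN  = /[^\w\-\.@]+/
MIN_GRAM = 3   # min_gram from the settings
MAX_GRAM = 12  # max_gram from the settings

# Every substring of token whose length is between min_gram and max_gram.
# Tokens shorter than min_gram produce no grams at all.
def ngrams(token, min_gram, max_gram)
  (min_gram..[max_gram, token.length].min).flat_map do |n|
    (0..token.length - n).map { |start| token[start, n] }
  end
end

def name_analyze(text)
  tokens = text.split(PATTERN).reject(&:empty?)
  tokens.flat_map { |t| ngrams(t, MIN_GRAM, MAX_GRAM) }.to_set
end

indexed_terms = name_analyze("JM Bruno")
p indexed_terms.sort                  # => ["Bru", "Brun", "Bruno", "run", "runo", "uno"]

# A term query is not analyzed: it looks up the literal query string.
p indexed_terms.include?("JM Bruno")  # => false
p indexed_terms.include?("JM")        # => false ("JM" is shorter than min_gram)
```

name_analyzer has no lowercase filter, so the grams keep their original case; in this model a term query for "bru" would not match either.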
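Instead of a term query, use a match query, which analyzes the query text before searching. In Java, with the high-level REST client (BoolQueryBuilder and QueryBuilders are in org.elasticsearch.index.query):
BoolQueryBuilder query = QueryBuilders.boolQuery();
query.must(QueryBuilders.matchQuery("name", "JM Bruno").minimumShouldMatch("100%"));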
Or, directly in Elasticsearch:
GET /_search
{
  "query": {
    "match": {
      "name": {
        "query": "JM Bruno",
        "cutoff_frequency": 0.001
      }
    }
  }
}
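To contrast the two query types, here is a rough Ruby model. It re-implements the pattern split and the nGram filter (min_gram 3, max_gram 12, from the settings in the question) rather than calling Elasticsearch, and it assumes the name field uses name_analyzer for both indexing and search. minimum_should_match "100%" is simplified to "every analyzed query term must be present"; real Elasticsearch may group grams that share a position differently, and scoring and cutoff_frequency are ignored entirely.

```ruby
require "set"

# Rough local model of name_analyzer (NOT Elasticsearch itself):
# pattern split, then an nGram filter with min_gram 3 and max_gram 12.
# Assumption: "name" is mapped with name_analyzer for indexing and search.
PATTERN  = /[^\w\-\.@]+/
MIN_GRAM = 3
MAX_GRAM = 12

def name_analyze(text)
  tokens = text.split(PATTERN).reject(&:empty?)
  grams = tokens.flat_map do |token|
    (MIN_GRAM..[MAX_GRAM, token.length].min).flat_map do |n|
      (0..token.length - n).map { |start| token[start, n] }
    end
  end
  grams.to_set
end

# Term query: the raw, unanalyzed query string must itself be an indexed term.
def term_matches?(doc_terms, query_text)
  doc_terms.include?(query_text)
end

# Simplified match query with minimum_should_match "100%": analyze the query
# text, then require every resulting term to be among the document's terms.
# A query that analyzes to no terms matches nothing (the default zero_terms_query).
def match_all_terms?(doc_terms, query_text)
  query_terms = name_analyze(query_text)
  !query_terms.empty? && query_terms.subset?(doc_terms)
end

doc = name_analyze("JM Bruno")
p term_matches?(doc, "JM Bruno")                            # => false
p match_all_terms?(doc, "JM Bruno")                         # => true
p match_all_terms?(doc, "Bruno Smith")                      # => false
p match_all_terms?(name_analyze("Bruno Smith"), "JM Bruno") # => true
```

In this model, JM produces no grams, so the query "JM Bruno" only constrains the grams of Bruno, and a document named Bruno Smith matches as well.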