New to ES, so maybe a dumb question, but I am trying to search using a wildcard, e.g. "SOMECODE*" and "*SOMECODE". It works fine, but the value in the document may be "SOMECODE/FRED".
The problem is that * matches anything (including nothing), so *SOMECODE gets a hit on SOMECODE/FRED. I tried searching for */SOMECODE, but this returns nothing.
I think the tokenization of the field is the root problem, i.e. the / splits the value into two words.
I tried setting the mapping on the field to not_analyzed, but then I can't search on it at all.
Am I doing it wrong?
Thanks
By setting not_analyzed, you are only allowing exact matches (e.g. "SOMECODE/FRED" only, including case and special characters).
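That said, not_analyzed does not rule out wildcards entirely: the raw value is stored as a single term, and a wildcard query runs against that term. The catch is that wildcard terms are not analyzed, so case and special characters must match the stored value exactly. A sketch, assuming a hypothetical index my_index with a field code mapped as not_analyzed:

```
$ curl -XGET 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "wildcard": {
      "code": "*/FRED"
    }
  }
}'
```

Note the uppercase FRED: against a not_analyzed field, "*/fred" would not match, which may be why your searches appeared to return nothing.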
My guess is that you are using the standard analyzer (the default if you don't specify one). If that's the case, it will treat the slash as a token separator and generate two tokens, [somecode] and [fred]:
$ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
{
"tokens" : [ {
"token" : "somecode",
"start_offset" : 0,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "fred",
"start_offset" : 9,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 2
} ]
}
If you don't want this behavior, you need to switch to a tokenizer that doesn't split on special characters. However, I would question the use case for this; generally, you'll want to split on those types of characters.
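If you do want the whole value kept together while still searching case-insensitively, one option is a custom analyzer built from the keyword tokenizer (which emits the entire value as a single token) plus a lowercase filter. A sketch, with hypothetical index, type, and field names:

```
$ curl -XPUT 'localhost:9200/my_index' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "code_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": { "type": "string", "analyzer": "code_analyzer" }
      }
    }
  }
}'
```

With that mapping, SOMECODE/FRED is indexed as the single token [somecode/fred], so a lowercase wildcard query such as */fred will match it.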