
analyzed vs not_analyzed or ...?

New to ES so maybe a dumb question, but I am trying to search using wildcards, e.g. "SOMECODE*" and "*SOMECODE".

It works fine, but the value in the document may be something like "SOMECODE/FRED".
The problem is that * will match anything (including nothing),
so *SOMECODE will get a hit on SOMECODE/FRED.

I tried searching for */SOMECODE, but this returns nothing.
I think the tokenization of the field is the root problem,
i.e. the / causes the value to be indexed as two words.
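
For reference, this is roughly the kind of query I'm running (the index, type, and field names here are made up for this post):

$ curl -XGET 'localhost:9200/myindex/mytype/_search?pretty' -d '{
  "query" : {
    "query_string" : {
      "default_field" : "code",
      "query" : "*SOMECODE"
    }
  }
}'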

I tried setting the mapping on the field to not_analyzed, but then I can't search on it at all.

Am I doing it wrong?

Thanks

asked Jan 30 '13 by Jonesie

1 Answer

By setting not_analyzed, you are only allowing exact matches (e.g. the whole string "SOMECODE/FRED", including case and special characters).
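
For example (the index, type, and field names below are just placeholders), with a not_analyzed mapping only a term query on the complete, unmodified value will find the document:

$ curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings" : {
    "mytype" : {
      "properties" : {
        "code" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'

$ curl -XGET 'localhost:9200/myindex/mytype/_search?pretty' -d '{
  "query" : { "term" : { "code" : "SOMECODE/FRED" } }
}'

A term query for just "SOMECODE" (or a lowercased variant) won't match, because the entire string is indexed as a single, untouched term.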

My guess is that you are using the standard analyzer (it is the default if you don't specify one). If so, the standard analyzer treats the slash as a token separator and generates two tokens, [somecode] and [fred]:

$ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
{
    "tokens" : [ {
    "token" : "somecode",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "fred",
    "start_offset" : 9,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

If you don't want this behavior, you need to switch to a tokenizer that doesn't split on special characters. However, I would question the use case for this; generally, you'll want to split on those types of characters.
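
If you do want to keep the slash, one option (just a sketch, with made-up index and analyzer names) is a custom analyzer built on the whitespace tokenizer, which splits only on whitespace, plus a lowercase filter:

$ curl -XPUT 'localhost:9200/myindex' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "code_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "lowercase" ]
        }
      }
    }
  },
  "mappings" : {
    "mytype" : {
      "properties" : {
        "code" : { "type" : "string", "analyzer" : "code_analyzer" }
      }
    }
  }
}'

With that mapping, "SOMECODE/FRED" is indexed as the single token [somecode/fred], so a wildcard such as */fred (note the lowercase, since wildcard terms are generally not run through the analyzer) will match it.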

answered Oct 18 '22 by Zach