
analyzed vs not_analyzed or ...?

New to ES so maybe a dumb question, but I am trying to search using wildcards, e.g. "SOMECODE*" and "*SOMECODE".

It works fine, but the value in the document may be something like "SOMECODE/FRED".
The problem is that * will match anything (including nothing),
so *SOMECODE will get a hit on SOMECODE/FRED.

I tried searching for */SOMECODE, but this returns nothing.
I think the tokenization of the field is the root problem,
i.e. the / causes the value to be indexed as two words.
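
For reference, this is roughly the kind of query I'm running (the index, type, and field names here are made up for this post):

$ curl -XGET 'localhost:9200/myindex/mytype/_search?pretty' -d '{
  "query" : {
    "query_string" : {
      "default_field" : "code",
      "query" : "*SOMECODE"
    }
  }
}'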

I tried setting the mapping on the field to not_analyzed, but then I can't search on it at all.

Am I doing it wrong?

Thanks

asked Jan 30 '13 by Jonesie

1 Answer

By setting not_analyzed, you are only allowing exact matches (e.g. the whole string "SOMECODE/FRED", including case and special characters).
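
For example (the index, type, and field names below are just placeholders), with a not_analyzed mapping only a term query on the complete, unmodified value will find the document:

$ curl -XPUT 'localhost:9200/myindex' -d '{
  "mappings" : {
    "mytype" : {
      "properties" : {
        "code" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'

$ curl -XGET 'localhost:9200/myindex/mytype/_search?pretty' -d '{
  "query" : { "term" : { "code" : "SOMECODE/FRED" } }
}'

A term query for just "SOMECODE" (or a lowercased variant) won't match, because the entire string is indexed as a single, untouched term.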

My guess is that you are using the standard analyzer (it is the default if you don't specify one). If so, the standard analyzer treats the slash as a token separator and generates two tokens, [somecode] and [fred]:

$ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
{
    "tokens" : [ {
    "token" : "somecode",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "fred",
    "start_offset" : 9,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

If you don't want this behavior, you need to switch to a tokenizer that doesn't split on special characters. However, I would question the use case for this; generally, you'll want to split on those types of characters.
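
If you do want to keep the slash, one option (just a sketch, with made-up index and analyzer names) is a custom analyzer built on the whitespace tokenizer, which splits only on whitespace, plus a lowercase filter:

$ curl -XPUT 'localhost:9200/myindex' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "code_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : [ "lowercase" ]
        }
      }
    }
  },
  "mappings" : {
    "mytype" : {
      "properties" : {
        "code" : { "type" : "string", "analyzer" : "code_analyzer" }
      }
    }
  }
}'

With that mapping, "SOMECODE/FRED" is indexed as the single token [somecode/fred], so a wildcard such as */fred (note the lowercase, since wildcard terms are generally not run through the analyzer) will match it.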

answered Oct 18 '22 by Zach