
Simple term query not working with elastic while match works

I have a JSON document like the one below in Elasticsearch.

{
    "_source" : {
      "version" : 1,
      "object_id" : "f1dcae27-7a6f-4fea-b540-901c09b60a15",
      "object_name" : "testFileName_for_TestSweepAndPrune",
      "object_type" : "",
      "object_status" : "OBJ_DELETED",
      "u_attributes" : ""
    }

}

A term query like this returns no results.

{
    "query": {
        "term": {
            "object_status": "OBJ_DELETED"
        }
    },
    "size": 10000
}

While a match query works fine with the same conditions.

{
    "query": {
        "match": {
            "object_status": "OBJ_DELETED"
        }
    },
    "size": 10000
}

Wondering what could be happening here? How can I make the term query work here for this condition?

joe asked Sep 19 '18 18:09


2 Answers

To understand why the term query does not behave as you expect, we need to look at how Elasticsearch processes and stores data, and how match and term queries differ.

Normally, when you save text into Elasticsearch, it is analyzed first and then stored. Analysis is done by an analyzer. There are many analyzers, and if you don't specify one, the default is used. The analyzer processes the text, converts it into a list of tokens, and stores those tokens. The rules for splitting text into tokens differ from analyzer to analyzer.

Once text is processed and stored, you can query it. There are many ways to query, but in your case the key difference is that match is a full-text query and term is a term-level query. In a full-text query, your query string is analyzed the same way the field you are querying was analyzed. In a term-level query, the query string is not analyzed. This difference is important.
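The difference can be sketched with a toy inverted index in Python. This is only an illustration of the idea, not how Elasticsearch is implemented: the function names are made up for this example, and analyze() is a simple stand-in (lowercase, split on whitespace) rather than a real analyzer.

```python
from collections import defaultdict


def analyze(text):
    """Stand-in analyzer (an assumption for this sketch): lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each analyzed token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Term-level lookup: the value is used exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, value):
    """Full-text lookup: the value is analyzed first, then each token is looked up."""
    hits = set()
    for token in analyze(value):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "OBJ_DELETED"})

print(term_query(index, "OBJ_DELETED"))   # no hit: only the lowercased token is stored
print(match_query(index, "OBJ_DELETED"))  # hit: the query value is lowercased first
print(term_query(index, "obj_deleted"))   # hit: equals the stored token exactly
```

The last call shows that a term-level lookup does work when the value equals the stored token exactly.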

Now let's see how "OBJ_DELETED" is analyzed by Elasticsearch. For that we can add a simple document like this:

curl -X PUT 'localhost:9200/testdata/object/1' -H 'Content-Type: application/json' -d '{ "object_status": "OBJ_DELETED"  }'

Then check that everything is there:

curl -X POST 'localhost:9200/testdata/_search?pretty'

should produce something like this:

...
"hits" : {
  "total" : 1,
  "max_score" : 1.0,
  "hits" : [
    {
      "_index" : "testdata",
      "_type" : "object",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "object_status" : "OBJ_DELETED"
      }
    }
  ]
}

Now we can check how "OBJ_DELETED" is analyzed:

curl -X POST 'localhost:9200/testdata/_analyze?pretty' -H 'Content-Type: application/json' -d '{ "text": "OBJ_DELETED"  }'

and it outputs:

{
  "tokens" : [
    {
      "token" : "obj_deleted",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

As you can see, the analyzer only converted the text to lowercase and stored it as a single token. This is what the default analyzer does. Now back to your queries. The match query works because the query value "OBJ_DELETED" is also lowercased under the hood, so Elasticsearch can find it. For the term query, the query string is not analyzed, so you are actually comparing OBJ_DELETED with obj_deleted, and you get no results.

One last question: why does object_status.keyword work with a term query?

By default, Elasticsearch creates an additional mapping for each text field. It is a kind of metadata that you can use, and it lets you process the same value in different ways. So by default each text field has an additional sub-field named keyword, of type keyword. keyword fields are not analyzed (they can only be normalized if needed). This means that with the default mapping, the sub-field stores the exact value that you pass to Elasticsearch (OBJ_DELETED in your case), so a term query for OBJ_DELETED against object_status.keyword finds it.
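As a sketch, the request body for that term query and the field definition that default dynamic mapping typically produces for a string can be written out as Python dictionaries. This assumes object_status was mapped dynamically; the variable names are made up for this example, and nothing is sent to a cluster here. The printed JSON can be used as the -d payload of a curl search request like the ones above.

```python
import json

# Term query against the keyword sub-field, which holds the unanalyzed value.
# Assumption: object_status was dynamically mapped, so the sub-field exists.
term_on_keyword = {
    "query": {
        "term": {
            "object_status.keyword": "OBJ_DELETED"
        }
    },
    "size": 10000,
}

# What default dynamic mapping typically generates for a string field:
# an analyzed text field plus a keyword sub-field holding the exact value.
default_string_mapping = {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256,
        }
    },
}

print(json.dumps(term_on_keyword, indent=2))
print(json.dumps(default_string_mapping, indent=2))
```

You can confirm what your own index uses by fetching its mapping and checking whether object_status has a keyword entry under fields.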

rkm answered Oct 20 '22 04:10


You should avoid using the term query for text fields (see term query notes in guidelines). By default, Elasticsearch changes the values of text fields during analysis. For example, the default standard analyzer changes text field values as follows:

  • Removes most punctuation
  • Divides the remaining content into individual words, called tokens
  • Lowercases the tokens

You can use the keyword analyzer to index the field's value as a single, unmodified term that a term query can match. Elasticsearch offers a variety of ways to specify analyzers.
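The three steps listed above, and the contrast with the keyword analyzer, can be approximated in a few lines of Python. Both functions are rough stand-ins written for this example, not Elasticsearch code: the real standard analyzer follows Unicode text segmentation rules, so its output can differ from this regular expression on other inputs.

```python
import re


def standard_like(text):
    """Rough approximation of the standard analyzer:
    drop punctuation, split into words, lowercase the tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text):
    """Approximation of the keyword analyzer: the whole input is one unmodified token."""
    return [text]


for sample in ("OBJ_DELETED", "Quick-Brown fox!"):
    print(sample, standard_like(sample), keyword_like(sample))
```

Note that the underscore does not split OBJ_DELETED in this sketch, which agrees with the single obj_deleted token in the _analyze output shown in the first answer.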

lu_ko answered Oct 20 '22 05:10