I have a document like the one below in Elasticsearch:
{
  "_source" : {
    "version" : 1,
    "object_id" : "f1dcae27-7a6f-4fea-b540-901c09b60a15",
    "object_name" : "testFileName_for_TestSweepAndPrune",
    "object_type" : "",
    "object_status" : "OBJ_DELETED",
    "u_attributes" : ""
  }
}
A term query like this one returns no hits:
{
  "query": {
    "term": {
      "object_status": "OBJ_DELETED"
    }
  },
  "size": 10000
}
while a match query with the same condition works fine:
{
  "query": {
    "match": {
      "object_status": "OBJ_DELETED"
    }
  },
  "size": 10000
}
What is happening here, and how can I make the term query work for this condition?
To understand why the term query doesn't work as you expect, we need to look at how Elasticsearch processes and stores data, and at how match and term queries differ.
Normally, when you save text into a text field in Elasticsearch, it is analyzed first and then stored. Analysis is done by an analyzer. There are many analyzers, and if you don't specify one, the default (standard) analyzer is used. The analyzer processes the text, converts it into a list of tokens and stores those tokens. The rules for splitting text into tokens differ from one analyzer to another.
Once text is processed and stored, you can query it. There are many ways to query data, but in your case the main difference between match and term is that match is a full-text query and term is a term-level query. In a full-text query, your query string is analyzed the same way as the field you are querying was analyzed. In a term-level query, the query string is not analyzed at all. This is the key point.
Now let's see how Elasticsearch analyzes "OBJ_DELETED". First, index a simple document like this:
curl -X PUT 'localhost:9200/testdata/object/1' -H 'Content-Type: application/json' -d '{ "object_status": "OBJ_DELETED" }'
Then check that everything is there:
curl -X POST 'localhost:9200/testdata/_search?pretty'
The response should look something like this:
...
"hits" : {
  "total" : 1,
  "max_score" : 1.0,
  "hits" : [
    {
      "_index" : "testdata",
      "_type" : "object",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "object_status" : "OBJ_DELETED"
      }
    }
  ]
}
Now we can check how "OBJ_DELETED" is analyzed:
curl -X POST 'localhost:9200/testdata/_analyze?pretty' -H 'Content-Type: application/json' -d '{ "text": "OBJ_DELETED" }'
and it outputs:
{
  "tokens" : [
    {
      "token" : "obj_deleted",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
As you can see, the analyzer only converted the text to lowercase and kept it as a single token. This is what the default analyzer does with this value. Now back to your queries. The match query works because the query value "OBJ_DELETED" is also lowercased under the hood, so Elasticsearch finds the token obj_deleted. The term query, on the other hand, does not process the query string, so you are actually comparing OBJ_DELETED with obj_deleted and, naturally, get no results.
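To see what a match query does to its query string, you can ask the _analyze API to use the analyzer of the field itself by passing the field name. Assuming object_status was mapped dynamically with the default analyzer, sending this body to POST testdata/_analyze (the test index created above) should return the same single token obj_deleted shown earlier. A term query skips this step entirely.

```json
{
  "field": "object_status",
  "text": "OBJ_DELETED"
}
```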
One last question: why does a term query on object_status.keyword work?
By default, when Elasticsearch maps a string field dynamically, it creates an additional mapping for it, called a multi-field. Multi-fields let you index the same value in different ways. So by default each text field gets an additional sub-field named keyword, which has type keyword. keyword fields are not analyzed (they can only be normalized, if you configure a normalizer). This means that, with the default mapping, the sub-field stores the exact value you passed to Elasticsearch (OBJ_DELETED in your case).
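For reference, this is roughly what the dynamically created mapping for object_status looks like in Elasticsearch 5.x and later. You can check your own with GET testdata/_mapping; the surrounding structure of that response differs between versions, so only the field entry is shown here, as an assumption about a default setup:

```json
{
  "object_status": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```

Because of ignore_above, values longer than 256 characters are not indexed in the keyword sub-field, which does not matter for short status codes like this one.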
You should avoid using the term query on text fields (see the notes on the term query in the Elasticsearch documentation). By default, Elasticsearch changes the values of text fields during analysis. For example, the default standard analyzer lowercases them, which is why OBJ_DELETED ends up in the index as obj_deleted.
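So, to make the term query work, point it at the keyword sub-field, which holds the exact, unanalyzed value. A sketch of the corrected query body, assuming the default dynamic mapping (sent to the same _search endpoint as your original query):

```json
{
  "query": {
    "term": {
      "object_status.keyword": "OBJ_DELETED"
    }
  },
  "size": 10000
}
```

The value now has to match exactly, including case, so searching this sub-field for obj_deleted would return nothing.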
Alternatively, you can use the keyword analyzer on the appropriate field in your index, so that the whole value is indexed as a single, unchanged term that a term query can find. Elasticsearch offers a variety of ways to specify analyzers.
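You can preview what the built-in keyword analyzer produces with the same _analyze API used above, this time naming the analyzer explicitly. Sent to POST testdata/_analyze, this body should return a single token, OBJ_DELETED, unchanged, which a term query for OBJ_DELETED would match if the field were indexed with this analyzer:

```json
{
  "analyzer": "keyword",
  "text": "OBJ_DELETED"
}
```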