I have the following mapping for an aggregation field:
"language" : {
"type" : "string",
"index": "analyzed",
"analyzer" : "standard"
}
The value of a sample document in this property may look like: "en zh_CN"
This property has no other use except aggregation. I notice that when I get aggregation results on this property:
{
"query": {
"filtered" : {
"query": {
"match_all": {}
},
"filter" : {
...
}
}
},
"aggregations": {
"facets": {
"terms": {
"field": "language"
}
}
}
}
The bucket key values are in lower case.
"aggregations" : {
"facets" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "zh_cn",
"doc_count" : 2
}, {
"key" : "en",
"doc_count" : 1
} ]
}
}
How can I achieve my aggregation goal without letting ES lowercase the values? I suspect I need to change the mapping for this property, but I am not sure how.
Thanks and regards.
Try this in your mapping instead:
"language" : {
"type" : "string",
"index": "not_analyzed"
}
With "not_analyzed", the text in that field is indexed unmodified as a single token per document, and those tokens are what your terms aggregation returns. For the example value you provided, the aggregation will return it verbatim:
"aggregations": {
"facets": {
"buckets": [
{
"key": "en zh_CN",
"doc_count": 1
}
]
}
}
If you still want the text to be tokenized on whitespace, you can try using the whitespace analyzer in your mapping:
"language": {
"type": "string",
"analyzer": "whitespace"
}
Then your aggregation will return:
"aggregations": {
"facets": {
"buckets": [
{
"key": "en",
"doc_count": 1
},
{
"key": "zh_CN",
"doc_count": 1
}
]
}
}
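To make the difference between the three mappings concrete, here is a rough Python sketch of how each one would turn the sample value into terms, and how a terms aggregation would then count them. This is not Elasticsearch code: the function names (not_analyzed, whitespace, standard_approx, terms_buckets) are made up for illustration, and standard_approx only mimics the standard analyzer for simple values like the one in the question (split on whitespace, then lowercase); the real analyzer follows Unicode word-boundary rules.

```python
from collections import Counter

def not_analyzed(value):
    # "index": "not_analyzed" -> the whole value is kept as one term
    return [value]

def whitespace(value):
    # whitespace analyzer -> split on whitespace, case is preserved
    return value.split()

def standard_approx(value):
    # ASSUMPTION: crude stand-in for the standard analyzer, good enough
    # for values like "en zh_CN" (split on whitespace, then lowercase)
    return [token.lower() for token in value.split()]

def terms_buckets(docs, analyzer):
    # Imitates a terms aggregation: each distinct term is counted
    # once per document that contains it
    counts = Counter()
    for value in docs:
        counts.update(set(analyzer(value)))
    return dict(counts)

docs = ["en zh_CN"]
for analyzer in (standard_approx, not_analyzed, whitespace):
    print(analyzer.__name__, terms_buckets(docs, analyzer))
```

For the single sample document, the standard_approx keys come out lowercased, not_analyzed yields the one key "en zh_CN", and whitespace yields "en" and "zh_CN" with their case intact, which lines up with the aggregation results shown above.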
Here is the code I used to test both examples:
http://sense.qbox.io/gist/a7b3c7d50c7012537c50d576d03940b28b5f8793