I am working on an Elasticsearch (6.2) project where the index has many keyword fields, normalized with a lowercase filter to support case-insensitive searches. Search works well and returns the original values (not lowercased) of the normalized fields. However, aggregations return the lowercased values of those fields instead of the original ones.
The following example is taken from the Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/master/normalizer.html
Creating index:
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
Inserting two docs:
PUT index/_doc/1
{
  "foo": "Bar"
}

PUT index/_doc/2
{
  "foo": "Baz"
}
Search with aggregation:
GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
Result:
{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": {
      "total": 2,
      "max_score": 0.47000363,
      "hits": [
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "1",
          "_score": 0.47000363,
          "_source": {
            "foo": "Bar"
          }
        },
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "2",
          "_score": 0.47000363,
          "_source": {
            "foo": "Baz"
          }
        }
      ]
    }
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 1
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}
If you check the aggregation, you will see that the lowercased value has been returned, e.g. "key": "bar".
Is there any way to make the aggregation return the original value, e.g. "key": "Bar"?
If you want to do case-insensitive search yet return exact values in your aggregations, you don't need any normalizer. You can simply have a text field (which lowercases the tokens and allows case-insensitive search by default) with a keyword sub-field. You'd use the former for search and the latter for aggregations. It goes like this:
PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
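The search half of this setup can be sketched as follows. This is a minimal illustration rather than part of the original answer: it assumes the two documents from the question have already been indexed into this mapping, and that `foo` uses the default `standard` analyzer, which lowercases both the indexed tokens and the query text.

```
# Assumption: documents 1 ("Bar") and 2 ("Baz") are already indexed.
# The match query analyzes "BAR" into the token "bar", so it should match
# document 1 regardless of the casing used in the query.
GET index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}
```

The `_source` of the hit still carries the original value, since analysis only affects the indexed tokens and not the stored document.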
After indexing your two documents, you can issue a terms aggregation on foo.keyword:
GET index/_search
{
  "size": 2,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}
And the result would look like this:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "foo": "Baz"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "foo": "Bar"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Bar",
          "doc_count": 1
        },
        {
          "key": "Baz",
          "doc_count": 1
        }
      ]
    }
  }
}
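One caveat with the text approach is that the field is tokenized, so a match query on a multi-word value can also match documents that share only one of its words. If whole-value, case-insensitive matching is required, a possible variation is to keep the normalized keyword field from the question and add a plain keyword sub-field for aggregations. The sketch below is an assumption layered on top of the answer, not something it proposes: the sub-field name `raw` is hypothetical, and the index is assumed to be created from scratch.

```
# Variation, not from the answer above: keep the question's normalizer on
# "foo" for exact, case-insensitive matching, and add a keyword sub-field
# without a normalizer ("raw" is a hypothetical name) for aggregations.
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

# Aggregate on the un-normalized sub-field so bucket keys keep their casing.
GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.raw"
      }
    }
  }
}
```

With this layout, searches would target `foo` and aggregations would target `foo.raw`, mirroring the search/aggregation split described in the answer.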