Elasticsearch: how to make an aggregation field not change the case of values

Question

I have the following mapping for an aggregation field:

"language" : {
    "type" : "string",
    "index": "analyzed",
    "analyzer" : "standard"
}

A sample value of this property looks like "en zh_CN".

This property is used only for aggregation. I notice that when I run a terms aggregation on it:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                ...
            }
        }
    },
    "aggregations": {
        "facets": {
            "terms": {
                "field": "language"
            }
        }
    }
}

The bucket keys come back in lower case:

  "aggregations" : {
    "facets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "zh_cn",
        "doc_count" : 2
      }, {
        "key" : "en",
        "doc_count" : 1
      } ]
    }
  }
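One way to see where these lowercase keys come from is to simulate a terms aggregation in plain Python. This is a sketch under assumptions, not Elasticsearch's implementation: the standard analyzer is approximated as whitespace splitting plus lowercasing (the real standard tokenizer uses Unicode word-boundary rules), and the two documents are hypothetical, chosen only to reproduce the bucket counts above.

```python
from collections import Counter


def approx_standard_analyzer(text):
    # Approximation only: the real standard tokenizer uses Unicode word
    # boundaries. Here we split on whitespace, then lowercase each token
    # the way the standard analyzer's lowercase filter does.
    return [token.lower() for token in text.split()]


def terms_agg(docs, analyzer):
    # A bucket's doc_count is the number of documents containing the term,
    # so each document's terms are de-duplicated before counting.
    counts = Counter()
    for doc in docs:
        counts.update(set(analyzer(doc)))
    # Order buckets by doc_count, highest first (the terms agg default).
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


# Hypothetical documents that reproduce the bucket counts in the question.
docs = ["en zh_CN", "zh_CN"]
print(terms_agg(docs, approx_standard_analyzer))
# [{'key': 'zh_cn', 'doc_count': 2}, {'key': 'en', 'doc_count': 1}]
```

Because the lowercasing happens at index time, the aggregation can only return terms that are already lowercased in the index; changing the query alone will not bring the original case back.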

How can I get these aggregation results without ES lowercasing the values? I suspect I need to change the mapping for this property, but I am not sure how.

Thanks and regards.

Sloan Ahrens · Accepted Answer

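The standard analyzer lowercases every token it produces, so the values are already lowercased when they are indexed, before the aggregation ever sees them. Try this in your mapping instead: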

"language" : {
    "type" : "string",
    "index": "not_analyzed"
}

With not_analyzed, the whole value of the field in each document is indexed unmodified as a single token, and that token is what your terms aggregation returns. For the example value you provided, the aggregation returns it verbatim:

"aggregations": {
   "facets": {
      "buckets": [
         {
            "key": "en zh_CN",
            "doc_count": 1
         }
      ]
   }
}

If you still want the text split into separate tokens, you can try the whitespace analyzer in your mapping instead. It splits only on whitespace and, unlike the standard analyzer, does not lowercase the tokens:

"language": {
   "type": "string",
   "analyzer": "whitespace"
}

Then your aggregation will return:

"aggregations": {
   "facets": {
      "buckets": [
         {
            "key": "en",
            "doc_count": 1
         },
         {
            "key": "zh_CN",
            "doc_count": 1
         }
      ]
   }
}
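The difference between the two mappings comes down to which terms each one indexes for a value. The hypothetical Python functions below only illustrate that for the sample value; they are stand-ins, not Elasticsearch code.

```python
def not_analyzed(value):
    # "index": "not_analyzed": the whole value is indexed as one term,
    # exactly as stored.
    return [value]


def whitespace_analyzer(value):
    # Stand-in for the whitespace analyzer: split on whitespace only.
    # There is no lowercase step, so case is preserved.
    return value.split()


sample = "en zh_CN"
for name, analyze in [("not_analyzed", not_analyzed),
                      ("whitespace", whitespace_analyzer)]:
    print(name, analyze(sample))
# not_analyzed ['en zh_CN']
# whitespace ['en', 'zh_CN']
```

With a single document, each term in these lists becomes one bucket with a doc_count of 1, matching the two aggregation results shown above.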

Here is the code I used to test both examples:

http://sense.qbox.io/gist/a7b3c7d50c7012537c50d576d03940b28b5f8793

Tags:

elasticsearch

curious1
