Given an index with documents that have a brand property, we need to create a term aggregation that is case insensitive.
Index definition
Please note that the use of fielddata
PUT demo_products
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"product": {
"properties": {
"brand": {
"type": "text",
"analyzer": "my_custom_analyzer",
"fielddata": true,
}
}
}
}
}
Data
POST demo_products/product
{
"brand": "New York Jets"
}
POST demo_products/product
{
"brand": "new york jets"
}
POST demo_products/product
{
"brand": "Washington Redskins"
}
Query
GET demo_products/product/_search
{
"size": 0,
"aggs": {
"brand_facet": {
"terms": {
"field": "brand"
}
}
}
}
Result
"aggregations": {
"brand_facet": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "new york jets",
"doc_count": 2
},
{
"key": "washington redskins",
"doc_count": 1
}
]
}
}
If we use keyword
instead of text
we end up the 2 buckets for New York Jets because of the differences in casing.
We're concerned about the performance implications by using fielddata. However if fielddata is disabled we get the dreaded "Fielddata is disabled on text fields by default."
Any other tips to resolve this - or should we not be so concerned about fielddate?
Starting with ES 5.2 (out today), you can use normalizers with keyword
fields in order to (e.g.) lowercase the value.
The role of normalizers is a bit like analyzers for text
fields, though what you can do with them is more restrained, but that would probably help with the issue you're facing.
You'd create the index like this:
PUT demo_products
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": [ "lowercase" ]
}
}
}
},
"mappings": {
"product": {
"properties": {
"brand": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
}
And your query would return this:
"aggregations" : {
"brand_facet" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "new york jets",
"doc_count" : 2
},
{
"key" : "washington redskins",
"doc_count" : 1
}
]
}
}
Best of both worlds!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With