Elasticsearch aggregation turns results to lowercase

I've been playing with Elasticsearch a little and ran into an issue with aggregations.

I have two endpoints, /A and /B. A holds the parents of the objects in B: every object in B belongs to exactly one object in A, and an object in A can have one or many children. Each object in B therefore has a "parentId" attribute holding the parent's ID, which was generated by Elasticsearch.

I want to filter the parents in A by attributes of their children in B. To do that, I first filter the children in B by those attributes and collect their unique parent IDs, which I then use to fetch the parents.

I send this request:

POST http://localhost:9200/test/B/_search
{
    "query": {
        "query_string": {
            "default_field": "name",
            "query": "derp2*"
        }
    },
    "aggregations": {
        "ids": {
            "terms": {
                "field": "parentId"
            }
        }
    }
}

And get this response:

{
  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjH5u40Hx1Kh6rfQG",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child2"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjD_U40Hx1Kh6rfQF",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child1"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjKqf40Hx1Kh6rfQH",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child3"
        }
      }
    ]
  },
  "aggregations": {
    "ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "au_ffvwm40hx1kh6rfqa",
          "doc_count": 3
        }
      ]
    }
  }
}

For some reason, the bucket key comes back in lowercase, so I cannot use it to request the parent from Elasticsearch:

GET http://localhost:9200/test/A/au_ffvwm40hx1kh6rfqa

Response:
{
  "_index": "test",
  "_type": "A",
  "_id": "au_ffvwm40hx1kh6rfqa",
  "found": false
}

Any idea why this is happening?

372

asked Sep 18 '15 10:09

RecuencoJones

2 Answers

The difference between the hits and the aggregation results is that aggregations work on the indexed terms and return those terms, whereas the hits return the original source.

How are these terms created? By the analyzer configured for the field, which in your case is the default one, the standard analyzer. Among other things, this analyzer lowercases every term. As Andrei mentioned, you should map the parentId field as not_analyzed:

PUT test
{
  "mappings": {
    "B": {
      "properties": {
        "parentId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
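
To make the difference between terms and source concrete, here is a minimal Python sketch that runs without a cluster. It only imitates the behaviour described above: `standard_like_terms` is a rough, hypothetical stand-in for the standard analyzer (split on non-word characters, then lowercase), not the real tokenizer, and `terms_buckets` merely counts documents per term the way a terms aggregation reports them.

```python
import re
from collections import Counter


def standard_like_terms(value):
    # Rough stand-in for the standard analyzer (an assumption, not the real
    # tokenizer): split on non-word characters, then lowercase each token.
    return [token.lower() for token in re.findall(r"\w+", value)]


def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as a single, unchanged term.
    return [value]


def terms_buckets(values, to_terms):
    # Count documents per indexed term, similar to what a terms aggregation returns.
    counts = Counter()
    for value in values:
        for term in set(to_terms(value)):
            counts[term] += 1
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


# The three children from the question all share the same parentId.
parent_ids = ["AU_ffvwM40Hx1Kh6rfQA"] * 3

analyzed = terms_buckets(parent_ids, standard_like_terms)
untouched = terms_buckets(parent_ids, not_analyzed_terms)

print(analyzed)   # bucket key is lowercased
print(untouched)  # bucket key keeps its original case
```

With the analyzed variant the bucket key no longer matches the stored document ID, which is why the follow-up GET in the question reports "found": false.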

153

answered Sep 28 '22 07:09

Jettro Coenradie

I am late to the party, but I had the same issue and found that it is caused by normalization.

You have to change the mapping of the index if you want to prevent normalization from lowercasing the aggregated values.

You can check the current mapping in the DevTools console by typing

GET /A/_mapping
GET /B/_mapping

In the returned structure of the index, look at how the parentId field is configured.
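
As a rough illustration of what to look for in that output, the Python sketch below inspects a "properties" section and reports which field name could be used in a terms aggregation without case changes. The helper `aggregatable_field` and the sample `properties` dict are hypothetical; the dict only mimics the shape of a mapping response and is not taken from a real cluster.

```python
def is_plain_keyword(definition):
    # A keyword field without a normalizer stores values unchanged.
    return definition.get("type") == "keyword" and "normalizer" not in definition


def aggregatable_field(properties, field):
    """Return a field name suitable for a case-preserving terms aggregation, or None."""
    definition = properties.get(field, {})
    if is_plain_keyword(definition):
        return field
    # Otherwise look for a keyword sub-field (multi-field) under "fields".
    for sub_name, sub_definition in definition.get("fields", {}).items():
        if is_plain_keyword(sub_definition):
            return f"{field}.{sub_name}"
    return None


# Hypothetical "properties" section, shaped like part of a mapping response.
properties = {
    "name": {"type": "text"},
    "parentId": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
}

print(aggregatable_field(properties, "parentId"))
print(aggregatable_field(properties, "name"))
```

If the helper returns None for parentId, the field has no case-preserving variant yet and the mapping needs to change.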

If you want to keep the field's current behaviour but avoid normalization in the aggregation, you can add a sub-field to the parentId field.

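Changing the type of an existing field requires deleting the index and recreating it with the new mapping, whereas a multi-field can be added to an existing field through the mapping API (documents indexed before the change will not have the sub-field populated until they are reindexed). See the Elasticsearch documentation on:

creating the index
Adding multi-fields to an existing field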

In your case it looks like this (it contains only the parentId field):

PUT /B/_mapping
{
  "properties": {
    "parentId": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

Then you have to use the sub-field in the aggregation:

POST http://localhost:9200/test/B/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "derp2*"
    }
  },
  "aggregations": {
    "ids": {
      "terms": {
        "field": "parentId.keyword",
        "order": {"_key": "desc"}
      }
    }
  }
}
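
Once the bucket keys keep their original case, they can be fed into the second step the question describes, fetching the parents. The Python sketch below shows one way this might look, assuming the parents are fetched in one request with a body holding an "ids" list (the shape the multi-get `_mget` endpoint accepts when the index is given in the URL). The `response` dict is a trimmed, hypothetical version of the search response, and both helper functions are illustrative names.

```python
import json


def parent_ids_from_response(response, agg_name="ids"):
    # Collect the bucket keys of the named terms aggregation.
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


def mget_body(ids):
    # Request body listing the document IDs to fetch in one call.
    return {"ids": list(ids)}


# Trimmed, hypothetical search response with a case-preserved bucket key.
response = {
    "aggregations": {
        "ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "AU_ffvwM40Hx1Kh6rfQA", "doc_count": 3}],
        }
    }
}

ids = parent_ids_from_response(response)
print(json.dumps(mget_body(ids)))
```

Keep in mind that a terms aggregation returns only the top 10 buckets by default, so a query matching children of many parents would likely need a larger "size" in the aggregation.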

answered Sep 28 '22 07:09

Zoltán Süle

Tags: lowercase, elasticsearch, analyzer, elasticsearch-aggregation
