
Elasticsearch aggregation turns results to lowercase

I've been playing with Elasticsearch a little and ran into an issue when doing aggregations.

I have two endpoints, /A and /B. A holds the parents of the objects in B: one or more objects in B belong to a single object in A. Each object in B therefore has a "parentId" attribute containing the parent's id, which was generated by Elasticsearch.

I want to filter the parents in A by attributes of their children in B. To do that, I first filter the children in B by attribute and collect their unique parent ids, which I then use to fetch the parents.

I send this request:

POST http://localhost:9200/test/B/_search
{
    "query": {
        "query_string": {
            "default_field": "name",
            "query": "derp2*"
        }
    },
    "aggregations": {
        "ids": {
            "terms": {
                "field": "parentId"
            }
        }
    }
}

And get this response:

{
  "took": 91,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjH5u40Hx1Kh6rfQG",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child2"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjD_U40Hx1Kh6rfQF",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child1"
        }
      },
      {
        "_index": "test",
        "_type": "child",
        "_id": "AU_fjKqf40Hx1Kh6rfQH",
        "_score": 1,
        "_source": {
          "parentId": "AU_ffvwM40Hx1Kh6rfQA",
          "name": "derp2child3"
        }
      }
    ]
  },
  "aggregations": {
    "ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "au_ffvwm40hx1kh6rfqa",
          "doc_count": 3
        }
      ]
    }
  }
}

For some reason, the aggregated key is returned in lowercase, so I can't use it to request the parent from Elasticsearch:

GET http://localhost:9200/test/A/au_ffvwm40hx1kh6rfqa

Response:
{
  "_index": "test",
  "_type": "A",
  "_id": "au_ffvwm40hx1kh6rfqa",
  "found": false
}

Any ideas why this is happening?

RecuencoJones asked Sep 18 '15 10:09


2 Answers

The difference between the hits and the aggregation results is that aggregations work on the indexed terms and return those terms, whereas the hits return the original source.

How are these terms created? By the analyzer configured for the field, which in your case is the default one, the standard analyzer. Among other things, this analyzer lowercases every term. As Andrei mentioned, you should configure the parentId field to be not_analyzed:

PUT test
{
  "mappings": {
    "B": {
      "properties": {
        "parentId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }   
}
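
A rough way to picture the difference, as a sketch rather than Elasticsearch's actual implementation: the Python below imitates an analyzed field by splitting the value into word tokens and lowercasing them, and a not_analyzed field by keeping the value whole. The function names and the regex tokenizer are assumptions made for illustration; the real standard analyzer follows Unicode text segmentation rules.

```python
import re
from collections import Counter

def analyzed_terms(value):
    # Crude stand-in for the standard analyzer: split into word tokens, lowercase each.
    return [token.lower() for token in re.findall(r"\w+", value)]

def not_analyzed_terms(value):
    # A not_analyzed field indexes the whole value as one unchanged term.
    return [value]

def terms_buckets(values, to_terms):
    # A terms aggregation counts documents per indexed term, not per source value.
    counts = Counter()
    for value in values:
        for term in set(to_terms(value)):
            counts[term] += 1
    return dict(counts)

parent_ids = ["AU_ffvwM40Hx1Kh6rfQA"] * 3  # the three children from the question

print(terms_buckets(parent_ids, analyzed_terms))
print(terms_buckets(parent_ids, not_analyzed_terms))
```

With the analyzed variant the only bucket key is the lowercased id, which matches what the question observed; with the unanalyzed variant the key keeps its original case and can be used to fetch the parent.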
Jettro Coenradie answered Sep 28 '22 07:09


I am late to the party, but I had the same issue and found that it is caused by normalization.

To stop normalization from lowercasing the aggregated values, you have to change the mapping of the index.

You can check the current mapping in the Dev Tools console by running:

GET /A/_mapping
GET /B/_mapping

In the output, look at how the parentId field is mapped.
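
As a small sketch of what to look for, the hypothetical helper below takes the "properties" section of a mapping and reports which field name keeps the exact value and is therefore suitable for a terms aggregation. The helper name and the sample mappings are assumptions for illustration; the exact nesting of a real _mapping response depends on the Elasticsearch version.

```python
def aggregation_field(properties, field):
    """Return the field name to use in a terms aggregation, or None."""
    mapping = properties.get(field, {})
    # keyword (or old-style not_analyzed string) fields keep the exact value.
    if mapping.get("type") == "keyword" or mapping.get("index") == "not_analyzed":
        return field
    # Otherwise look for a keyword sub-field declared under "fields".
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None

# Hypothetical "properties" sections, as they might appear in a _mapping response.
analyzed_only = {"parentId": {"type": "text"}}
with_subfield = {
    "parentId": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
}

print(aggregation_field(analyzed_only, "parentId"))
print(aggregation_field(with_subfield, "parentId"))
```

A result of None means the field is only available in its analyzed form, so aggregating on it would return normalized terms.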

If you don't want to change the behaviour of the field itself but still want to avoid normalization during aggregation, you can add a sub-field to the parentId field.

Changing the mapping of an existing field requires deleting the index and recreating it with the new mapping. The relevant sections of the documentation are:

  • Creating the index
  • Adding multi-fields to an existing field

In your case it looks like this (only the parentId field is shown):

PUT /B/_mapping
{
  "properties": {
    "parentId": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

Then you have to use the sub-field in the search request:

POST http://localhost:9200/test/B/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "derp2*"
    }
  },
  "aggregations": {
    "ids": {
      "terms": {
        "field": "parentId.keyword",
        "order": {"_key": "desc"}
      }
    }
  }
}
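
To close the loop with the original goal of fetching the parents, here is a minimal Python sketch that pulls the bucket keys out of a response shaped like the one in the question and builds an ids query for the parents. The function names and the trimmed sample response are assumptions for illustration; sending the requests is left out.

```python
def parent_ids_from_response(response, agg_name="ids"):
    # Each bucket key is one distinct parentId among the matching children.
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]

def parents_query(parent_ids):
    # An ids query matches documents by their _id.
    return {"query": {"ids": {"values": parent_ids}}}

# Trimmed response, assuming the aggregation ran on the unanalyzed sub-field.
sample_response = {
    "aggregations": {
        "ids": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [{"key": "AU_ffvwM40Hx1Kh6rfQA", "doc_count": 3}],
        }
    }
}

ids = parent_ids_from_response(sample_response)
print(parents_query(ids))
```

Keep in mind that a terms aggregation only returns a limited number of buckets by default, so with many distinct parents you may need to set its size parameter.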
Zoltán Süle answered Sep 28 '22 07:09