
Aggregate and filter from one index to another through a third

On my Elasticsearch server I have three indices: Person, Archive and Document.

  • Each document has an archive field, which is the _id of the Archive it belongs to.

  • Each archive has an owner field, which is the _id of the Person who owns the archive.

With the indices above I can aggregate documents into buckets of archives and archives into buckets of owners.

How can I also include the documents in the person aggregations, so that when I filter on a specific person I get that person's archives together with their documents, instead of only the archives?

This is what I have so far to filter and aggregate the archives into buckets of owners:

  "post_filter": {
    "terms": {
      "owner": [
  "aggs": {
    "_filter_archive": {
      "filter": {
        "terms": {
          "owner": [
      "aggs": {
        "archive": {
          "terms": {
            "field": "archive"

Oskar Persson, asked Jan 12 '18 14:01

1 Answer

This is difficult to answer precisely, because some details are missing from the question. The short answer is: use nested documents or a parent-child relationship. Which one fits your case depends on many factors, so my suggestion is to try both and measure how well they perform. The third option is to denormalize your data completely. That is why I asked how frequent updates are, how large a Person document is, how large an Archive document is, and so on. If you cannot answer these questions, test nested and parent-child and choose one or the other. Good luck!
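
To make the denormalization option a bit more concrete, here is a minimal sketch, not a definitive implementation. It assumes each document gets a copy of its archive's owner at index time, and that both archive and owner are mapped as keyword fields. The helper names (denormalize_documents, owner_archive_query) and the sample IDs are hypothetical and only serve the illustration; the request body uses the standard terms query, terms aggregation and top_hits aggregation.

```python
from typing import Any, Dict, List


def denormalize_documents(
    documents: List[Dict[str, Any]], archives: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Copy each archive's owner onto the documents stored in that archive.

    Documents whose archive is unknown are skipped (an assumption made
    for this sketch; a real pipeline might want to report them instead).
    """
    owner_by_archive = {a["_id"]: a["owner"] for a in archives}
    return [
        {**doc, "owner": owner_by_archive[doc["archive"]]}
        for doc in documents
        if doc["archive"] in owner_by_archive
    ]


def owner_archive_query(owner_ids: List[str], docs_per_archive: int = 10) -> Dict[str, Any]:
    """Build a search body for the Document index.

    Filters on the denormalized owner field, then buckets the matching
    documents by owner and by archive, returning a few documents per archive.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed
        "query": {"bool": {"filter": [{"terms": {"owner": owner_ids}}]}},
        "aggs": {
            "owner": {
                "terms": {"field": "owner"},
                "aggs": {
                    "archive": {
                        "terms": {"field": "archive"},
                        "aggs": {
                            "documents": {"top_hits": {"size": docs_per_archive}}
                        },
                    }
                },
            }
        },
    }


# Hypothetical sample data: two archives with different owners.
archives = [{"_id": "a1", "owner": "p1"}, {"_id": "a2", "owner": "p2"}]
documents = [
    {"_id": "d1", "archive": "a1"},
    {"_id": "d2", "archive": "a1"},
    {"_id": "d3", "archive": "a2"},
]

enriched = denormalize_documents(documents, archives)
body = owner_archive_query(["p1"])
```

The trade-off is the one behind my questions about updates: with this layout a single query on the Document index returns owners, archives and documents together, but whenever an archive changes owner, every document in that archive has to be reindexed.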

Andrei Stefan, answered Oct 02 '22 15:10