
Aggregate and filter from one index to another through a third

On my Elasticsearch server I have three indices: Person, Archive and Document.

  • Each document has an archive field, which is the _id of the Archive it is in.

  • Each archive has an owner field, which is the _id of the Person who owns the archive.

With the indices above I can aggregate documents into buckets of archives and archives into buckets of owners.

How can I also include the documents in the person aggregations, so that when I filter on a specific person I get the archives and the documents that belong to that person, instead of only the archives?


This is what I have so far to filter and aggregate the archives into buckets of owners:

{
  "post_filter": {
    "terms": {
      "owner": [
        "my_owner_id"
      ]
    }
  },
  "aggs": {
    "_filter_archive": {
      "filter": {
        "terms": {
          "owner": [
            "my_owner_id"
          ]
        }
      },
      "aggs": {
        "archive": {
          "terms": {
            "field": "archive"
          }
        }
      }
    }
  }
}

Oskar Persson asked Jan 12 '18 14:01



1 Answer

This is difficult to answer because some details are missing. The easy answer is: use nested documents or a parent-child relationship. Which one fits your case depends on many factors, so my suggestion is to try both and measure how well they perform. The third option is to denormalize your data completely. That is why I asked about updates: how frequent they are, how large a Person document is, how large an Archive document is, and so on. If you are not prepared to answer these questions, test nested and parent-child and choose one or the other. Good luck!
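
As a rough sketch of the third option (denormalization), and not a definitive solution: assume each entry in the Document index also stores a copy of its archive's owner in a field called owner. That field is hypothetical and does not exist in the setup described in the question. The aggregation names archives and documents and the top_hits size of 10 are likewise illustrative choices. JSON has no comment syntax, so these assumptions are stated here rather than in the body. Under those assumptions, a single search against the Document index could filter by owner, bucket by archive, and return some documents per archive:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "owner": "my_owner_id" } }
      ]
    }
  },
  "aggs": {
    "archives": {
      "terms": {
        "field": "archive"
      },
      "aggs": {
        "documents": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
```

The trade-off is on the write side: if an archive changes owner, every document in that archive has to be re-indexed with the new owner value, which is why the update frequency matters when choosing this approach.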

Andrei Stefan answered Oct 02 '22 15:10