
Limit ElasticSearch aggregation to top n query results

I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?

Patrick Pan asked Mar 06 '15 09:03



1 Answer

Sampler aggregation:

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

"aggs": {
    "bestDocs": {
        "sampler": {
            // "field": "<FIELD>",  <-- optional, controls diversity using a field
            "shard_size": 100
        },
        "aggs": {
            "bestBuckets": {
                "terms": {
                    "field": "id"
                }
            }
        }
    }
}

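This limits the sub-aggregation to the 100 top-scoring documents on each shard and then buckets them by ID. Because shard_size applies per shard, an index with several shards can sample more than 100 documents in total.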
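To show where the aggregation sits in a complete request, here is a minimal Python sketch that builds the search body as a plain dict. The function name `build_search_body`, the `tags` field name, and the choice of a `match` query are assumptions for illustration; only the `sampler`, `shard_size`, and `terms` on `id` parts come from the answer above.

```python
import json


def build_search_body(query_text, top_n=100, tag_field="tags", id_field="id"):
    """Build a search body that aggregates only the top-scoring hits.

    tag_field is a hypothetical field name; id_field matches the "id"
    field used in the terms aggregation above.
    """
    return {
        # A scoring query, so that "top" documents are well defined.
        "query": {"match": {tag_field: query_text}},
        "aggs": {
            "bestDocs": {
                # shard_size is applied on each shard separately.
                "sampler": {"shard_size": top_n},
                "aggs": {
                    "bestBuckets": {"terms": {"field": id_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the search request body.
    print(json.dumps(build_search_body("linux ubuntu"), indent=2))
```

The resulting dict can be serialized and sent to the index's `_search` endpoint with whichever client you already use.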

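Optionally, use the field (or script) and max_docs_per_value settings to cap how many documents sharing a common value are collected on any one shard. In Elasticsearch 5.0 and later, these diversity settings belong to the separate diversified_sampler aggregation rather than to sampler.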
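The effect of these settings can be pictured with a simplified, single-shard model in Python. This is only a sketch of the idea (take the best-scoring hits, optionally cap how many share one value, then count per ID), not how Elasticsearch implements it. The function name `sample_and_bucket` and the sample data are made up, and it assumes the diversity field is the same ID field used for bucketing.

```python
from collections import Counter


def sample_and_bucket(hits, shard_size, max_docs_per_value=None):
    """Simplified single-shard model of sampling followed by a terms count.

    hits: iterable of (score, doc_id) pairs.
    Returns a Counter mapping doc_id to the number of sampled documents.
    """
    sampled = Counter()
    taken = 0
    # Visit hits from highest to lowest score.
    for score, doc_id in sorted(hits, key=lambda h: h[0], reverse=True):
        if taken >= shard_size:
            break
        # Diversity cap: skip IDs that already reached their quota.
        if max_docs_per_value is not None and sampled[doc_id] >= max_docs_per_value:
            continue
        sampled[doc_id] += 1
        taken += 1
    return sampled


# Hypothetical scored hits: three for ID "a", two for "b", one for "c".
hits = [(9.0, "a"), (8.0, "a"), (7.0, "a"), (6.0, "b"), (5.0, "c"), (4.0, "b")]
print(sample_and_bucket(hits, shard_size=4))                        # plain top 4
print(sample_and_bucket(hits, shard_size=4, max_docs_per_value=2))  # at most 2 per ID
```

Without the cap, a single ID with many high-scoring documents can fill most of the sample; with the cap, lower-scoring documents from other IDs take the freed slots.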

Rahul answered Oct 15 '22 08:10