I have a set of 2.8 million docs with sets of tags that I'm querying with ElasticSearch, but many of these docs can be grouped together by one ID. I want to query my data using the tags, and then aggregate them by the ID that repeats. Often my search results have tens of thousands of documents, but I only want to aggregate the top 100 results of the search. How can I constrain an aggregation to only the top 100 results from a query?
A top_hits metric aggregator keeps track of the most relevant documents being aggregated. It is intended to be used as a sub-aggregator, so that the top matching documents can be returned per bucket.
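As a rough illustration of top_hits used as a sub-aggregation, the sketch below builds the request body as a plain Python dict. The field name `id`, the aggregation names `by_id` and `top_docs`, and the helper `top_hits_per_id` are assumptions made for the example; they are not taken from the question's mapping.

```python
import json


def top_hits_per_id(id_field="id", hits_per_bucket=3):
    """Build an aggs clause: one terms bucket per ID, each keeping its top hits.

    id_field and the aggregation names are hypothetical placeholders.
    """
    return {
        "by_id": {
            "terms": {"field": id_field},
            "aggs": {
                # top_hits keeps the best-scoring documents inside each bucket
                "top_docs": {"top_hits": {"size": hits_per_bucket}}
            },
        }
    }


aggs = top_hits_per_id()
# "size": 0 suppresses the normal hit list so only aggregations come back
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

Note that this returns the top documents inside every bucket; it does not by itself restrict the buckets to the top 100 search results.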
Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
For the cardinality aggregation, which returns an approximate distinct count, Elasticsearch makes the accuracy threshold configurable through the precision_threshold parameter. For example, with a precision_threshold of 1000, you can expect the count to be very accurate when the returned value is below 1000 and somewhat more approximate above it.
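For reference, a cardinality aggregation with precision_threshold could be written roughly as follows. The field name `id`, the aggregation name `distinct_values`, and the helper `distinct_count` are assumptions for illustration only.

```python
import json


def distinct_count(field, precision_threshold=1000):
    """Build a cardinality aggregation (approximate distinct count) on `field`.

    The aggregation name "distinct_values" is a hypothetical placeholder.
    """
    return {
        "distinct_values": {
            "cardinality": {
                "field": field,
                # counts below this threshold are expected to be close to exact
                "precision_threshold": precision_threshold,
            }
        }
    }


cardinality_aggs = distinct_count("id")
print(json.dumps({"size": 0, "aggs": cardinality_aggs}, indent=2))
```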
Sampler Aggregation:
A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.
"aggs": {
    "bestDocs": {
        "sampler": {
            // "field": "<FIELD>", <-- optional, controls diversity using a field
            "shard_size": 100
        },
        "aggs": {
            "bestBuckets": {
                "terms": {
                    "field": "id"
                }
            }
        }
    }
}
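This limits the sub-aggregation to the 100 top-scoring documents on each shard and then buckets them by ID. Note that shard_size applies per shard, so on an index with more than one shard the sample can contain more than 100 documents in total.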
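Optionally, you can use the field or script and max_docs_per_value settings to control the maximum number of documents collected on any one shard that share a common value. In Elasticsearch 5.0 and later, these diversity settings belong to the separate diversified_sampler aggregation.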
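Putting the sampler together with a tag query, a complete request body might look like the sketch below. It only builds and prints the JSON. The field names `tags` and `id`, the use of a match query, and the helper `build_request` are assumptions, since the question does not show its mapping or query.

```python
import json


def build_request(tags, tag_field="tags", id_field="id", shard_size=100):
    """Build a search body that aggregates only the top-scoring documents.

    tag_field, id_field, and the match query are hypothetical; adapt them
    to the real mapping.
    """
    return {
        "size": 0,  # return aggregations only, no hit list
        # a scoring query, so that "top-scoring" is meaningful to the sampler
        "query": {"match": {tag_field: " ".join(tags)}},
        "aggs": {
            "bestDocs": {
                # keep at most shard_size top-scoring documents per shard
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "bestBuckets": {"terms": {"field": id_field}}
                },
            }
        },
    }


body = build_request(["python", "search"])
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a `_search` request with whichever HTTP or Elasticsearch client is already in use. The sampler relies on relevance scores, so a query that assigns every match the same score would make the choice of sampled documents arbitrary.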