There is a way to get the top n terms in the result. For example:
{
  "aggs": {
    "apiSalesRepUser": {
      "terms": {
        "field": "userName",
        "size": 5
      }
    }
  }
}
Is there any way to set the offset for the terms result?
If you mean something like "ignore the first m results and return the next n results", then no, it is not possible. A workaround would be to set size to m + n and do client-side processing to ignore the first m results.
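A minimal sketch of that workaround, assuming the 8.x Python client; the index name "sales" and the values of m and n are made up for illustration:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

m, n = 10, 5  # skip the first m buckets, return the next n

resp = es.search(
    index="sales",  # hypothetical index name
    size=0,         # no hits needed, only the aggregation
    aggs={
        "apiSalesRepUser": {
            "terms": {"field": "userName", "size": m + n}
        }
    },
)

# Emulate the offset by dropping the first m buckets client-side.
for bucket in resp["aggregations"]["apiSalesRepUser"]["buckets"][m:]:
    print(bucket["key"], bucket["doc_count"])

Keep in mind that m + n still has to be small enough for Elasticsearch to build the full list of top buckets in one request.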
A little late, but (at least) since Elastic 5.2.0 you can use partitioning in the terms aggregation to paginate results.
https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
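A sketch of what that looks like with the field from the question; the partition count of 20 is an arbitrary value you would tune to your data:

{
  "aggs": {
    "apiSalesRepUser": {
      "terms": {
        "field": "userName",
        "include": {
          "partition": 0,
          "num_partitions": 20
        },
        "size": 5
      }
    }
  }
}

Each request returns only the terms that hash into the requested partition, so you page through all terms by incrementing partition from 0 to num_partitions - 1. The partitions are non-overlapping but not rank-ordered, so this gives you stable pages of terms rather than "top terms, page by page".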
Maybe this helps a bit:
"aggregations": {
"apiSalesRepUser": {
"terms": {
"field": "userName",
"size": 9999 ---> add here a bigger size
}
},
"aggregations": {
"limitBucket": {
"bucket_sort": {
"sort": [],
"from": 10,
"size": 20,
"gap_policy": "SKIP"
}
}
}
}
I am not sure what value to put in the terms size; pick something reasonable (the 9999 above just stands for "bigger than the number of buckets you actually need"). The terms aggregation is limited first, and then the nested limitBucket bucket_sort trims its buckets again: with from: 10 and size: 20 it returns buckets 11 through 30. Elasticsearch will probably still hold all the buckets produced by the terms aggregation in memory, so whether this is reasonable depends on your scenario, i.e. whether it is acceptable not to get all results when there are tens of thousands of them, as in a Google-style search where nobody jumps to page 1000.
Compared to the alternative of fetching the buckets and slicing on the client side, this might save you some data transfer from Elasticsearch, but, as I said, weigh it carefully: it can load a lot of data into Elasticsearch's memory and you might run into memory issues.