Limit and Offset in Term Aggregation ElasticSearch

There is a way to get the top n terms results. For example:

{
  "aggs": {
    "apiSalesRepUser": {
      "terms": {
        "field": "userName",
        "size": 5
      }
    }
  }
}

Is there any way to set the offset for the terms result?

asked Apr 02 '15 12:04 by Mukesh Kumar


People also ask

What is Elasticsearch aggregations?

The aggregations framework collects all the data selected by the search query and consists of many building blocks that help build complex summaries of the data.

What is the default shard_size in Elasticsearch?

shard_size cannot be smaller than size; when it is, Elasticsearch overrides it and resets it to be equal to size. The default shard_size is (size * 1.5 + 10). doc_count values for a terms aggregation may be approximate, and as a result any sub-aggregations on the terms aggregation may also be approximate.
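A small sketch of the shard_size arithmetic above. It assumes the computed default is truncated to an integer and does not model the single-shard case; the function names are hypothetical helpers, not part of any Elasticsearch client API.

```python
from typing import Optional


def default_shard_size(size: int) -> int:
    """Default per-shard candidate count: size * 1.5 + 10.

    Assumption: the fractional part is truncated; the single-shard case
    is not modeled here.
    """
    return int(size * 1.5 + 10)


def effective_shard_size(size: int, shard_size: Optional[int] = None) -> int:
    """shard_size actually used: the default when unset, never below size."""
    if shard_size is None:
        return default_shard_size(size)
    # A shard_size smaller than size is reset to size.
    return max(shard_size, size)


print(effective_shard_size(10))     # 25
print(effective_shard_size(10, 3))  # 10 (too small, reset to size)
```

This also shows why the "ask for m + n terms" workaround gets more expensive: raising size raises the default number of candidates each shard returns.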

What are the limitations of the terms aggregation?

The terms aggregation is meant to return the top terms and does not allow pagination. Document counts (and the results of any sub aggregations) in the terms aggregation are not always accurate. Each shard provides its own view of what the ordered list of terms should be.

Why do Elasticsearch shards return cached aggregation results?

If the shards' data doesn’t change between searches, the shards return cached aggregation results. When running aggregations, Elasticsearch uses double values to hold and represent numeric data. As a result, aggregations on long numbers greater than 2^53 are approximate.
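A quick illustration of why values above 2^53 become approximate: a double has a 53-bit significand, so not every larger integer is representable. Python floats are IEEE 754 doubles, which makes the effect easy to see.

```python
# 2**53 + 1 has no exact double representation and rounds to 2**53.
big = 2**53
print(float(big + 1) == float(big))  # True: the +1 is lost
print(int(float(big + 1)))           # 9007199254740992
```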


3 Answers

If you mean something like ignoring the first m results and returning the next n, then no, it is not possible. A workaround is to set size to m + n and skip the first m buckets on the client side.
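A minimal sketch of that workaround, assuming the response has already been fetched and parsed into a dict (sending the request is not shown). The helper names are hypothetical; the aggregation name and field come from the question, and the response shape follows the usual `aggregations -> name -> buckets` layout.

```python
from typing import Any, Dict, List


def terms_query(field: str, offset: int, limit: int) -> Dict[str, Any]:
    """Ask for the top offset + limit terms; the first `offset` are dropped client side."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "apiSalesRepUser": {
                "terms": {"field": field, "size": offset + limit}
            }
        },
    }


def page_of_buckets(response: Dict[str, Any], offset: int, limit: int) -> List[Dict[str, Any]]:
    """Skip the first `offset` buckets and keep the next `limit`."""
    buckets = response["aggregations"]["apiSalesRepUser"]["buckets"]
    return buckets[offset:offset + limit]


# Usage with a hand-made response (no cluster needed).
fake = {"aggregations": {"apiSalesRepUser": {"buckets": [
    {"key": "u%d" % i, "doc_count": 100 - i} for i in range(15)
]}}}
print([b["key"] for b in page_of_buckets(fake, 10, 5)])
```

Each deeper page costs more, since every request still computes all offset + limit buckets.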

answered Oct 16 '22 17:10 by bittusarkar


A little late, but since (at least) Elasticsearch 5.2.0 you can use partitioning in the terms aggregation to paginate through results.

https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions
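A sketch of how the partitioned requests from the linked page might be built, one body per partition. The `include.partition` / `include.num_partitions` options are the ones described there; the helper names and the choice of `size` are illustrative assumptions. Note that partitions group terms by hash, so partition 1 is not "the second page of the top terms": it is a different subset of all terms, and `size` must be large enough to hold every term in one partition.

```python
from typing import Any, Dict, Iterator


def partition_query(field: str, partition: int, num_partitions: int, size: int) -> Dict[str, Any]:
    """Terms aggregation body restricted to one hash partition of the terms."""
    return {
        "size": 0,
        "aggs": {
            "apiSalesRepUser": {
                "terms": {
                    "field": field,
                    # Only terms that fall into this partition are considered.
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    # Should cover all terms in a single partition.
                    "size": size,
                }
            }
        },
    }


def all_partition_queries(field: str, num_partitions: int, size: int) -> Iterator[Dict[str, Any]]:
    """Yield one request body per partition, 0 .. num_partitions - 1."""
    for p in range(num_partitions):
        yield partition_query(field, p, num_partitions, size)
```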

answered Oct 16 '22 15:10 by c_froehlich


Maybe this helps a bit:

{
  "aggregations": {
    "apiSalesRepUser": {
      "terms": {
        "field": "userName",
        "size": 9999
      },
      "aggregations": {
        "limitBucket": {
          "bucket_sort": {
            "sort": [],
            "from": 10,
            "size": 20,
            "gap_policy": "SKIP"
          }
        }
      }
    }
  }
}

I am not sure what value to use for the terms size; I would suggest a reasonable one that is larger than from + size (here, at least 30). The terms size limits the initial aggregation, and the limitBucket sub-aggregation then trims the terms buckets again, down to the requested page. This will probably still load into memory all the buckets allowed by the terms size, so whether it is reasonable depends on your scenario: it works when you can afford not to reach every result (i.e. if you have tens of thousands of terms), for example in a Google-like search where users don't need to jump to page 1000.

Compared with fetching the data and paging on the client side, this might save you some data transfer from ES, but, as I said, weigh this carefully: it loads a lot of data into ES memory and you might run into memory issues in Elasticsearch.
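A sketch that builds the bucket_sort request above for a given 0-based page, assuming an Elasticsearch version that supports the bucket_sort pipeline aggregation. The function and its parameters (`page`, `page_size`, `max_terms`) are hypothetical; `max_terms` plays the role of the large terms size, and pages past it come back empty.

```python
from typing import Any, Dict


def bucket_sort_page_query(field: str, page: int, page_size: int, max_terms: int) -> Dict[str, Any]:
    """Terms aggregation plus a bucket_sort sub-aggregation returning one page of buckets."""
    start = page * page_size
    return {
        "size": 0,
        "aggregations": {
            "apiSalesRepUser": {
                # Collects up to max_terms buckets; should exceed start + page_size.
                "terms": {"field": field, "size": max_terms},
                "aggregations": {
                    "limitBucket": {
                        "bucket_sort": {
                            "sort": [],  # empty: keep the terms order, only truncate
                            "from": start,
                            "size": page_size,
                            "gap_policy": "SKIP",
                        }
                    }
                },
            }
        },
    }
```

For example, `bucket_sort_page_query("userName", 1, 20, 9999)` should yield the same `from: 20, size: 20` slice pattern as the request above, shifted by one page.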

answered Oct 16 '22 16:10 by andreyro