elasticsearch-dsl aggregations return only 10 results. How to change this?

I am using the elasticsearch-dsl Python library to connect to Elasticsearch and do aggregations.

I am using the following code:

# percentiles is assumed to be a predefined list, e.g. [50, 95, 99]
search.aggs.bucket('per_ts', 'terms', field='ts')\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()

This works fine but returns only 10 results in response.aggregations.per_ts.buckets

I want all the results.

I have tried one solution with size=0, as mentioned in this question:

search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()

But this results in an error:

TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
hard coder asked Nov 10 '17

People also ask

Is Elasticsearch good for aggregations?

Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
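
For instance, here is a minimal sketch using elasticsearch-dsl (the my_index index and the status and latency fields are hypothetical) that groups documents by one field and averages another:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()  # assumes a cluster reachable on localhost

# size=0 suppresses the document hits; only the aggregations come back
s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('by_status', 'terms', field='status') \
    .metric('avg_latency', 'avg', field='latency')
response = s.execute()

for bucket in response.aggregations.by_status.buckets:
    print(bucket.key, bucket.doc_count, bucket.avg_latency.value)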

What is sub aggregation in Elasticsearch?

The sub-aggregations will be computed for the buckets that their parent aggregation generates. There is no hard limit on the level/depth of nested aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of another higher-level aggregation).
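
As a quick sketch (the index and field names are made up), nesting one terms bucket under another with elasticsearch-dsl looks like this:

# one terms aggregation nested under another, with a metric at the innermost level
s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('per_country', 'terms', field='country') \
    .bucket('per_city', 'terms', field='city') \
    .metric('max_price', 'max', field='price')
response = s.execute()

for country in response.aggregations.per_country.buckets:
    for city in country.per_city.buckets:
        print(country.key, city.key, city.max_price.value)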

What is Doc_count_error_upper_bound?

A terms aggregation computes its counts from only the top terms returned by each shard, so some matching documents can be missed; doc_count_error_upper_bound is the maximum number of those missing documents.
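
For example, the bound can be read directly off a terms aggregation response (the index and field names here are hypothetical):

s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('by_term', 'terms', field='my_field.keyword')
response = s.execute()

# 0 means the bucket doc_counts are exact; a positive value means up to
# that many documents per term may have gone uncounted across shards
print(response.aggregations.by_term.doc_count_error_upper_bound)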


2 Answers

I had the same issue. I finally found this solution:

from elasticsearch_dsl import Search, A

# size=0 suppresses document hits; the terms aggregation's own size controls the buckets
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()

Since Elasticsearch 2.x, size=0 no longer returns all bucket results; please refer to this thread. In my example I just set the size to 999999. You can pick a large number according to your case.

It is recommended to explicitly set size to a reasonable value, a number between 1 and 2147483647.
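
If you would rather not guess, one possible approach (a sketch, reusing the jokes index and jks_title.keyword field from above) is to estimate the number of unique terms with a cardinality aggregation first and derive the size from it:

# cardinality gives an approximate unique count, so add a margin on top
s = Search(using=client, index="jokes").extra(size=0)
s.aggs.metric('n_titles', 'cardinality', field='jks_title.keyword')
unique_count = s.execute().aggregations.n_titles.value

a = A('terms', field='jks_title.keyword', size=int(unique_count * 1.1) + 10)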

Hope this helps.

Soony answered Oct 14 '22


This is a bit older, but I ran into the same issue. What I wanted was basically an iterator that I could use to go through all the aggregations I got back (I also have a lot of unique results).

The best thing I found is to create a Python generator like this:

from elasticsearch_dsl import Search, A

def scan_aggregation_results():
    i = 0
    partitions = 20
    while i < partitions:
        s = Search(using=elastic, index='my_index').extra(size=0)
        # each request fetches only the i-th of `partitions` slices of the terms
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()

        for item in result.aggregations.my_agg.buckets:
            yield item.key
        i += 1

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it

The magic here is that Elasticsearch automatically partitions the results into 20 slices, i.e. the number of partitions I define. I just have to set the size large enough to hold a single partition, so in this case the result can be up to 20 million items (20 * 999999). If, like me, you have far fewer items to return (say 20,000), then each query will put only about 1,000 results in your bucket, regardless of the much larger size you defined.
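Building on that, one way to avoid hard-coding the partition count (a sketch, not part of the original answer; it reuses the hypothetical elastic client and field names above) is to derive it from a cardinality estimate so each partition fits within the chosen size:

from elasticsearch_dsl import Search, A

def estimate_partitions(client, index, field, per_partition=10000):
    # approximate count of unique terms (cardinality is an estimate)
    s = Search(using=client, index=index).extra(size=0)
    s.aggs.metric('n_unique', 'cardinality', field=field)
    unique = s.execute().aggregations.n_unique.value
    # enough partitions that each holds at most per_partition terms
    return max(1, -(-int(unique) // per_partition))

partitions = estimate_partitions(elastic, 'my_index', 'my_field.keyword')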

Using the generator construct outlined above, you can even get rid of that bookkeeping and create your own scanner, so to speak, iterating over all results individually, which is just what I wanted.

Peter Kunszt answered Oct 14 '22