I am using elasticsearch-dsl python library to connect to elasticsearch and do aggregations.
I am following code
search.aggs.bucket('per_date', 'terms', field='date')\
.bucket('response_time_percentile', 'percentiles', field='total_time',
percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()
This works fine but returns only 10 results in response.aggregations.per_ts.buckets
I want all the results
I have tried one solution with size=0
as mentioned in this question
search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
.bucket('response_time_percentile', 'percentiles', field='total_time',
percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()
But this results in error
TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
The sub-aggregations will be computed for the buckets which their parent aggregation generates. There is no hard limit on the level/depth of nested aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of another higher-level aggregation).
doc_count_error_upper_bound is the maximum number of those missing documents. response = client.
I had the same issue. I finally found this solution:
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()
After 2.x
, size=0
for all bucket results won't work anymore, please refer to this thread. Here in my example I just set the size equal 999999. You can pick a large number according to your case.
It is recommended to explicitly set reasonable value for size a number between 1 to 2147483647.
Hope this helps.
This is a bit older but I ran into the same issue. What I wanted was basically an iterator that i could use to go through all aggregations that i got back (i also have a lot of unique results).
The best thing i found is to create a python generator like this
def scan_aggregation_results():
i=0
partitions=20
while i < partitions:
s = Search(using=elastic, index='my_index').extra(size=0)
agg = A('terms', field='my_field.keyword', size=999999,
include={"partition": i, "num_partitions": partitions})
s.aggs.bucket('my_agg', agg)
result = s.execute()
for item in result.aggregations.my_agg.buckets:
yield my_field.key
i = i + 1
# in other parts of the code just do
for item in scan_aggregation_results():
print(item) # or do whatever you want with it
The magic here is that elastic will automatically partition the number of results by 20, ie the number of partitions i define. I just have to set the size to something large enough to hold a single partition, in this case the result can be up to 20 million items large (or 20*999999). If you have much less items, like me, to return (like 20000) then you will just have 1000 results per query in your bucket, regardless that you defined a much larger size.
Using the generator construct as outlined above you can then even get rid of that and create your own scanner so to speak, iterating over all results individually, just what i wanted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With