elasticsearch-dsl aggregations return only 10 results. How to change this?

I am using the elasticsearch-dsl Python library to connect to Elasticsearch and do aggregations.

I am using the following code:

# percentiles is assumed to be a predefined list, e.g. [50, 95, 99]
search.aggs.bucket('per_ts', 'terms', field='ts')\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})
response = search.execute()

This works fine but returns only 10 results in response.aggregations.per_ts.buckets

I want all the results.

I have tried one solution with size=0, as mentioned in this question:

search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
        .bucket('response_time_percentile', 'percentiles', field='total_time',
                percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()

But this results in an error:

TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
hard coder asked Nov 10 '17

People also ask

Is Elasticsearch good for aggregations?

Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
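
For instance, here is a minimal sketch using elasticsearch-dsl (the my_index index and the status and latency fields are hypothetical) that groups documents by one field and averages another:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()  # assumes a cluster reachable on localhost

# size=0 suppresses the document hits; only the aggregations come back
s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('by_status', 'terms', field='status') \
    .metric('avg_latency', 'avg', field='latency')
response = s.execute()

for bucket in response.aggregations.by_status.buckets:
    print(bucket.key, bucket.doc_count, bucket.avg_latency.value)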

What is sub aggregation in Elasticsearch?

The sub-aggregations will be computed for the buckets that their parent aggregation generates. There is no hard limit on the level/depth of nested aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of another higher-level aggregation).
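
As a quick sketch (the index and field names are made up), nesting one terms bucket under another with elasticsearch-dsl looks like this:

# one terms aggregation nested under another, with a metric at the innermost level
s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('per_country', 'terms', field='country') \
    .bucket('per_city', 'terms', field='city') \
    .metric('max_price', 'max', field='price')
response = s.execute()

for country in response.aggregations.per_country.buckets:
    for city in country.per_city.buckets:
        print(country.key, city.key, city.max_price.value)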

What is Doc_count_error_upper_bound?

A terms aggregation computes its counts from only the top terms returned by each shard, so some matching documents can be missed; doc_count_error_upper_bound is the maximum number of those missing documents.
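
For example, the bound can be read directly off a terms aggregation response (the index and field names here are hypothetical):

s = Search(using=client, index="my_index").extra(size=0)
s.aggs.bucket('by_term', 'terms', field='my_field.keyword')
response = s.execute()

# 0 means the bucket doc_counts are exact; a positive value means up to
# that many documents per term may have gone uncounted across shards
print(response.aggregations.by_term.doc_count_error_upper_bound)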


2 Answers

I had the same issue. I finally found this solution:

from elasticsearch_dsl import Search, A

# size=0 suppresses document hits; the terms aggregation's own size controls the buckets
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()

Since Elasticsearch 2.x, size=0 no longer returns all bucket results; please refer to this thread. In my example I just set the size to 999999. You can pick a large number according to your case.

It is recommended to explicitly set size to a reasonable value, a number between 1 and 2147483647.
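
If you would rather not guess, one possible approach (a sketch, reusing the jokes index and jks_title.keyword field from above) is to estimate the number of unique terms with a cardinality aggregation first and derive the size from it:

# cardinality gives an approximate unique count, so add a margin on top
s = Search(using=client, index="jokes").extra(size=0)
s.aggs.metric('n_titles', 'cardinality', field='jks_title.keyword')
unique_count = s.execute().aggregations.n_titles.value

a = A('terms', field='jks_title.keyword', size=int(unique_count * 1.1) + 10)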

Hope this helps.

Soony answered Oct 14 '22


This is a bit older, but I ran into the same issue. What I wanted was basically an iterator that I could use to go through all the aggregations I got back (I also have a lot of unique results).

The best thing I found is to create a Python generator like this:

from elasticsearch_dsl import Search, A

def scan_aggregation_results():
    i = 0
    partitions = 20
    while i < partitions:
        s = Search(using=elastic, index='my_index').extra(size=0)
        # each request fetches only the i-th of `partitions` slices of the terms
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()

        for item in result.aggregations.my_agg.buckets:
            yield item.key
        i += 1

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it

The magic here is that Elasticsearch automatically partitions the results into 20 slices, i.e. the number of partitions I define. I just have to set the size large enough to hold a single partition, so in this case the result can be up to 20 million items (20 * 999999). If, like me, you have far fewer items to return (say 20,000), then each query will put only about 1,000 results in your bucket, regardless of the much larger size you defined.
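Building on that, one way to avoid hard-coding the partition count (a sketch, not part of the original answer; it reuses the hypothetical elastic client and field names above) is to derive it from a cardinality estimate so each partition fits within the chosen size:

from elasticsearch_dsl import Search, A

def estimate_partitions(client, index, field, per_partition=10000):
    # approximate count of unique terms (cardinality is an estimate)
    s = Search(using=client, index=index).extra(size=0)
    s.aggs.metric('n_unique', 'cardinality', field=field)
    unique = s.execute().aggregations.n_unique.value
    # enough partitions that each holds at most per_partition terms
    return max(1, -(-int(unique) // per_partition))

partitions = estimate_partitions(elastic, 'my_index', 'my_field.keyword')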

Using the generator construct outlined above, you can even get rid of that bookkeeping and create your own scanner, so to speak, iterating over all results individually, which is just what I wanted.

Peter Kunszt answered Oct 14 '22