 

How to Get All Results from Elasticsearch in Python

I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.

My code is:

from elasticsearch import Elasticsearch

es = Elasticsearch()
logs_index = "my_index"
logs = es.search(index=logs_index, body=my_query)

and it tells me I have 72 hits, but then when I do:

df = logs['hits']['hits']
len(df)

It says the length is only 10. I saw someone with a similar issue on another question, but their solution did not work for me:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index, body=my_query)
len(logs['hits']['hits'])

The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?

ETA: I am aware that I can add "size": 10000 to my query body to stop it from truncating to just 10 hits, but since the user will be entering their own search query, I need a way that doesn't rely on editing the query itself.

asked Dec 11 '18 by carousallie



2 Answers

You need to pass a size parameter to your es.search() call.

Please read the API Docs

size – Number of hits to return (default: 10)

An example:

es.search(index=logs_index, body=my_query, size=1000)

Please note that this is not an optimal way to retrieve all documents in an index, or the results of a query that matches many documents. For that you should use the scroll API, which the Python client exposes through the helpers.scan() abstraction.
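A minimal sketch of the scroll approach, assuming the my_index and my_query names from the question (both are placeholders, not real values):

```python
def collect_hits(scan_iter):
    """Drain a scan/scroll iterator into a plain list of documents."""
    return [hit["_source"] for hit in scan_iter]

# Against a live cluster, this pairs with helpers.scan(), which pages
# through the scroll API and yields every matching hit, not just 10:
#
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch()
#   docs = collect_hits(helpers.scan(es, index="my_index", query=my_query))
```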

You can also read about scrolling in the Elasticsearch documentation.

answered Sep 18 '22 by Alexandre Juma


It is also possible to use the elasticsearch_dsl library:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd

client = Elasticsearch()
s = Search(using=client, index="my_index")

df = pd.DataFrame([hit.to_dict() for hit in s.scan()])

The key here is s.scan(), which handles pagination and iterates over the entire index.

Note that the example above will return the entire index, since it was not passed any query. To build queries with elasticsearch_dsl, see its documentation.

answered Sep 19 '22 by gabra