I am brand new to using Elasticsearch and I'm having an issue getting all results back when I run an Elasticsearch query through my Python script. My goal is to query an index ("my_index" below), take those results, and put them into a pandas DataFrame which goes through a Django app and eventually ends up in a Word document.
My code is:
from elasticsearch import Elasticsearch
es = Elasticsearch()
logs_index = "my_index"
logs = es.search(index=logs_index, body=my_query)
and it tells me I have 72 hits, but then when I do:
df = logs['hits']['hits']
len(df)
It says the length is only 10. I saw someone had a similar issue in another question, but their solution did not work for me. This is what I tried:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
es = Elasticsearch()
logs_index = "my_index"
search = Search(using=es)
total = search.count()
search = search[0:total]
logs = es.search(index=logs_index,body=my_query)
len(logs['hits']['hits'])
The len function still says I only have 10 results. What am I doing wrong, or what else can I do to get all 72 results back?
ETA: I am aware that I can just add "size": 10000 to my query to stop it from truncating to just 10, but since the user will be entering their search query I need to find another way that isn't just in the search query.
We can get a maximum of 10000 records by using the size parameter. What if we get more than 20000 records after applying a filter query? Please update if there is any way to see records beyond 10000. You can use the size and from parameters to display up to 10000 records (the default limit) to your users.
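For result sets larger than 10000, one approach that avoids deep from/size paging is search_after, which resumes from the sort values of the last hit on the previous page. A rough sketch, assuming a 7.x-style client that accepts body= (as in the question) and a sort that includes a unique tiebreaker field; iter_hits and its parameters are names made up for this example:

```python
def iter_hits(es, index, query, sort, page_size=1000):
    """Yield every hit for `query`, one page at a time, using search_after.

    `es` is an Elasticsearch client, `query` a search body such as
    {"query": {...}}, and `sort` a sort spec that orders hits uniquely.
    """
    # Shallow copy so the caller's query dict is not modified.
    body = dict(query, size=page_size, sort=sort)
    while True:
        hits = es.search(index=index, body=body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries its sort values; the next page starts after them.
        body["search_after"] = hits[-1]["sort"]
```

It could be called as list(iter_hits(es, "my_index", my_query, sort=[{"@timestamp": "asc"}, {"log_id": "asc"}])), where both field names are hypothetical and would need to exist in your mapping.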
Elasticsearch will get significantly slower if you just set size to some big number. One way to get all documents is to use scan and scroll IDs: each response contains a _scroll_id, which you pass to the next request to get the next chunk of 100. This answer needs more updates: search_type=scan is now deprecated.
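The scroll loop described in that comment can be sketched roughly as follows, without search_type=scan. This assumes a 7.x-style client that accepts body= (as in the question); fetch_all_hits, page_size and keep_alive are names chosen for this example only:

```python
def fetch_all_hits(es, index, query, page_size=100, keep_alive="2m"):
    """Collect every hit for `query` by paging through the scroll API."""
    # The first request opens a scroll context and returns the first page.
    page = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    scroll_id = page["_scroll_id"]
    hits = []
    try:
        # An empty page means the scroll is exhausted.
        while page["hits"]["hits"]:
            hits.extend(page["hits"]["hits"])
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the server-side scroll context once we are done.
        es.clear_scroll(scroll_id=scroll_id)
    return hits
```

With the client from the question, fetch_all_hits(es, logs_index, my_query) should return all matching hits rather than just the first 10.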
You need to pass a size parameter to your es.search() call.
Please read the API docs:
size – Number of hits to return (default: 10)
An example:
es.search(index=logs_index, body=my_query, size=1000)
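If the query comes from the user and the size cannot be hard-coded in it, one option is to count the matches first and pass that number as size. A minimal sketch, assuming a 7.x-style client that accepts body= (as in the question) and a search body with a top-level "query" key; search_all_matches and max_window are names made up for this example, and 10000 mirrors the default result window:

```python
def search_all_matches(es, index, query, max_window=10000):
    """Return all hits for `query`, capped at `max_window` documents.

    `es` is an Elasticsearch client; `query` is a search body such as
    {"query": {...}}.
    """
    # The count API only accepts the "query" part of a search body.
    count_body = {"query": query["query"]} if "query" in query else {}
    total = es.count(index=index, body=count_body)["count"]

    # from + size cannot exceed index.max_result_window (10000 by default).
    size = min(total, max_window)
    response = es.search(index=index, body=query, size=size)
    return response["hits"]["hits"]
```

For the case in the question, search_all_matches(es, logs_index, my_query) should return all 72 hits instead of 10, at the cost of one extra count request.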
Please note that this is not an optimal way to get all documents in an index, or the results of a query that returns a lot of documents. For that you should do a scroll operation, which is also documented in the API docs under the scan() abstraction for the scroll Elastic operation.
You can also read about it in the Elasticsearch documentation.
It is also possible to use the elasticsearch_dsl library:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
import pandas as pd
client = Elasticsearch()
s = Search(using=client, index="my_index")
df = pd.DataFrame([hit.to_dict() for hit in s.scan()])
The secret here is s.scan(), which handles pagination and queries the entire index.
Note that the example above will return the entire index, since no query was passed to it. To create a query with elasticsearch_dsl, see that library's documentation.