I need to index or update a document in Elasticsearch and wait until it is searchable (that is, until a refresh has happened). There is a related issue on GitHub: https://github.com/elasticsearch/elasticsearch/issues/1063
I don't want to force the refresh because it hurts indexing performance and I will need to perform this operation very often. I tried waiting for 1 second as described in the GitHub issue. It works well as long as Elasticsearch is not under pressure, but when there is not much RAM left (which might happen occasionally) I have seen the refresh take up to 5 or 6 seconds. So I tried another way.
I have written a helper function in my backend that waits for the "searchable" document to reach a given version. It is quite simple:
- GET the document with realtime=false
- if there is a result
  - if result.version >= wanted.version
    - return
  - else
    - wait a little more and retry
- else if the doc is not found
  - HEAD the document with realtime=true (test if the doc exists in the transaction log)
  - if the doc is found (then it has just been created)
    - wait a little more and retry
  - else
    - return (the doc might have been created and deleted really fast)
The wanted version is the version returned by Elasticsearch when the document was indexed.
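For illustration, the polling loop described above could be sketched roughly as follows. This is only a sketch under assumptions: `get_searchable_version` and `exists_realtime` are hypothetical callables standing in for the GET with realtime=false and the HEAD with realtime=true, and the `timeout` and `interval` defaults are arbitrary values, not the ones I actually use.

```python
import time
from typing import Callable, Optional


def wait_until_searchable(
    get_searchable_version: Callable[[], Optional[int]],
    exists_realtime: Callable[[], bool],
    wanted_version: int,
    timeout: float = 10.0,
    interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the searchable copy of a document reaches wanted_version.

    get_searchable_version: hypothetical wrapper around GET with
        realtime=false; returns the version, or None if the doc is not found.
    exists_realtime: hypothetical wrapper around HEAD with realtime=true.
    Returns True when there is nothing left to wait for, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        version = get_searchable_version()
        if version is not None:
            if version >= wanted_version:
                return True  # the searchable copy has caught up
        elif not exists_realtime():
            # Not searchable and not in the transaction log either:
            # the doc was probably created and deleted really fast.
            return True
        if clock() >= deadline:
            return False  # give up instead of waiting forever
        sleep(interval)
```

The two lookups are passed in as callables so the loop itself does not depend on any particular HTTP client, and the timeout covers the case where the version number is reset after a delete.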
This algorithm works, but you can see that it is far from perfect.
First, it makes more calls to Elasticsearch when it is under pressure, which is not a good idea.
Second, I have seen Elasticsearch reset the version number when a doc has been deleted for some time. If for some reason the function misses that, we might wait until the doc reaches this version again (that's why I also added a timeout).
Does someone have a better solution? Scaling automatically is not an acceptable answer right now.
An Elasticsearch refresh makes your documents available for search, but it does not ensure that they are written to persistent storage: it does not call fsync, so it does not guarantee durability.
Scripting is a powerful feature, but it can significantly affect your search speed. You should be careful when using scripts because Elasticsearch will apply the script to every result. The more data you have in the index, the slower the search will become, as it goes over every result. Wildcard queries in Elasticsearch are similar to LIKE queries in SQL.
When a new document is indexed into an Elasticsearch index, it becomes available for search about 1 second after the index operation. However, you can force the document to become searchable immediately by calling the _flush or _refresh operation on the index.
The default size for a query is 10. You can change it with the size search parameter. As with retrieving more documents than you need, requesting fields you don't use will also slow down your search, for the same reason: Elasticsearch needs to construct and transfer more data to the client.
As Guillaume Massé said, a solution is about to be merged into Elasticsearch: https://github.com/elastic/elasticsearch/issues/1063#issuecomment-223368867
So I would advise waiting for the built-in functionality rather than implementing a custom solution.
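As a rough illustration, and assuming the built-in functionality in question is the `refresh=wait_for` request parameter available since Elasticsearch 5.0, an index request that only returns once the document is searchable could be addressed as in the sketch below. The helper name `index_request_url` is hypothetical, and the `_doc` endpoint form assumes a recent Elasticsearch version (older versions need a mapping type in the path instead).

```python
from urllib.parse import quote, urlencode


def index_request_url(base_url: str, index: str, doc_id: str) -> str:
    """Build the URL for indexing a document with refresh=wait_for.

    With refresh=wait_for, Elasticsearch responds only after a refresh has
    made the change visible to search, without forcing an immediate refresh.
    """
    # Assumption: typeless `_doc` endpoint; escape the path segments.
    path = f"{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    query = urlencode({"refresh": "wait_for"})
    return f"{base_url.rstrip('/')}/{path}?{query}"
```

The document body would then be sent to this URL with a PUT request using whichever HTTP client the backend already has.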