
Elasticsearch: alternatives to "block until refresh" / "wait for a document to be searchable"

I need to index or update a document in Elasticsearch and wait until it is searchable (i.e. until a refresh has happened). There is a related issue on GitHub: https://github.com/elasticsearch/elasticsearch/issues/1063

I don't want to force a refresh because it hurts indexing performance, and I will need to perform this operation very often. I tried waiting 1 second, as described in the GitHub issue. That works well as long as Elasticsearch is not under pressure, but when RAM runs low (which can happen occasionally) I have seen a refresh take up to 5 or 6 seconds. So I tried another approach.

I have written a helper function in my backend that waits for the “searchable” copy of the document to reach a given version. It is quite simple:

- GET the document with realtime=false
- if there is a result:
    - if result.version >= wanted.version:
        return
    - else:
        wait a little longer and retry
- else, if the document is not found:
    - HEAD the document with realtime=true (to test whether it exists in the transaction log)
        - if the document is found (it has just been created):
            wait a little longer and retry
        - else:
            return (the document may have been created and deleted very quickly)
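The steps above can be sketched in Python roughly as follows. This is a sketch under assumptions, not the author's actual code: `get_version` and `exists_realtime` are hypothetical callables standing in for the GET with realtime=false (returning the document's version, or None when not found) and the HEAD with realtime=true, so the control flow can be shown without a cluster. It also includes the timeout the question mentions adding; the poll interval and timeout values are arbitrary.

```python
import time
from typing import Callable, Optional


def wait_until_searchable(
    get_version: Callable[[], Optional[int]],  # hypothetical: GET ...?realtime=false
    exists_realtime: Callable[[], bool],        # hypothetical: HEAD ...?realtime=true
    wanted_version: int,
    poll_interval: float = 0.1,                 # arbitrary choice
    timeout: float = 10.0,                      # arbitrary choice
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the searchable copy reaches wanted_version.

    Returns True when waiting is over, False if the timeout expired.
    """
    deadline = clock() + timeout
    while True:
        version = get_version()
        if version is not None:
            if version >= wanted_version:
                return True  # the refreshed copy is recent enough
        elif not exists_realtime():
            return True  # not in the translog either: likely created and deleted fast
        # The document exists in real time but is not yet searchable (or is stale).
        if clock() >= deadline:
            return False  # give up, e.g. after a version reset
        sleep(poll_interval)
```

Injecting `sleep` and `clock` is only there so the loop can be exercised with fake timing; in real use the defaults apply.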

The wanted version is the version Elasticsearch returns when the document is indexed.
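Concretely, the index API's JSON response carries the new document version in its `_version` field. A minimal sketch, assuming the response body is already available as a string; `wanted_version_from_response` is a hypothetical helper name, and the sample body is shaped like an Elasticsearch 1.x index response.

```python
import json


def wanted_version_from_response(body: str) -> int:
    """Extract the _version from an index API response body."""
    response = json.loads(body)
    # A successful index response reports the version it just wrote.
    return int(response["_version"])


# Sample shaped like an Elasticsearch 1.x index response (illustrative data).
sample = '{"_index":"twitter","_type":"tweet","_id":"1","_version":3,"created":false}'
print(wanted_version_from_response(sample))  # prints 3
```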

This algorithm works, but it is far from perfect:

  • First, it makes more calls to Elasticsearch precisely when it is under pressure, which is not a good idea.

  • Second, I have seen Elasticsearch reset the version number when a document has been deleted for some time. If the function misses that for some reason, it might wait until the document reaches that version again (which is why I also added a timeout).

Does anyone have a better solution? Automatic scaling is not an acceptable answer right now.

nharraud asked Jan 23 '15 14:01




1 Answer

As Guillaume Massé said, a solution is about to be merged into Elasticsearch: https://github.com/elastic/elasticsearch/issues/1063#issuecomment-223368867

So I would advise waiting for the built-in functionality rather than implementing a custom solution.
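The built-in option that resolved the linked issue is, as far as I know, the `refresh=wait_for` request parameter introduced in Elasticsearch 5.0: an index, update, delete or bulk request carrying it returns only after a refresh has made the change searchable, without forcing that refresh. A sketch of such a request, under assumptions: a hypothetical cluster at `http://localhost:9200`, the 7.x-style `/{index}/_doc/{id}` path (5.x and 6.x used `/{index}/{type}/{id}`), and a hypothetical helper name `build_index_request`. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode


def build_index_request(base_url: str, index: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that blocks until the document is searchable."""
    # refresh=wait_for: wait for the next scheduled refresh instead of forcing one.
    query = urlencode({"refresh": "wait_for"})
    url = f"{base_url}/{quote(index)}/_doc/{quote(doc_id)}?{query}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request("http://localhost:9200", "my-index", "42", {"title": "hello"})
# urllib.request.urlopen(req) would send it and block until the change is visible.
```

Compared with the polling helper in the question, the client makes a single call and the waiting happens server-side, so there are no extra requests when the cluster is under pressure; the call still takes as long as the next refresh does.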

nharraud answered Oct 29 '22 13:10