What exactly does -1 refresh_interval in Elasticsearch mean?

I have read a lot of articles about index refreshing in Elasticsearch. I understand the implication of intervals greater than 0: the interval is the elapsed time between consecutive segment flushes, which make new documents available for search. However, I am not sure what refresh_interval: -1 does exactly. In my understanding, it disables automatic index refreshing, but not completely: Elasticsearch still flushes segments from time to time even though refresh_interval is set to -1. I wonder which mechanism governs this flushing activity if automatic refresh is disabled.

Sorry, I know I don't have a lot of code to post, so I will give a bit of background on what I am after. My application doesn't need near real-time search; it only needs eventual consistency. However, the delay should be reasonable, i.e. within a few seconds to less than a minute, not half an hour. I was wondering if I can leave it to Elasticsearch to decide when best to refresh at its convenience rather than refreshing at a regular interval. The reason is that disabling automatic refreshing does bring some performance benefits to my application, e.g. JVM heap usage rises less aggressively between garbage collections (see graph below).

Graph: After disabling refresh interval, heap usage rises less aggressively

asked Apr 06 '16 11:04

Lim H.

1 Answer

There is a bit of confusion in your understanding. Refreshing the index and writing to disk are two different processes and are not necessarily related, which explains your observation that segments are still being written even when refresh_interval is -1.

When a document is indexed, it is added to the in-memory buffer and appended to the translog file. When a refresh takes place, the docs in the buffer are written to a new segment without an fsync, the segment is opened to make it visible to search, and the buffer is cleared. The translog is not yet cleared and nothing is actually persisted to disk (as there was no fsync).

Now imagine the refresh is not happening: there is no index refresh, you cannot search your documents, and the segments are not created in cache.
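If you want to experiment with this, here is a minimal sketch of the two requests involved: changing index.refresh_interval through the index settings API, and triggering a refresh explicitly. The node address http://localhost:9200, the index name my-index, and both helper function names are assumptions made up for illustration. The requests are only built here, not sent.

```python
import json
import urllib.request

# Assumed address of a local node; adjust for your cluster.
ES_HOST = "http://localhost:9200"


def refresh_interval_request(index, interval):
    """Build (but do not send) a PUT <index>/_settings request.

    interval is a string such as "30s", or "-1" to disable periodic refresh.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_HOST}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def manual_refresh_request(index):
    """Build (but do not send) a POST <index>/_refresh request."""
    return urllib.request.Request(url=f"{ES_HOST}/{index}/_refresh", method="POST")


if __name__ == "__main__":
    req = refresh_interval_request("my-index", "-1")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually send it against a running node:
    # urllib.request.urlopen(req)
```

For the use case in the question (visibility within a few seconds to a minute), passing something like "30s" instead of "-1" would keep the delay bounded while still refreshing less often than the default.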

The translog settings dictate when the flush (writing to disk) happens: by default, when the translog reaches 512mb in size, or after 30 minutes. The flush is what actually persists data on disk; everything else is in the filesystem cache (if the node dies or the machine is rebooted, the cache is lost and the translog is the only salvation).
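To make the difference between refresh and flush concrete, here is a toy in-memory model of the lifecycle described above. It is not Elasticsearch code: the class ToyShard and all its attributes are hypothetical, and it assumes, as a simplification, that a flush first turns any pending buffer contents into a segment before the fsync.

```python
class ToyShard:
    """Toy model of buffer, translog, refresh and flush (illustration only)."""

    def __init__(self):
        self.buffer = []      # indexed docs, not yet searchable
        self.translog = []    # every operation since the last flush
        self.segments = []    # segments opened for search
        self.fsynced = 0      # number of segments safely persisted on disk

    def index(self, doc):
        # A new document goes to the in-memory buffer and to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # The buffer becomes a new searchable segment; no fsync happens
        # and the translog is left untouched.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Assumption of this toy: pending buffer contents are written out
        # first, then all segments are fsynced and the translog is cleared.
        self.refresh()
        self.fsynced = len(self.segments)
        self.translog.clear()

    def search(self):
        # Only documents in opened segments are visible.
        return [doc for segment in self.segments for doc in segment]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index({"id": 1})
    print("after index:", shard.search(), "translog size:", len(shard.translog))
    shard.refresh()
    print("after refresh:", shard.search(), "fsynced:", shard.fsynced)
    shard.flush()
    print("after flush: fsynced:", shard.fsynced, "translog size:", len(shard.translog))
```

In this model, a document that has been indexed but not refreshed is recoverable only from the translog, and a refreshed segment is searchable but not yet durable until the next flush.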

answered Sep 20 '22 05:09

Andrei Stefan
