
What exactly does -1 refresh_interval in Elasticsearch mean?


I have read a lot of articles about index refreshing in Elasticsearch. I understand the implication of intervals greater than 0: the interval is the elapsed time between consecutive segment flushes, which make the segments available for search. However, I am not sure what refresh_interval: -1 does exactly. In my understanding, it's a means to disable automatic index refreshing, but not completely: Elasticsearch still flushes segments from time to time even though refresh_interval is set to -1. I wonder which mechanism governs this flushing activity if automatic refresh is disabled.

Sorry, I know I don't have a lot of code to post, so I will give a bit of background on what I am after. My application doesn't need near real-time search; it only needs eventual consistency. However, this eventuality should be reasonable, i.e. within a few seconds to less than a minute, not half an hour. I was wondering if I can leave it to Elasticsearch to decide when best to refresh at its convenience rather than refreshing at a regular interval. The reason is that disabling automatic refreshing does bring some performance benefits to my application, e.g. JVM heap usage rises less aggressively between garbage collections (see graph below).

After disabling refresh interval, heap usage rises less aggressively

Lim H. asked Apr 06 '16 11:04



1 Answer

There is a bit of confusion in your understanding. Refreshing the index and writing to disk are two different processes and are not necessarily related, which explains your observation that segments are still being written even when refresh_interval is -1.

When a document is indexed, it is added to the in-memory buffer and appended to the translog file. When a refresh takes place, the docs in the buffer are written to a new segment without an fsync, the segment is opened to make it visible to search, and the buffer is cleared. The translog is not cleared at this point, and nothing is actually persisted to disk (as there was no fsync).

Now imagine the refresh is not happening: with no index refresh, you cannot search your newly indexed documents, and no new segments are created in the filesystem cache.
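For reference, turning the periodic refresh off and triggering a refresh by hand are two separate REST calls: an index settings update and the refresh API. Below is a minimal sketch that only builds the requests with the Python standard library. The node address `http://localhost:9200` and the index name `my-index` are assumptions for illustration, not something from the question.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a node listening locally
INDEX = "my-index"                  # hypothetical index name


def refresh_interval_request(interval):
    """Build PUT /<index>/_settings that sets index.refresh_interval.

    Pass "-1" to disable the periodic refresh, or e.g. "30s" for a
    relaxed interval.
    """
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def manual_refresh_request():
    """Build POST /<index>/_refresh, which makes buffered docs searchable."""
    return urllib.request.Request(f"{BASE_URL}/{INDEX}/_refresh", method="POST")


if __name__ == "__main__":
    for req in (refresh_interval_request("-1"), manual_refresh_request()):
        # Only printed here; send with urllib.request.urlopen(req)
        # when a cluster is actually reachable.
        print(req.get_method(), req.full_url, req.data)
```

Given the "few seconds to less than a minute" requirement in the question, passing something like `"30s"` instead of `"-1"` may be a simpler middle ground than refreshing manually.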

The translog flush settings dictate when the flush (the actual write to disk) happens: by default, when the translog reaches 512mb in size or after 30 minutes. Only the flush actually persists data on disk; everything else lives in the filesystem cache, so if the node dies or the machine is rebooted, the cache is lost and the translog is the only salvation.
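To make the difference between a refresh and a flush concrete, here is a toy model of the write path described above. It is a sketch for illustration only, not Elasticsearch code: the class and attribute names are made up, and having the flush first write any buffered docs into a segment is a simplifying assumption.

```python
class ToyShard:
    """Toy model of buffer, translog, refresh and flush (not real ES code)."""

    def __init__(self):
        self.buffer = []     # in-memory indexing buffer
        self.translog = []   # log of operations since the last flush
        self.segments = []   # searchable segments, filesystem cache only
        self.committed = []  # segments considered fsynced to disk

    def index(self, doc):
        # Indexing adds the doc to the buffer and appends it to the translog.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Refresh: buffer -> new segment (no fsync), visible to search,
        # buffer cleared. The translog is left untouched.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Flush: persist segments to disk and clear the translog.
        # Assumption in this toy: buffered docs are written out first.
        self.refresh()
        self.committed = [list(seg) for seg in self.segments]
        self.translog.clear()

    def search(self):
        # Only docs that have reached a segment are searchable.
        return [doc for seg in self.segments for doc in seg]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("doc-1")
    print(shard.search())    # nothing yet: no refresh has happened
    shard.refresh()
    print(shard.search())    # doc is searchable, but not committed
    print(shard.committed, shard.translog)
    shard.flush()
    print(shard.committed, shard.translog)
```

In this model a refresh changes what is searchable while a flush changes what is durable, which is why segments can still reach disk with refresh_interval set to -1.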

Andrei Stefan answered Sep 20 '22 05:09