
Elasticsearch when a file system goes read-only

We have a 3-node Elasticsearch cluster running 1.7.3. Every node is both a data node and a master node. Last night, the file system on one of the machines became corrupted and was re-mounted read-only. From that point on, the cluster returned errors on insertion, such as:

RemoteTransportException[[db06][inet[/IPREMOVED:9300]][indices:data/write/index]]; nested: IndexFailedEngineException[[messages_201503071849][1] Index failed for [message#586279]]; nested: FileNotFoundException[/data/nodes/0/indices/messages_201503071849/1/index/_1v70.fdx (Read-only file system)];

Is there any way to configure the system to handle this error better (i.e., for that node to take itself out of the cluster)? We want to be able to continue with writes in this situation.

Mark Fletcher asked Dec 14 '15 15:12




1 Answer

From reading several sources (mainly the Elasticsearch forum), it appears that Elasticsearch nodes have no way to recover from this error by themselves and, even worse, the cluster will lock up when such a failure occurs.

The reason this happens, according to one post:

The reason why ES is not shutting down automatically is that org.elasticsearch.env.NodeEnvironment keeps a java.nio.file.FileStore which is never monitored by regularly calling its isReadOnly() method.

See the same post for two possible solutions.
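To make the missing monitoring concrete, here is a minimal sketch of a periodic FileStore.isReadOnly() check in plain Java. It is not one of the solutions from that post and not Elasticsearch code: the class name ReadOnlyWatchdog, its method names, and the onFailure callback are all hypothetical, and what the callback should do (stop the node, raise an alert) depends on your deployment. The sketch looks the FileStore up again on every check, on the assumption that a cached instance may not reflect a later remount.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Elasticsearch.
class ReadOnlyWatchdog {

    // Looks up the file store backing dataPath and asks whether it is read-only.
    // The store is resolved on every call instead of being cached (assumption:
    // a cached FileStore may not notice a remount).
    static boolean isReadOnly(Path dataPath) throws IOException {
        return Files.getFileStore(dataPath).isReadOnly();
    }

    // Runs a single check. The callback fires when the store is read-only or
    // when the store cannot be inspected at all (treated as a failure here,
    // which is an assumption of this sketch). Returns true on failure.
    static boolean checkOnce(Path dataPath, Runnable onFailure) {
        boolean failed;
        try {
            failed = isReadOnly(dataPath);
        } catch (IOException e) {
            failed = true;
        }
        if (failed) {
            onFailure.run();
        }
        return failed;
    }

    // Repeats the check every periodSeconds on a background thread.
    // The callback runs again on each failing check until the caller
    // shuts the returned scheduler down.
    static ScheduledExecutorService start(Path dataPath, long periodSeconds, Runnable onFailure) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> checkOnce(dataPath, onFailure),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A call such as `ReadOnlyWatchdog.start(Paths.get("/data"), 30, callback)` would check the data path from the error message every 30 seconds; the interval and the callback are placeholders. Run as a separate process next to the node, the callback could stop the Elasticsearch service so that the remaining nodes take over.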

Adonis answered Oct 23 '22 23:10