As part of an investigation into using Elasticsearch as a reliable document store from a Java application, I'm running a basic HA test as follows:
I set up a minimal cluster using the readily available Docker image of Elasticsearch 1.6 (https://registry.hub.docker.com/_/elasticsearch), with two master-eligible nodes and discovery.zen.minimum_master_nodes set to 2 (see the update below).
Then I run a small loader app that inserts 500,000 documents of ~1KB each.
This takes approximately a minute and a half on my machine. During this time, I restart the current master node (docker restart).
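For example, something like the following, where es-master-1 stands in for whichever container currently holds the master role:

> docker restart es-master-1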
At the end of the run, the Java API has responded OK to 100% of my requests, but when I check the document count with a curl request, a few documents are missing (between 2 and 10, depending on the run).
Even with an explicit "_refresh" request on the index, my document count is the same.
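For example, the refresh can be issued as:

> curl -XPOST 'http://localhost:9200/test/_refresh'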
My main concern, of course, is not that some documents cannot be stored during a crash, but rather that the API returns a positive result for them (especially since I'm testing with WriteConsistencyLevel.ALL).
I'm aware of this ticket (https://github.com/elastic/elasticsearch/issues/7572), yet unsure if it applies to my basic scenario.
My inserts are done as follows:
client.prepareUpdate("test", "test", id)
  .setDoc(doc)
  .setUpsert(doc)
  .setConsistencyLevel(WriteConsistencyLevel.ALL)
  .execute.get.isCreated == true
The rest of the code can be found here: https://github.com/joune/nosql/blob/master/src/main/scala/ap.test.nosql/Loader.scala
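For context, here is a condensed sketch of what such a loader loop can look like (docFor and total are illustrative names, not the actual code from the repository):

val total = 200000
val results = (1 to total).map { i =>
  try {
    client.prepareUpdate("test", "test", i.toString)
      .setDoc(docFor(i))    // docFor builds the ~1KB JSON payload (illustrative)
      .setUpsert(docFor(i))
      .setConsistencyLevel(WriteConsistencyLevel.ALL)
      .execute.get.isCreated
  } catch { case _: Exception => false }
}
println(s"${results.count(!_)} KO | ${results.count(identity)} OK")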
Please advise if you think I'm doing something obviously wrong.
(I know some will reply that considering Elasticsearch a reliable document store is plain wrong, but that's the goal of the study and not the kind of answer I expect.)
Update: additional logs, as requested by Andrei Stefan
> grep discovery.zen.minimum_master_nodes elasticsearch.yml
discovery.zen.minimum_master_nodes: 2
> curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger._root":"DEBUG"}}'
{"acknowledged":true,"persistent":{},"transient":{"logger":{"_root":"DEBUG"}}}
> curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{"transient":{"logger.index.translog":"TRACE"}}'
{"acknowledged":true,"persistent":{},"transient":{"logger":{"index":{"translog":"TRACE"}}}}
Run test with 200,000 entries:
0 KO | 200000 OK
> curl -XGET 'localhost:9200/test/test/_count?preference=_primary'
{"count":199991,"_shards":{"total":5,"successful":5,"failed":0}}%
I've put the logs here: https://gist.github.com/ab1ed844f2038c30e63b
I'm aware of this ticket, yet unsure if it applies to my basic scenario (https://github.com/elastic/elasticsearch/issues/7572)
I did some digging and it turns out it does. The reason is that during node shutdown we close the transport layer before we shut down the indices service, which means that in-flight operations are effectively partitioned away (exactly as in the case of a network failure). Obviously this is no good, and I opened https://github.com/elastic/elasticsearch/issues/12314 to track it.
As a side note: with Elasticsearch 2.0 we have added a shard header to the response, indicating how many shards were successful. This gives you a way to check whether an operation was indeed successfully written to all shards, and to retry if needed. Here is an example of a successful response (writing to both a primary and a replica):
{
  "_shards" : {
    "total" : 2,
    "failed" : 0,
    "successful" : 2
  },
  ...
}
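A minimal Scala sketch of that check, assuming the shape of the 2.0 Java client API (getShardInfo on the write response; upsertChecked and maxRetries are illustrative names):

import org.elasticsearch.action.WriteConsistencyLevel
import org.elasticsearch.client.Client

// Upsert one document, then verify it reached every shard copy;
// retry a bounded number of times when total != successful.
def upsertChecked(client: Client, id: String, doc: String, maxRetries: Int = 3): Boolean = {
  val response = client.prepareUpdate("test", "test", id)
    .setDoc(doc)
    .setUpsert(doc)
    .setConsistencyLevel(WriteConsistencyLevel.ALL)
    .execute.get
  val shards = response.getShardInfo
  if (shards.getSuccessful == shards.getTotal) true
  else if (maxRetries > 0) upsertChecked(client, id, doc, maxRetries - 1)
  else false
}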
Finally, note that a shard failure doesn't necessarily mean that #7572 has played a part; that is highly unlikely. However, in all cases where #7572 did happen, the header will have total != successful.
Lots of good notes in the comments here. I would humbly suggest that a cluster with only two master-eligible nodes is not "high availability." The Elasticsearch docs state:
It is recommended to avoid having only two master eligible nodes, since a quorum of two is two. Therefore, a loss of either master node will result in an inoperable cluster.
Essentially, in a two-master cluster with discovery.zen.minimum_master_nodes properly set to 2, as soon as either node goes down you can't elect a master, and so you no longer have a cluster. If minimum_master_nodes were set to 1, you'd have a whole different set of problems to contend with (split-brain). We run 3 masters for every ES cluster (and, furthermore, run dedicated masters); I would be very curious to know whether you get different results with a 3-master cluster, as sketched below.
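As a sketch, the per-node configuration for such a setup might look like this (settings as in the 1.x docs; making the masters data-less is one common choice for dedicated masters):

# elasticsearch.yml on each of the three master-eligible nodes
node.master: true
node.data: false                        # dedicated masters hold no data
discovery.zen.minimum_master_nodes: 2   # quorum of three masters is 2

With three masters and a quorum of two, any single node can be lost while the cluster stays operable.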
That said, it still seems quite wrong that the API acknowledges receipt of your docs and then fails to persist them, but I think it ultimately comes back to the fact that Elasticsearch isn't designed to run a production-class deployment with only two master-eligible nodes.