Is there a generic way in ES to know "oops, cluster is hosed, index must be rebuilt"?
Alternately, a pattern or approach that answers this question?
So far, we have used the following approach:
1) If cluster goes to red status, data has been lost, index must be rebuilt.
2) If cluster flips between green and yellow, no data loss has occurred.
3) Similar to #2, on an index with 5 shards, as long as the "active_shards" value is equal to or greater than 5, all is well.
Is #3 fair? Basically, is the following correct:
DataLossHasOccurred == ("active_shards" < "active_primary_shards")
No, there isn't.
#3 is an equivalent check to the others.
When the cluster is "red", it means some data is not available: at least one primary shard is unassigned. The data may not be lost. If a few servers go offline but can be brought back up, the data can be recovered. When that happens, the cluster will return to a green state.
When the cluster is "yellow", it means the cluster is operating with reduced redundancy: all primary shards are allocated, but one or more replica shards are not. Depending on the number of replicas configured for the impacted indices, this may or may not be a concern. The metrics I use for monitoring this are the overall state and the number of unassigned shards (the "unassigned_shards" value in the cluster health output). If the cluster stays in a yellow state and the number of unassigned shards is not going down, then something is misconfigured.
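As a rough illustration of the monitoring approach described above (overall state plus unassigned shard count), here is a minimal Python sketch. It assumes you have already fetched a few successive cluster health responses (for example from GET _cluster/health) and parsed them into dicts; only the "status" and "unassigned_shards" fields are used. The function names classify_health and recovery_stalled are hypothetical, and the verdict labels are just one possible convention, not anything Elasticsearch defines.

```python
from typing import Mapping, Sequence


def classify_health(health: Mapping[str, object]) -> str:
    """Map one parsed cluster health response to a coarse verdict.

    The verdict names are illustrative assumptions, not Elasticsearch terms.
    """
    status = health["status"]
    if status == "red":
        # Some data is unavailable; it may still be recoverable.
        return "unavailable"
    if status == "yellow":
        # Data is available, but redundancy is reduced.
        return "degraded"
    return "healthy"


def recovery_stalled(samples: Sequence[Mapping[str, object]]) -> bool:
    """Return True if the cluster is not green in the latest sample and
    unassigned_shards has not dropped since the first sample.

    `samples` is assumed to be ordered oldest to newest.
    """
    if len(samples) < 2:
        return False  # Not enough history to judge a trend.
    first, latest = samples[0], samples[-1]
    if latest["status"] == "green":
        return False
    return int(latest["unassigned_shards"]) >= int(first["unassigned_shards"])


# Example with hand-written samples (not real cluster output).
recovering = [
    {"status": "yellow", "unassigned_shards": 5},
    {"status": "yellow", "unassigned_shards": 2},
]
stuck = [
    {"status": "yellow", "unassigned_shards": 5},
    {"status": "yellow", "unassigned_shards": 5},
]
print(classify_health(recovering[-1]), recovery_stalled(recovering))
print(classify_health(stuck[-1]), recovery_stalled(stuck))
```

Note that neither function claims data loss: a "red" sample only says data is unavailable right now, which matches the point above that the cluster can return to green once the missing nodes come back.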