Healthy Elasticsearch cluster turns RED after opening a closed index

I have a managed cluster hosted by elastio.co. Here is the configuration:

Platform          => Amazon Web Services
Memory            => 4 GB
Storage           => 96 GB
SSD               => Yes
High availability => Yes (2 data centers)

Each index in this cluster contains the log data for exactly one day. The average index size is 15 MB and the average doc count is 15,000. The cluster is not under any kind of pressure (JVM, indexing and search time, and disk space are all well within comfortable limits).

When I opened a previously closed index, the cluster turned RED. Here are some details I found by querying Elasticsearch.

GET /_cluster/allocation/explain
{
  "index": "some_index_name",    # 1 Primary shard , 1 replica shard 
  "shard": 0,
  "primary": true
}

Response :

"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"failed_allocation_attempts": 3,
"details": "failed recovery, failure RecoveryFailedException[[some_index_name][0]: Recovery failed on {instance-*****}{Hash}{HASH}{IP}{IP}{logical_availability_zone=zone-1, availability_zone=***, region=***}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(mmapfs(/app/data/nodes/0/indices/MFIFAQO2R_ywstzqrfbY4w/0/index)): files: []]; ",
"last_allocation_status": "no_valid_shard_copy"
}, 
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
  {
    "node_name": "instance-***",
    "node_decision": "no",
    "store": {
      "in_sync": false,
      "allocation_id": "RANDOM_HASH",
      "store_exception": {
        "type": "index_not_found_exception",
        "reason": "no segments* file found in SimpleFSDirectory@/app/data/nodes/0/indices/RANDOM_HASH/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@346e1b99: files: []"
      }
    }
  },
  {
    "node_name": "instance-***",
    "node_attributes": {
      "logical_availability_zone": "zone-0"
    },
    "node_decision": "no",
    "store": {
      "found": false
    }
  }
]
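
For readers comparing their own output against this one, the fields that matter here are `can_allocate` and each node's `store` entry. The following is a minimal sketch, assuming the explain response has already been parsed into a Python dict; `summarize_explain` and the node names in the sample are hypothetical, and the sample is a trimmed stand-in for the response above, not real cluster output.

```python
import json


def summarize_explain(explain):
    """Return the overall decision and, per node, why the shard copy was rejected."""
    summary = {"can_allocate": explain.get("can_allocate"), "nodes": []}
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        exc = store.get("store_exception")
        if exc is not None:
            # The node has a shard directory, but it could not be opened.
            problem = exc.get("type")
        elif store.get("found") is False:
            # The node holds no copy of the shard at all.
            problem = "no copy on this node"
        else:
            problem = None
        summary["nodes"].append({
            "node": decision.get("node_name"),
            "decision": decision.get("node_decision"),
            "problem": problem,
        })
    return summary


# Trimmed sample mirroring the response above (placeholder node names).
sample = {
    "can_allocate": "no_valid_shard_copy",
    "node_allocation_decisions": [
        {"node_name": "instance-a", "node_decision": "no",
         "store": {"in_sync": False,
                   "store_exception": {"type": "index_not_found_exception"}}},
        {"node_name": "instance-b", "node_decision": "no",
         "store": {"found": False}},
    ],
}
print(json.dumps(summarize_explain(sample), indent=2))
```

Read this way, the response says that one node has an empty shard directory and the other has no copy, so there is no intact copy left to recover from.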

I've tried rerouting the shard to a node, even with the accept_data_loss flag set to true.

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "some_index_name",
        "shard": 0,
        "node": "instance-***",
        "accept_data_loss": true
      }
    }
  ]
}

Response:

"acknowledged": true,
"state": {
"version": 338190,
"state_uuid": "RANDOM_HASH",
"master_node": "RANDOM_HASH",
"blocks": {
  "indices": {
    "restored_**": {
      "4": {
        "description": "index closed",
        "retryable": false,
        "levels": [
          "read",
          "write"
        ]
      }
    },
    "restored_**": {
      "4": {
        "description": "index closed",
        "retryable": false,
        "levels": [
          "read",
          "write"
        ]
      }
    }
  }
},
"routing_table": {
  "indices": {
    "SOME_INDEX_NAME": {
      "shards": {
        "0": [
          {
            "state": "INITIALIZING",
            "primary": true,
            "relocating_node": null,
            "shard": 0,
            "index": "SOME_INDEX_NAME",
            "recovery_source": {
              "type": "EXISTING_STORE"
            },
            "allocation_id": {
              "id": "HASH"
            },
            "unassigned_info": {
              "reason": "ALLOCATION_FAILED",
              "failed_attempts": 4,
              "delayed": false,
              "details": "same as explanation above ^ ",
              "allocation_status": "no_valid_shard_copy"
            }
          },
          {
            "state": "UNASSIGNED",
            "primary": false,
            "node": null,
            "relocating_node": null,
            "shard": 0,
            "index": "some_index_name",
            "recovery_source": {
              "type": "PEER"
            },
            "unassigned_info": {
              "reason": "INDEX_REOPENED",
              "delayed": false,
              "allocation_status": "no_attempt"
            }
          }
        ]
      }
    },

Any suggestions are welcome. Thanks and regards.

DeshErBojhaa asked Feb 27 '18 09:02



1 Answer

This occurs when the master node is brought down abruptly.

Here are the steps I took to resolve the same issue when I encountered it:

  • Step 1: Check the allocation.

    • curl -XGET 'http://localhost:9200/_cat/allocation?v'
  • Step 2: Check the shard stores.

    • curl -XGET 'http://localhost:9200/_shard_stores?pretty'

    Look for the "index", "shard" and "node" that carry the error you posted. The error should read: "no segments* file found in SimpleFSDirectory@/...."
  • Step 3: Reroute that shard as shown below. Note that allocate_empty_primary creates an empty primary, so whatever data that shard held is discarded. The Content-Type header is required from Elasticsearch 6.0 onward.

    • curl -XPOST 'http://localhost:9200/_cluster/reroute?master_timeout=5m' \
        -H 'Content-Type: application/json' \
        -d '{ "commands": [ { "allocate_empty_primary": { "index": "IndexFromStep2", "shard": ShardFromStep2, "node": "NodeFromStep2", "accept_data_loss": true } } ] }'
  • Step 4: Repeat Step 2 and Step 3 until you see this output:

    • curl -XGET 'http://localhost:9200/_shard_stores?pretty'

    { "indices" : { } }
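
If many shards are affected, the lookup in Step 2 and the request body in Step 3 can be generated instead of typed by hand. The sketch below is only an illustration: `build_reroute_commands` is a hypothetical helper, and it assumes that each entry under "stores" in the `_shard_stores` response keeps the node's details (including "name") under a key named after the node ID, next to an optional "store_exception". Check that against your own output before relying on it, and remember that every generated command discards the shard's data.

```python
import json


def build_reroute_commands(shard_stores):
    """Build one allocate_empty_primary command per shard that reports a store_exception."""
    commands = []
    seen = set()
    for index, index_info in shard_stores.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            for store in shard_info.get("stores", []):
                if "store_exception" not in store:
                    continue
                # Assumed shape: node details live under a key named after the node ID.
                node_names = [value["name"] for value in store.values()
                              if isinstance(value, dict) and "name" in value]
                if not node_names or (index, shard_id) in seen:
                    continue
                seen.add((index, shard_id))  # one command per shard is enough
                commands.append({
                    "allocate_empty_primary": {
                        "index": index,
                        "shard": int(shard_id),
                        "node": node_names[0],
                        "accept_data_loss": True,
                    }
                })
    return {"commands": commands}


# Placeholder sample shaped like the assumed _shard_stores response.
sample = {
    "indices": {
        "some_index_name": {
            "shards": {
                "0": {
                    "stores": [
                        {
                            "NODE_ID": {"name": "instance-a"},
                            "allocation_id": "RANDOM_HASH",
                            "allocation": "unused",
                            "store_exception": {
                                "type": "index_not_found_exception",
                                "reason": "no segments* file found",
                            },
                        }
                    ]
                }
            }
        }
    }
}
body = build_reroute_commands(sample)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the reroute call in Step 3. An empty "commands" list corresponds to the `{ "indices" : { } }` output that ends Step 4.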

Your cluster should go green soon.
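
To confirm that rather than keep checking by hand, the "status" field of `GET /_cluster/health` can be polled. This is a rough sketch under a few assumptions: the cluster is reachable at http://localhost:9200 without authentication, and `wait_for_green` and `fetch_health_http` are hypothetical helper names. The demonstration at the bottom uses canned responses so that it runs without a cluster.

```python
import json
import time
import urllib.request


def fetch_health_http(base_url="http://localhost:9200"):
    """Fetch the cluster health document (assumes no authentication is needed)."""
    with urllib.request.urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


def wait_for_green(fetch_health, attempts=10, delay=5.0):
    """Poll until the reported status is green; return the last status seen."""
    status = None
    for attempt in range(attempts):
        status = fetch_health().get("status")
        if status == "green":
            break
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking again
    return status


# Offline demonstration with canned responses instead of a live cluster.
responses = iter([{"status": "red"}, {"status": "yellow"}, {"status": "green"}])
final = wait_for_green(lambda: next(responses), attempts=5, delay=0)
print(final)
```

Against a real cluster the call would be `wait_for_green(fetch_health_http)`. A status that stays yellow usually means the primaries are assigned but some replicas are not yet.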

PraveenMak answered Nov 15 '22 10:11