Healthy Elasticsearch cluster turns RED after opening a closed index

Tags:

elasticsearch

Each index in this cluster contains the log data for exactly one day. The average index size is 15 MB and the average doc count is 15,000. The cluster is not under any kind of pressure (JVM, indexing and search time, and disk space are all well within comfortable limits).

When I opened a previously closed index, the cluster turned RED. Here are some diagnostics I found by querying Elasticsearch.

GET /_cluster/allocation/explain
{
  "index": "some_index_name",    # 1 Primary shard , 1 replica shard 
  "shard": 0,
  "primary": true
}

Response :

"unassigned_info": {
"reason": "ALLOCATION_FAILED",
"failed_allocation_attempts": 3,
"details": "failed recovery, failure RecoveryFailedException[[some_index_name][0]: Recovery failed on {instance-*****}{Hash}{HASH}{IP}{IP}{logical_availability_zone=zone-1, availability_zone=***, region=***}]; nested: IndexShardRecoveryException[failed to fetch index version after copying it over]; nested: IndexShardRecoveryException[shard allocated for local recovery (post api), should exist, but doesn't, current files: []]; nested: IndexNotFoundException[no segments* file found in store(mmapfs(/app/data/nodes/0/indices/MFIFAQO2R_ywstzqrfbY4w/0/index)): files: []]; ",
"last_allocation_status": "no_valid_shard_copy"
}, 
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
"node_allocation_decisions": [
  {
    "node_name": "instance-***",
    "node_decision": "no",
    "store": {
      "in_sync": false,
      "allocation_id": "RANDOM_HASH",
      "store_exception": {
        "type": "index_not_found_exception",
        "reason": "no segments* file found in SimpleFSDirectory@/app/data/nodes/0/indices/RANDOM_HASH/0/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@346e1b99: files: []"
      }
  }
},
{
  "node_name": "instance-***",
  "node_attributes": {
    "logical_availability_zone": "zone-0",
  },
  "node_decision": "no",
  "store": {
    "found": false
  }
}
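
The explain output above is easier to scan when each node's decision is reduced to one line. The following is a minimal Python sketch, not part of the original post, that does this for a response already parsed into a dict. The function name `summarize_allocation_explain` and the node names `node-a` and `node-b` are hypothetical; the field names are the ones that appear in the response above.

```python
from typing import Any


def summarize_allocation_explain(explain: dict[str, Any]) -> list[str]:
    """Return one readable line per finding in an allocation explain response."""
    lines = [
        f"can_allocate={explain.get('can_allocate')}: "
        f"{explain.get('allocate_explanation', 'no explanation given')}"
    ]
    for decision in explain.get("node_allocation_decisions", []):
        store = decision.get("store", {})
        if "store_exception" in store:
            # A shard directory exists on this node but could not be opened,
            # e.g. because no segments* file is present.
            kind = store["store_exception"].get("type", "unknown")
            status = f"unreadable copy ({kind})"
        elif store.get("found") is False:
            status = "no copy on this node"
        elif store.get("in_sync") is False:
            status = "stale copy"
        else:
            status = "usable copy"
        lines.append(
            f"{decision.get('node_name')}: {decision.get('node_decision')}, {status}"
        )
    return lines


# Hypothetical sample mirroring the shape of the response in the question.
sample = {
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "cannot allocate because all found copies of the shard are either stale or corrupt",
    "node_allocation_decisions": [
        {
            "node_name": "node-a",
            "node_decision": "no",
            "store": {
                "in_sync": False,
                "store_exception": {"type": "index_not_found_exception"},
            },
        },
        {"node_name": "node-b", "node_decision": "no", "store": {"found": False}},
    ],
}

for line in summarize_allocation_explain(sample):
    print(line)
```

Read this way, the response says that one node holds a shard directory with no usable Lucene files and the other node holds no copy at all, which is why no valid primary can be chosen.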

I've tried rerouting the shard to a node, even with the accept_data_loss flag set to true.

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "some_index_name",
        "shard": 0,
        "node": "instance-***",
        "accept_data_loss": true
      }
    }
  ]
}

Response:

"acknowledged": true,
"state": {
"version": 338190,
"state_uuid": "RANDOM_HASH",
"master_node": "RANDOM_HASH",
"blocks": {
  "indices": {
    "restored_**": {
      "4": {
        "description": "index closed",
        "retryable": false,
        "levels": [
          "read",
          "write"
        ]
      }
    },
    "restored_**": {
      "4": {
        "description": "index closed",
        "retryable": false,
        "levels": [
          "read",
          "write"
        ]
      }
    }
  }
},
"routing_table": {
  "indices": {
    "SOME_INDEX_NAME": {
      "shards": {
        "0": [
          {
            "state": "INITIALIZING",
            "primary": true,
            "relocating_node": null,
            "shard": 0,
            "index": "SOME_INDEX_NAME",
            "recovery_source": {
              "type": "EXISTING_STORE"
            },
            "allocation_id": {
              "id": "HASH"
            },
            "unassigned_info": {
              "reason": "ALLOCATION_FAILED",
              "failed_attempts": 4,
              "delayed": false,
              "details": "same as explanation above ^ ",
              "allocation_status": "no_valid_shard_copy"
            }
          },
          {
            "state": "UNASSIGNED",
            "primary": false,
            "node": null,
            "relocating_node": null,
            "shard": 0,
            "index": "some_index_name",
            "recovery_source": {
              "type": "PEER"
            },
            "unassigned_info": {
              "reason": "INDEX_REOPENED",
              "delayed": false,
              "allocation_status": "no_attempt"
            }
          }
        ]
      }
    },
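
The routing table in a reroute response can be long, so it may help to flatten it into one row per shard copy. Below is a small Python sketch under stated assumptions: it takes the reroute response as an already-parsed dict and reads only the fields visible in the excerpt above (`state`, `primary`, `unassigned_info.reason`). The function name `shard_states` is hypothetical.

```python
from typing import Any


def shard_states(response: dict[str, Any]) -> list[tuple[str, int, str, str, str]]:
    """Flatten routing_table into (index, shard, role, state, unassigned_reason) rows."""
    rows = []
    indices = response.get("state", {}).get("routing_table", {}).get("indices", {})
    for index_name, index_info in indices.items():
        for shard_id, copies in index_info.get("shards", {}).items():
            for copy in copies:
                role = "primary" if copy.get("primary") else "replica"
                # "-" marks copies that carry no unassigned_info at all.
                reason = copy.get("unassigned_info", {}).get("reason", "-")
                rows.append(
                    (index_name, int(shard_id), role, str(copy.get("state")), reason)
                )
    return rows


# Hypothetical sample reduced from the response in the question.
sample_response = {
    "acknowledged": True,
    "state": {
        "routing_table": {
            "indices": {
                "some_index_name": {
                    "shards": {
                        "0": [
                            {
                                "state": "INITIALIZING",
                                "primary": True,
                                "unassigned_info": {"reason": "ALLOCATION_FAILED"},
                            },
                            {
                                "state": "UNASSIGNED",
                                "primary": False,
                                "unassigned_info": {"reason": "INDEX_REOPENED"},
                            },
                        ]
                    }
                }
            }
        }
    },
}

for row in shard_states(sample_response):
    print(row)
```

In the excerpt, the primary is INITIALIZING from its existing store while the replica stays UNASSIGNED, since a replica can only recover from an active primary.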

Any suggestions are welcome. Thanks and regards.

asked Feb 27 '18 09:02

DeshErBojhaa

1 Answer

This occurs when the master node is brought down abruptly.

Here are the steps I took to resolve the same issue when I ran into it:

Step 1: Check the allocation.
- curl -XGET 'http://localhost:9200/_cat/allocation?v'
Step 2: Check the shard stores.
- curl -XGET 'http://localhost:9200/_shard_stores?pretty'
Note the "index", "shard" and "node" of every entry that reports the error you posted. The error should read "no segments* file found in SimpleFSDirectory@/...."
Step 3: Reroute each affected shard as shown below. Note that allocate_empty_primary creates a new, empty primary, so whatever data that shard held is lost. On Elasticsearch 6.0 and later the Content-Type header is required.
- curl -XPOST 'http://localhost:9200/_cluster/reroute?master_timeout=5m' -H 'Content-Type: application/json' -d '{ "commands": [ { "allocate_empty_primary": { "index": "IndexFromStep2", "shard": ShardFromStep2, "node": "NodeFromStep2", "accept_data_loss": true } } ] }'
Step 4: Repeat Step 2 and Step 3 until you see this output.
- curl -XGET 'http://localhost:9200/_shard_stores?pretty'
{ "indices" : { } }

Your cluster should go green soon.
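
Steps 2 and 3 above can be scripted when many shards are affected. The following is a rough Python sketch, not taken from the answer, that turns a parsed `_shard_stores` response into a reroute body. It assumes the documented response shape, in which each store entry holds the node under a key equal to its node ID alongside `allocation_id`, `allocation` and `store_exception`; verify this against your Elasticsearch version. The function names and the sample node `node-a` are hypothetical, and every generated command discards the shard's data.

```python
import json
from typing import Any


def failed_store_copies(shard_stores: dict[str, Any]) -> list[tuple[str, int, str]]:
    """Return (index, shard, node_name) for every store reporting a store_exception."""
    failed = []
    for index_name, index_info in shard_stores.get("indices", {}).items():
        for shard_id, shard_info in index_info.get("shards", {}).items():
            for store in shard_info.get("stores", []):
                if "store_exception" not in store:
                    continue
                # Assumption: the node entry is the nested object that has a "name".
                node_name = next(
                    (
                        value["name"]
                        for key, value in store.items()
                        if key != "store_exception"
                        and isinstance(value, dict)
                        and "name" in value
                    ),
                    None,
                )
                if node_name is not None:
                    failed.append((index_name, int(shard_id), node_name))
    return failed


def build_reroute_body(failed: list[tuple[str, int, str]]) -> dict[str, Any]:
    """Build one allocate_empty_primary command per shard (first node seen wins)."""
    chosen: dict[tuple[str, int], str] = {}
    for index_name, shard_id, node_name in failed:
        chosen.setdefault((index_name, shard_id), node_name)
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index_name,
                    "shard": shard_id,
                    "node": node_name,
                    "accept_data_loss": True,  # the shard's data is discarded
                }
            }
            for (index_name, shard_id), node_name in chosen.items()
        ]
    }


# Hypothetical sample in the assumed _shard_stores shape.
sample_stores = {
    "indices": {
        "some_index_name": {
            "shards": {
                "0": {
                    "stores": [
                        {
                            "hypothetical-node-id": {"name": "node-a"},
                            "allocation_id": "x",
                            "allocation": "unused",
                            "store_exception": {
                                "type": "index_not_found_exception",
                                "reason": "no segments* file found",
                            },
                        }
                    ]
                }
            }
        }
    }
}

body = build_reroute_body(failed_store_copies(sample_stores))
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to the curl command in Step 3 as the `-d` payload. Emitting only one command per shard is a deliberate choice, because a shard can have only one primary.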

answered Nov 15 '22 10:11

PraveenMak
