Recover Hadoop NameNode Failure

Scenario 1:

The HDFS fsimage and edit log are written to multiple locations, including an NFS mount.

A) NameNode daemon crash. Solution: just restart the NameNode process.

B) The host running the NameNode is down.

Solution:

  1. Start the NameNode on a different host with an empty dfs.name.dir.
  2. Point dfs.name.dir to the NFS mount that holds a copy of the metadata, OR
  3. Point fs.checkpoint.dir to the Secondary NameNode's checkpoint directory and start the NameNode with the -importCheckpoint option.
  4. Change fs.default.name to the backup host's URI and restart the cluster with all the slave IPs listed in the slaves file.
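Step 4 (and step 3 of Scenario 2) comes down to editing one property in core-site.xml on every node. Below is a minimal Python sketch, assuming the usual configuration/property/name/value layout of Hadoop XML config files. The helper name set_default_fs and the hosts old-nn and backup-nn are hypothetical. On Hadoop 2 and later the preferred key is fs.defaultFS, with fs.default.name kept as a deprecated alias, so you would pass key="fs.defaultFS" there.

```python
import xml.etree.ElementTree as ET


def set_default_fs(core_site_xml: str, new_uri: str, key: str = "fs.default.name") -> str:
    """Return core-site.xml text with `key` pointing at new_uri.

    Hypothetical helper. If the property is missing, it is appended.
    Note: ElementTree drops XML comments when it re-serializes the file.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == key:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = new_uri
            break
    else:
        # No existing <property> with this name: add one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = new_uri
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    old = (
        "<configuration><property><name>fs.default.name</name>"
        "<value>hdfs://old-nn:8020</value></property></configuration>"
    )
    print(set_default_fs(old, "hdfs://backup-nn:8020"))
```

The edited file still has to reach every node (and the slaves file has to list the DataNodes) before the cluster is restarted.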

Note: we may lose any edits made after the last checkpoint.

Scenario 2:

The HDFS fsimage is written into a single directory.

A) NameNode daemon crash. Solution: unknown.

B) The host running the NameNode is down.

Solution:

  1. Create an empty directory and point dfs.name.dir at it.
  2. Point fs.checkpoint.dir to the Secondary NameNode's checkpoint directory and start the NameNode with -importCheckpoint.
  3. Change fs.default.name to the backup host's URI and restart the cluster with all the slave IPs listed in the slaves file.
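Steps 1 and 2 match the import-checkpoint procedure in the Hadoop 1.x HDFS user guide, which also warns that the NameNode fails if dfs.name.dir already contains a legal image. A minimal Python sketch of that pre-flight check follows. prepare_import_checkpoint is a hypothetical helper; it is stricter than Hadoop (it rejects any non-empty directory), it only builds the command rather than running it, and it assumes fs.checkpoint.dir is already set in the configuration. On Hadoop 2+ the equivalent command is hdfs namenode -importCheckpoint.

```python
import shlex
import tempfile
from pathlib import Path


def prepare_import_checkpoint(name_dir: str, hadoop_bin: str = "hadoop") -> list[str]:
    """Create an empty dfs.name.dir and return the argv that starts the
    NameNode with -importCheckpoint. Hypothetical helper; it does not run Hadoop.
    """
    path = Path(name_dir)
    path.mkdir(parents=True, exist_ok=True)
    # The import fails if a valid image already lives here, so require an empty directory.
    if any(path.iterdir()):
        raise RuntimeError(f"{name_dir} is not empty; point dfs.name.dir at a fresh directory")
    # fs.checkpoint.dir is read from the configuration, so it is not passed on the command line.
    return [hadoop_bin, "namenode", "-importCheckpoint"]


if __name__ == "__main__":
    new_dir = Path(tempfile.mkdtemp()) / "dfs" / "name"
    print(shlex.join(prepare_import_checkpoint(str(new_dir))))
    # prints: hadoop namenode -importCheckpoint
```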

Again, this way we would lose any edits made after the last checkpoint.

Please let me know if this is the correct way to recover the cluster manually.

Asked by Jagaran, Nov 14 '22


1 Answer

In production, you should run the NameNodes in HA mode with either a quorum of JournalNodes or shared HA-NFS storage for the edit log transaction files. If you do not want to use HA, run the NameNode with at least two storage directories for both the image and the edit logs, preferably with one of them on a soft-mounted NFS mount point so that the namespace is automatically persisted off the machine.
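To check the non-HA recommendation against an existing hdfs-site.xml, a minimal Python sketch can count the configured name directories. It assumes the comma-separated value format of dfs.name.dir (dfs.namenode.name.dir on Hadoop 2+); on Hadoop 1.x the edits go to the same directories unless dfs.name.edits.dir is set separately. It treats any path under /mnt/nfs/ as the NFS copy. That mount prefix and the helper names are hypothetical, and a real check would consult the mount table instead.

```python
import xml.etree.ElementTree as ET

# Newer key first; Hadoop 2+ still accepts dfs.name.dir as a deprecated alias.
NAME_DIR_KEYS = ("dfs.namenode.name.dir", "dfs.name.dir")


def name_dirs(hdfs_site_xml: str) -> list[str]:
    """Return the configured NameNode storage directories (comma-separated value)."""
    root = ET.fromstring(hdfs_site_xml)
    props = {
        (p.findtext("name") or "").strip(): p.findtext("value") or ""
        for p in root.findall("property")
    }
    for key in NAME_DIR_KEYS:
        if key in props:
            dirs = [d.strip() for d in props[key].split(",") if d.strip()]
            # Hadoop 2 configs often use file:///path URIs; keep just the path.
            return [d[len("file://"):] if d.startswith("file://") else d for d in dirs]
    return []


def check_name_dirs(dirs: list[str], nfs_prefixes: tuple[str, ...] = ("/mnt/nfs/",)) -> list[str]:
    """List problems with a non-HA NameNode storage layout (hypothetical policy)."""
    problems = []
    if len(dirs) < 2:
        problems.append(f"only {len(dirs)} name dir(s); configure at least two")
    if not any(d.startswith(nfs_prefixes) for d in dirs):
        problems.append("no name dir on the NFS mount; metadata is not persisted off-machine")
    return problems


if __name__ == "__main__":
    conf = (
        "<configuration><property><name>dfs.name.dir</name>"
        "<value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value></property></configuration>"
    )
    print(check_name_dirs(name_dirs(conf)))  # []
```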

If you have only one storage directory and no HA configuration, then the best you can get back, if you lose all the files, is the last checkpoint, which lags behind the namespace state at the time of failure. If you did not lose the files, you can try the hadoop namenode -recover option, as illustrated by this post, to recover the image plus some (or all) of the edits.
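Recovery mode may have to discard damaged or unreadable edits, so it seems prudent to copy every name directory aside before trying it. A minimal Python sketch of that precaution follows; the helper name, the backup layout and the timestamp format are assumptions, and it only returns the hadoop namenode -recover argv instead of running it (on Hadoop 2+ the command is hdfs namenode -recover).

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_then_recover(name_dirs: list[str], backup_root: str,
                        hadoop_bin: str = "hadoop") -> tuple[list[Path], list[str]]:
    """Copy each NameNode storage directory under backup_root, then return
    the backup paths and the argv for recovery mode. Hypothetical helper."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backups = []
    for i, src in enumerate(name_dirs):
        dest = Path(backup_root) / f"name-dir-{i}-{stamp}"
        # copytree raises FileExistsError rather than overwrite an older backup.
        shutil.copytree(src, dest)
        backups.append(dest)
    return backups, [hadoop_bin, "namenode", "-recover"]


if __name__ == "__main__":
    nn_dir = Path(tempfile.mkdtemp())
    (nn_dir / "current").mkdir()
    (nn_dir / "current" / "VERSION").write_text("namespaceID=1\n")
    copies, argv = backup_then_recover([str(nn_dir)], tempfile.mkdtemp())
    print(copies, argv)
```

Run the returned command only with the NameNode stopped, and keep the backups until the recovered namespace has been verified.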

Answered by Harsh J, Dec 28 '22