Scenario 1:
The HDFS fsimage and edit log are written to multiple locations, including an NFS mount.
A) NameNode daemon crash: Solution: Just restart the NameNode process.
B) The host where the NameNode is running is down.
Solution:
Note - We may lose any edits made after the last checkpoint.
Scenario 2:
The HDFS fsimage is written into a single directory.
A) NameNode daemon crash: Solution: Unknown
B) The host where the NameNode is running is down.
Solution:
Here again, we would lose any changes made to files after the last checkpoint.
Please let me know if this is how we can manually recover the cluster.
In production, you should run the NameNodes in HA mode with a quorum of JournalNodes, or with shared HA NFS storage for the edit log transaction files. If you do not use HA, you need to run the NameNode with at least two storage directories for both images and edit logs, preferably with one of them on a soft-mounted NFS mount point, so that the namespace is automatically persisted off the machine.
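As a rough illustration of the two-directory setup, here is a minimal Python sketch that builds the hdfs-site.xml property for it. It assumes the Hadoop 2+ property name dfs.namenode.name.dir, which takes a comma-separated list of directories; the two paths are hypothetical placeholders, and the helper name_dir_property is made up for this example. As far as I know, the edit log directory (dfs.namenode.edits.dir) defaults to the same list, so check that against your version's defaults.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Build a <property> element listing several NameNode storage directories."""
    if len(dirs) < 2:
        raise ValueError("use at least two storage directories")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    # The NameNode writes its metadata to every directory in this list.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical paths: one local disk and one NFS mount point.
dirs = ["file:///data/1/dfs/nn", "file:///mnt/nfs/dfs/nn"]
fragment = ET.tostring(name_dir_property(dirs), encoding="unicode")
print(fragment)
```

The printed element goes inside the configuration element of hdfs-site.xml on the NameNode host.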
If you have just one storage directory and no HA configuration, then the best you can get, if you lose all the files, is a checkpoint from some earlier point in time. If you did not lose the files, you can try the hadoop namenode -recover option, as illustrated by this post, to recover the image plus some (or all) of the edits.
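Before attempting a recovery, it can help to see what actually survived in the storage directory. The sketch below is only an illustration: it assumes the Hadoop 2+ file naming inside the current subdirectory (fsimage_<txid>, edits_<start>-<end>, edits_inprogress_<txid>), and the function summarize and the sample file names are hypothetical. It reports the newest checkpoint and whether any edits exist beyond it.

```python
import re

# Assumed file-name patterns inside <name dir>/current (Hadoop 2+ layout).
FSIMAGE = re.compile(r"^fsimage_(\d+)$")
EDITS = re.compile(r"^edits_(\d+)-(\d+)$")
INPROGRESS = re.compile(r"^edits_inprogress_(\d+)$")


def summarize(names):
    """Summarize which checkpoint and edit segments are present."""
    last_image = -1       # highest transaction id covered by an fsimage
    last_edit = -1        # highest transaction id in a finalized edit segment
    in_progress = False   # whether an unfinalized edit segment exists
    for name in names:
        m = FSIMAGE.match(name)
        if m:
            last_image = max(last_image, int(m.group(1)))
            continue
        m = EDITS.match(name)
        if m:
            last_edit = max(last_edit, int(m.group(2)))
            continue
        if INPROGRESS.match(name):
            in_progress = True
    return {
        "last_image_txid": last_image,
        "last_finalized_edit_txid": last_edit,
        "has_in_progress_edits": in_progress,
        "edits_after_image": last_edit > last_image or in_progress,
    }


# Hypothetical directory listing, used here instead of reading a real disk.
sample = [
    "VERSION",
    "seen_txid",
    "fsimage_0000000000000000010",
    "fsimage_0000000000000000010.md5",
    "edits_0000000000000000011-0000000000000000020",
    "edits_inprogress_0000000000000000021",
]
report = summarize(sample)
print(report)
```

To run it against a real directory, pass the file names from the current subdirectory of your storage directory, for example with os.listdir. If edits_after_image is false, only the checkpoint is left to restore from.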