
Namenode format does not free up datanode disk space

Tags: hadoop, hdfs

After shutting down the cluster with ./stop-all.sh and then running hadoop namenode -format, I see that the DataNodes still use the same amount of disk space, i.e. the space has not been freed up.

Why is that?

Abbas Gadhia asked Feb 15 '23 14:02


1 Answer

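Formatting the NameNode only re-initialises its own metadata (the namespace image and edit log); it does not touch the block files stored in each DataNode's data directory, so that disk space is not released. You can delete the data manually, while the cluster is still running, before formatting the NameNode. On Hadoop 1.x the FS shell command for this is rmr (on Hadoop 2.x and later it is deprecated in favour of hadoop fs -rm -r):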

rmr

Usage: hadoop fs -rmr URI [URI …]

Recursive version of delete. Example:

hadoop fs -rmr /user/hadoop/dir
hadoop fs -rmr hdfs://nn.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.
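The order of operations matters here: the delete has to run while the old namespace still exists, because once the NameNode is reformatted it no longer knows which blocks the DataNodes are holding. Below is a minimal POSIX sh sketch of that sequence, assuming Hadoop 2.x or later (where hadoop fs -rm -r replaces -rmr), the Hadoop CLI on the PATH, a running cluster, and that you really want to destroy everything in HDFS. The wipe_then_format function and the HADOOP, STOP_ALL and WAIT_SECS variables are hypothetical names introduced only so the commands can be swapped out for a dry run, and the 60-second default wait is an arbitrary guess.

```shell
#!/bin/sh
# Minimal sketch, NOT a supported Hadoop tool. Assumes the cluster is running
# and that ALL data in HDFS may be destroyed. HADOOP, STOP_ALL and WAIT_SECS
# are hypothetical override variables (e.g. HADOOP=echo for a dry run).
: "${HADOOP:=hadoop}"
: "${STOP_ALL:=./stop-all.sh}"

wipe_then_format() {
  # 1. Delete everything through the NameNode so that it tells the DataNodes
  #    to drop the blocks. "-rm -r" is the non-deprecated form of "-rmr"; the
  #    quotes stop the local shell from expanding the glob, and -skipTrash
  #    keeps the files out of .Trash, where their blocks would still use disk.
  $HADOOP fs -rm -r -skipTrash '/*' || return 1

  # 2. DataNodes remove block files asynchronously (after later heartbeats),
  #    so wait before shutting down; 60 seconds is an arbitrary default.
  sleep "${WAIT_SECS:-60}"

  # 3. Stop the cluster, then reformat the NameNode metadata.
  $STOP_ALL || return 1
  $HADOOP namenode -format
}
```

After sourcing the file, calling wipe_then_format runs the three steps in order and stops at the first failing command. Note that hadoop namenode -format may still ask for interactive confirmation before overwriting an existing metadata directory.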


Alternatively

Data-nodes should be reformatted whenever the name-node is. I see 2 approaches here:

  1. To reformat the cluster, we call "start-dfs -format" or make a special script "format-dfs". This would format the cluster components all together. The question is whether it should also start the cluster after formatting.
  2. Format the name-node only. When data-nodes connect to the name-node, it will tell them to format their storage directories if it sees that the namespace is empty and its cTime=0. The drawback of this approach is that we can lose blocks of a data-node from another cluster if it connects by mistake to the empty name-node.

https://issues.apache.org/jira/browse/HDFS-107
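Until something like approach 1 exists, the DataNode side can be reformatted by hand. The sketch below is a hypothetical helper, not the format-dfs script proposed in the JIRA issue: with the cluster stopped, it empties a DataNode's storage directories so the node comes back blank instead of carrying blocks from the old namespace. It assumes it is run on each DataNode host and given the value of dfs.datanode.data.dir (dfs.data.dir on Hadoop 1.x), which is a comma-separated list whose entries may carry a file:// prefix or, on newer releases, a storage-type tag such as [DISK]. The function names data_dir_paths and wipe_data_dirs are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop). Run on a DataNode host with the
# cluster STOPPED; wipe_data_dirs permanently deletes every block file in the
# listed storage directories.

# Turn a dfs.datanode.data.dir value such as
#   "[DISK]file:///data/1/dn, /data/2/dn"
# into one local path per line.
data_dir_paths() {
  printf '%s\n' "$1" | tr ',' '\n' | sed \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//' \
    -e 's/^\[[A-Za-z_]*]//' \
    -e 's|^file://||' | grep -v '^$'
}

# Empty each existing storage directory but keep the directory itself, so
# its owner and permissions stay as the administrator configured them.
# Directories that do not exist are skipped.
wipe_data_dirs() {
  data_dir_paths "$1" | while IFS= read -r dir; do
    if [ -d "$dir" ]; then
      find "$dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
    fi
  done
}
```

For example, after ./stop-all.sh, run wipe_data_dirs "$(hdfs getconf -confKey dfs.datanode.data.dir)" on every DataNode (on Hadoop 2.x, hdfs getconf -confKey should print the configured value), then run hadoop namenode -format on the NameNode and start the cluster again.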

user2486495 answered Feb 25 '23 15:02