
Does decommissioning a node remove data from that node?

Tags:

hadoop

In Hadoop, if I decommission a node, Hadoop will redistribute the files across the cluster so that they are properly replicated. Will the data be deleted from the decommissioned node?

I am trying to balance the data across the disks on a particular node. I plan to do this by decommissioning the node and then recommissioning it. Do I need to delete the data from that node after decommissioning is complete, or will it be enough to simply recommission it (remove it from the excludes file and run hadoop dfsadmin -refreshNodes)?

UPDATE: It worked for me to decommission a node, delete all the data on that node, and then recommission it.
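
The recommissioning step mentioned above (take the host out of the excludes file, then refresh the node list) could be sketched roughly as follows. This is only an illustration: the function name remove_from_excludes, the hostname, and the excludes file path are hypothetical placeholders, and the hadoop call is left commented out because it needs a live cluster.

```shell
#!/bin/sh
# Sketch only: the function name, hostname, and excludes path are hypothetical.

# Remove one exact hostname line from an excludes file, leaving other entries intact.
remove_from_excludes() {
    host="$1"
    excludes_file="$2"
    # Refuse to continue if the excludes file does not exist.
    [ -f "$excludes_file" ] || return 1
    tmp_file="$(mktemp)" || return 1
    # -v inverts the match, -x matches whole lines, -F treats the host as a literal string.
    # grep exits non-zero when no lines remain, which is acceptable here.
    grep -v -x -F -- "$host" "$excludes_file" > "$tmp_file" || true
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp_file" > "$excludes_file"
    rm -f "$tmp_file"
}

# Example usage (commented out so the sketch has no side effects):
# remove_from_excludes "datanode07.example.com" "/etc/hadoop/conf/dfs.exclude"
# hadoop dfsadmin -refreshNodes   # asks the NameNode to re-read its include/exclude lists
```

Matching whole lines with a literal string avoids accidentally removing a host whose name merely contains the target name.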

schmmd asked Nov 12 '22 01:11


1 Answer

AFAIK, data is not removed from a DataNode when you decommission it. Further writes on that DataNode will not be possible though. When you decommission a DataNode, the replicas held by that DataNode are marked as "decommissioned" replicas, which are still eligible for read access.

But why do you want to perform this decommissioning/recommissioning cycle? Why don't you just specify all the disks as a comma-separated value for the dfs.data.dir property in your hdfs-site.xml and restart the DataNode daemon? Run the balancer after the restart.
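
As a rough illustration of that suggestion, the property in hdfs-site.xml might look something like the fragment below. The directory paths are hypothetical mount points, not values from the question. Note also that in Hadoop 2 and later this property is named dfs.datanode.data.dir, with dfs.data.dir kept as a deprecated alias, so the name to use depends on your version.

```xml
<!-- hdfs-site.xml fragment (sketch); /data/1 and /data/2 are hypothetical mount points -->
<property>
  <name>dfs.data.dir</name>
  <!-- One directory per disk, comma-separated with no spaces -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
```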

Tariq answered Nov 15 '22 07:11