In Hadoop, if I decommission a node, Hadoop will redistribute the files across the cluster so they are properly replicated. Will the data be deleted from the decommissioned node?
I am trying to balance the data across the disks on a particular node. I plan to do this by decommissioning the node and then recommissioning it. Do I need to delete the data from that node after decommissioning is complete, or will it be enough to simply recommission it (remove it from the excludes file and run hadoop dfsadmin -refreshNodes)?
UPDATE: It worked for me to decommission the node, delete all the data on that node, and then recommission it.
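For anyone repeating this, here is a rough sketch of the cycle as shell functions. It only uses hadoop dfsadmin -refreshNodes (from the question) and hadoop dfsadmin -report. The excludes path, the hostname and the function names are placeholders I made up for illustration, and the script is dry-run by default, so it just prints what it would do. It assumes the excludes file is the one your dfs.hosts.exclude property points at.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"

# Placeholder path (assumption): use the file named by dfs.hosts.exclude.
EXCLUDES="${EXCLUDES:-/path/to/excludes}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# decommission HOST: add HOST to the excludes file and tell the NameNode.
decommission() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ append $1 to $EXCLUDES"
    else
        printf '%s\n' "$1" >> "$EXCLUDES"
    fi
    run hadoop dfsadmin -refreshNodes
    # Re-run the report until the node is listed as decommissioned.
    run hadoop dfsadmin -report
}

# recommission HOST: remove HOST from the excludes file and tell the NameNode.
recommission() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ remove $1 from $EXCLUDES"
    else
        # grep exits non-zero when nothing is left, which is fine here.
        grep -v -x -F -- "$1" "$EXCLUDES" > "$EXCLUDES.tmp" || true
        mv "$EXCLUDES.tmp" "$EXCLUDES"
    fi
    run hadoop dfsadmin -refreshNodes
}
```

Usage would be something like decommission dn1.example.com, then waiting until the report shows the node as decommissioned, clearing the node's data directories by hand, and finally recommission dn1.example.com. The deletion step is left out of the script on purpose, since it depends on your local directory layout.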
AFAIK, data is not removed from a DataNode when you decommission it. Further writes on that DataNode will not be possible though. When you decommission a DataNode, the replicas held by that DataNode are marked as "decommissioned" replicas, which are still eligible for read access.
But why do you want to perform this decommissioning/recommissioning cycle? Why don't you just specify all the disks as a comma-separated list in the dfs.data.dir property in your hdfs-site.xml and restart the DataNode daemon? Run the balancer after the restart.
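A minimal sketch of the restart-and-balance step, assuming a Hadoop 1.x style install where hadoop-daemon.sh and hadoop balancer are on the PATH (newer releases use hdfs balancer and name the property dfs.datanode.data.dir). The directory paths and the function name are hypothetical examples, and the script is dry-run by default so it only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to execute for real.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical mount points (assumption): the comma-separated value
# you put in dfs.data.dir in hdfs-site.xml.
DATA_DIRS="${DATA_DIRS:-/disk1/dfs/data,/disk2/dfs/data}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# restart_and_balance [THRESHOLD]: restart the DataNode so it picks up
# the new dfs.data.dir value, then run the balancer.
restart_and_balance() {
    printf '%s\n' "# expecting dfs.data.dir=$DATA_DIRS in hdfs-site.xml"
    run hadoop-daemon.sh stop datanode
    run hadoop-daemon.sh start datanode
    # THRESHOLD is a percentage of disk capacity; 10 is the balancer default.
    run hadoop balancer -threshold "${1:-10}"
}
```

Call it on the DataNode as restart_and_balance, or restart_and_balance 5 for a tighter threshold. Keep in mind that the balancer moves blocks between DataNodes rather than between the disks of one node, so existing blocks stay where they are and only new writes get spread over the listed directories.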