EDIT: I finally figured out what the issue was. Some files had a very high replication factor set, and I was reducing my cluster to 2 nodes. Once I reduced the replication factor on those files, the decommissioning completed quickly.
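For anyone hitting the same thing, here is a rough sketch of how such files could be tracked down. It assumes the recursive listing prints the replication factor in the second column and the path in the eighth, as `hadoop fs -lsr` does on Hadoop 1.x. `files_over_replication` is a hypothetical helper name, not part of Hadoop, and paths containing spaces would be cut off at the first space.

```shell
#!/bin/sh
# Hypothetical helper: reads a recursive HDFS listing on stdin and prints
# the paths of files whose replication factor is greater than $1.
# Assumed input columns: perms, replication, owner, group, size, date, time, path.
files_over_replication() {
  max="$1"
  # Skip directories (replication shown as "-") and any line with fewer than 8 fields.
  awk -v max="$max" \
    'NF >= 8 && $1 !~ /^d/ && $2 ~ /^[0-9]+$/ && ($2 + 0) > (max + 0) { print $8 }'
}

# Possible usage on a live cluster (not run here):
#   hadoop fs -lsr / | files_over_replication 2
#   hadoop fs -setrep 2 /path/printed/above
```

The threshold would be the number of DataNodes that remain after decommissioning (2 in my case), since a block cannot have more replicas than there are nodes to hold them.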
I've added the node to be decommissioned in the dfs.hosts.exclude and mapred.hosts.exclude files, and executed this command:
bin/hadoop dfsadmin -refreshNodes
In the NameNode UI, I see this node under Decommissioning Nodes, but it's taking too long, and I don't have much data on the node being decommissioned.
Does it always take a very long time to decommission nodes, or is there some place I should be looking? I'm not sure what exactly is going on.
I don't see any corrupt blocks on this node either:
$ ./hadoop/bin/hadoop fsck -blocks /
Total size: 157254687 B
Total dirs: 201
Total files: 189 (Files currently being written: 6)
Total blocks (validated): 140 (avg. block size 1123247 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 140 (100.0 %)
Over-replicated blocks: 6 (4.285714 %)
Under-replicated blocks: 12 (8.571428 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.9714285
Corrupt blocks: 0
Missing replicas: 88 (31.884058 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Jul 22 14:42:45 IST 2013 in 33 milliseconds
The filesystem under path '/' is HEALTHY
$ ./hadoop/bin/hadoop dfsadmin -report
Configured Capacity: 25357025280 (23.62 GB)
Present Capacity: 19756299789 (18.4 GB)
DFS Remaining: 19366707200 (18.04 GB)
DFS Used: 389592589 (371.54 MB)
DFS Used%: 1.97%
Under replicated blocks: 14
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 10.40.11.107:50010
Decommission Status : Decommission in progress
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 54947840 (52.4 MB)
Non DFS Used: 1786830848 (1.66 GB)
DFS Remaining: 6610563072(6.16 GB)
DFS Used%: 0.65%
DFS Remaining%: 78.21%
Last contact: Mon Jul 22 14:29:37 IST 2013
Name: 10.40.11.106:50010
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167412428 (159.66 MB)
Non DFS Used: 1953377588 (1.82 GB)
DFS Remaining: 6331551744(5.9 GB)
DFS Used%: 1.98%
DFS Remaining%: 74.91%
Last contact: Mon Jul 22 14:29:37 IST 2013
Name: 10.40.11.108:50010
Decommission Status : Normal
Configured Capacity: 8452341760 (7.87 GB)
DFS Used: 167232321 (159.49 MB)
Non DFS Used: 1860517055 (1.73 GB)
DFS Remaining: 6424592384(5.98 GB)
DFS Used%: 1.98%
DFS Remaining%: 76.01%
Last contact: Mon Jul 22 14:29:38 IST 2013
Decommissioning is not an instant process, even if you don't have much data.
First, decommissioning means that every block on the node has to be re-replicated elsewhere, which can be quite a few blocks (depending on your block size). Doing this all at once could easily overwhelm your cluster and cause operational issues, so I believe it is somewhat throttled.
Also, depending on which Hadoop version you use, the thread that monitors decommissions only wakes up every so often. It used to be around 5 minutes in earlier versions of Hadoop, but I believe it is now every minute or less.
Decommission in progress means that the blocks are being replicated, so I guess this really depends on how much data you have. You just have to wait, since this task won't fully utilize your cluster.
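While you wait, it can help to check the status from the command line instead of refreshing the UI. Below is a minimal sketch that pulls the per-node status out of `dfsadmin -report` output, assuming the output has the same `Name:` and `Decommission Status :` lines as the report in the question. `decommission_status` is a made-up helper name, not a Hadoop command.

```shell
#!/bin/sh
# Hypothetical helper: reads `hadoop dfsadmin -report` output on stdin and
# prints one "node<TAB>status" line per DataNode.
decommission_status() {
  awk '
    # Remember the most recent DataNode address.
    $1 == "Name:" { node = $2 }
    # Strip the label and print whatever status text follows the colon.
    $1 == "Decommission" && $2 == "Status" {
      sub(/^[ \t]*Decommission Status *: */, "")
      print node "\t" $0
    }
  '
}

# Possible usage on a live cluster (not run here):
#   bin/hadoop dfsadmin -report | decommission_status
```

Running this every minute or so would show when the node moves from "Decommission in progress" to its final state. If it never changes, the under-replicated block count in the same report is the next thing to look at.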