I am running Hadoop (hadoop-2.0.0-mr1-cdh4.1.2) on a cluster of 40 machines. Each machine has 12 disks used by Hadoop. Some disks on one machine were unbalanced, so I decided to rebalance them manually, as described in this post: "rebalance individual datanode in hadoop". I stopped the DataNode on that server, moved block file pairs, and moved whole sub-directories between some of the disks.
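For illustration, moving one block file pair can be sketched in Python. This is a minimal sketch, not the procedure from the linked post. It assumes the usual naming of a block file `blk_<id>` plus its `blk_<id>_<genstamp>.meta` companion, and the helper name `move_block_pair` is hypothetical. One relevant detail: between two data disks (separate filesystems), `shutil.move` falls back to copy-and-delete, and that copy keeps the mode bits but not the owner and group. The sketch therefore re-applies both explicitly.

```python
import glob
import os
import shutil
import stat


def move_block_pair(src_dir, dst_dir, block_file):
    """Move a block file (e.g. 'blk_123') and its 'blk_123_<genstamp>.meta'
    companion from src_dir to dst_dir, keeping the original owner and mode.

    Hypothetical helper for illustration; run it only while the DataNode is stopped.
    """
    pattern = os.path.join(glob.escape(src_dir), glob.escape(block_file) + "_*.meta")
    metas = glob.glob(pattern)
    if len(metas) != 1:
        raise FileNotFoundError(f"expected 1 .meta file for {block_file}, found {len(metas)}")

    if not os.path.isdir(dst_dir):
        # Create the target directory (its parent must already exist) with the
        # same owner, group and mode as the source directory.
        src_st = os.stat(src_dir)
        os.mkdir(dst_dir)
        os.chown(dst_dir, src_st.st_uid, src_st.st_gid)
        os.chmod(dst_dir, stat.S_IMODE(src_st.st_mode))

    moved = []
    for src in (os.path.join(src_dir, block_file), metas[0]):
        st = os.stat(src)
        dst = os.path.join(dst_dir, os.path.basename(src))
        # Across filesystems shutil.move copies then deletes; the copy keeps
        # mode bits but not ownership, so both are re-applied explicitly.
        shutil.move(src, dst)
        os.chown(dst, st.st_uid, st.st_gid)
        os.chmod(dst, stat.S_IMODE(st.st_mode))
        moved.append(dst)
    return moved
```

Restoring the owner with `os.chown` requires running as root when the files belong to `hdfs`.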
As soon as I stopped the DataNode, the NameNode started complaining about missing blocks, showing the following message in its web UI: WARNING : There are 2002 missing blocks. Please check the logs or run fsck in order to identify the missing blocks.
Then I tried to restart the DataNode. It fails to start and keeps logging errors and warnings such as the following:
java.io.IOException: Invalid directory or I/O error occurred for dir: /data/disk3/dfs/data/current/BP-208475052-10.165.18.36-1351280731538/current/finalized/subdir61/subdir28
2013-12-20 01:40:29,046 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.io.IOException: block pool BP-208475052-10.165.18.36-1351280731538 is not found
2013-12-20 01:40:29,088 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-208475052-10.165.18.36-1351280731538 (storage id DS-737580588-10.165.18.36-50010-1351280778276) service to aspen8hdp19.turner.com/10.165.18.56:54310 java.lang.NullPointerException
2013-12-20 01:40:34,088 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService java.io.IOException: block pool BP-208475052-10.165.18.36-1351280731538 is not found
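When a log like this names many directories, a small script can pull them out so their ownership and permissions can be checked. This is a hedged sketch: the regular expression is written against the "Invalid directory or I/O error occurred for dir:" message shown above and may not match other Hadoop versions, and the function names `dirs_from_log` and `describe` are hypothetical.

```python
import os
import re
import stat

# Matches the DataNode error shown above and captures the directory path.
INVALID_DIR = re.compile(r"Invalid directory or I/O error occurred for dir: (\S+)")


def dirs_from_log(lines):
    """Return each directory named in an 'Invalid directory' error, once, in log order."""
    found = []
    for line in lines:
        match = INVALID_DIR.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def describe(path):
    """Format a path like a short `ls -ld` line: mode, numeric uid:gid, path."""
    st = os.stat(path)
    return f"{stat.filemode(st.st_mode)} {st.st_uid}:{st.st_gid} {path}"
```

Passing an open DataNode log file to `dirs_from_log` and printing `describe(d)` for each result gives a quick overview of which directories have unexpected owners or modes.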
So, I have some questions:
I appreciate your help. Eduardo.
I'm going to answer my own question here.
The problem was caused by wrong file/directory permissions and ownership after I moved the data blocks. I did the move as root, and the moved files ended up with the following permissions:
drwx-----T 2 root root 12288 Dec 19 23:14 subdir28
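The mode in this listing can be decoded with Python's standard `stat` module to see why it breaks the DataNode. This sketch assumes that the DataNode runs as a non-root user such as `hdfs` that is not in the `root` group. Under that assumption, only the "other" permission bits apply to it, and with mode 1700 those bits are all off. The trailing `T` only means the sticky bit is set without other-execute. The sticky bit restricts deleting and renaming entries and is not the problem here. The function name `non_owner_access` is hypothetical.

```python
import stat


def non_owner_access(mode):
    """What a user who is neither the owner nor in the owning group may do
    with a directory of this mode (the 'other' permission bits)."""
    return {
        "list": bool(mode & stat.S_IROTH),      # read the directory's entries
        "traverse": bool(mode & stat.S_IXOTH),  # enter it and reach files inside
        "write": bool(mode & stat.S_IWOTH),     # create or delete entries
    }


broken = stat.S_IFDIR | 0o1700  # the mode shown for subdir28 after the move
print(stat.filemode(broken))     # drwx-----T
print(non_owner_access(broken))  # {'list': False, 'traverse': False, 'write': False}
```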
Once I changed them back to the original values, the DataNode (DN) restarted properly and the NameNode (NN) stopped reporting missing blocks or corrupt files. These are the permissions it should have:
drwxr-xr-t 2 hdfs hadoop 12288 Dec 20 11:47 subdir28
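A rough Python equivalent of restoring this recursively (similar to `chown -R hdfs:hadoop` followed by a chmod on the directories) is sketched below. It assumes user `hdfs` and group `hadoop`, as in the listing. It also assumes mode 1755 for every directory, which `stat.filemode` renders as `drwxr-xr-t`. File modes are left alone because the listing doesn't show them. The function name is hypothetical. It would need to run as root, with the DataNode stopped, on the affected directory.

```python
import os
import shutil


def restore_datanode_tree(root, user="hdfs", group="hadoop", dir_mode=0o1755):
    """Give `root` and everything below it to user:group and reset every
    directory's mode to dir_mode (0o1755 == drwxr-xr-t).
    File modes are left untouched. Returns the number of paths processed."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # user/group may be names (looked up via pwd/grp) or numeric ids.
        shutil.chown(dirpath, user=user, group=group)
        os.chmod(dirpath, dir_mode)
        count += 1
        for name in filenames:
            shutil.chown(os.path.join(dirpath, name), user=user, group=group)
            count += 1
    return count
```

`os.walk` is top-down by default, so each directory is fixed before the walk descends into it.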