I'm new to Hadoop and I've spent the past couple of hours trying to Google this issue, but I couldn't find anything that helped. My problem is that HDFS says a file is still open, even though the process that was writing to it is long dead. This makes it impossible to read from the file.
I ran fsck on the directory and it reports everything is healthy. However, when I run "hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -openforwrite" I get:
Status: CORRUPT
Total size: 222506775716 B
Total dirs: 0
Total files: 630
Total blocks (validated): 3642 (avg. block size 61094666 B)
********************************
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 30366208 B
********************************
Minimally replicated blocks: 3641 (99.97254 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.9991763
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 23
Number of racks: 1
Running the fsck command again on the file that is openforwrite, I get:
.Status: HEALTHY
Total size: 793208051 B
Total dirs: 0
Total files: 1
Total blocks (validated): 12 (avg. block size 66100670 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 23
Number of racks: 1
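I can also run fsck with extra flags to list every block and the datanodes holding it, if that output would help; the command would be something like the following (same path as above, using the standard -files/-blocks/-locations fsck options):

hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -openforwrite -files -blocks -locations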
Does anyone have any ideas about what is going on and how I can fix it?
I figured out that the blocks seem to be missing because the namenode server was temporarily unavailable, which corrupted the filesystem for that file. It appeared that the part of the file without the missing blocks could still be read/copied. Some more information on dealing with corruption in HDFS is available at https://twiki.grid.iu.edu/bin/view/Storage/HadoopRecovery (mirror: http://www.webcitation.org/5xMTitU0r).
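For what it's worth, one rough way to grab the readable prefix is to stream the file and cut it off just before the missing block (fsck with -files -blocks reports the block sizes, from which the offset can be worked out). The file name and byte count below are placeholders, not my actual values:

hadoop fs -cat hdfs://hadoop/logs/raw/directory_containing_file/stuck_file 2>/dev/null | head -c 123456789 > stuck_file.partial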
Edit: It seems this issue was caused by Scribe (or more specifically the DFSClient used by Scribe) hanging when trying to write to HDFS. We manually patched the source of our Hadoop cluster with HADOOP-6099 and HDFS-278, rebuilt the binaries, and restarted the cluster with the new version. There have been no issues in the two months we have been running the new version.
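For anyone hitting this on a newer Hadoop release: I believe there is now a built-in way to ask the namenode to recover the lease on a file that is stuck openforwrite, which may avoid the need to patch anything. It would look something like the following (the path is a placeholder; check the hdfs debug documentation for your version):

hdfs debug recoverLease -path /logs/raw/directory_containing_file/stuck_file -retries 3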