
HDFS says file is still open, but process writing to it was killed

Tags: hadoop, hdfs

I'm new to Hadoop and I've spent the past couple of hours googling this issue, but I couldn't find anything that helped. My problem is that HDFS says the file is still open, even though the process that was writing to it is long dead. This makes it impossible to read from the file.

I ran fsck on the directory and it reports everything is healthy. However, when I run "hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -openforwrite" I get:

Status: CORRUPT
 Total size:    222506775716 B
 Total dirs:    0
 Total files:   630
 Total blocks (validated):  3642 (avg. block size 61094666 B)
  ********************************
  CORRUPT FILES:    1
  MISSING BLOCKS:   1
  MISSING SIZE:     30366208 B
  ********************************
 Minimally replicated blocks:   3641 (99.97254 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    2
 Average block replication: 2.9991763
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      23
 Number of racks:       1
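
A more verbose fsck run over the same directory (same path and -fs option as above, combined with the standard -files/-blocks/-locations flags) will name the file that owns the missing block and show which datanodes are expected to hold each block:

 # Flag files still open for write, and list every file, its blocks, and their locations
 hadoop fsck -fs hdfs://hadoop /logs/raw/directory_containing_file -openforwrite -files -blocks -locations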

Running fsck again on just the file that is open for write, I get:

.Status: HEALTHY
 Total size:    793208051 B
 Total dirs:    0
 Total files:   1
 Total blocks (validated):  12 (avg. block size 66100670 B)
 Minimally replicated blocks:   12 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks:     0 (0.0 %)
 Default replication factor:    2
 Average block replication: 3.0
 Corrupt blocks:        0
 Missing replicas:      0 (0.0 %)
 Number of data-nodes:      23
 Number of racks:       1

Does anyone have any idea what is going on and how I can fix it?

asked Mar 18 '11 by jwegan


1 Answer

I figured out that the blocks appear to be missing because the namenode server was temporarily unavailable, which corrupted the filesystem entry for that file. The parts of the file not affected by the missing block could still be read/copied. More information on dealing with corruption in HDFS is available at https://twiki.grid.iu.edu/bin/view/Storage/HadoopRecovery (mirror: http://www.webcitation.org/5xMTitU0r).
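
For anyone running into the same state, a rough recovery sketch follows. The stuck_file name is a placeholder for the affected file, and the hdfs debug recoverLease subcommand only exists in later Hadoop releases (2.7+), so it may not be available on an older cluster like the one in this question:

 # Copy out whatever is still readable; the copy stops when it reaches the missing block
 hadoop fs -get /logs/raw/directory_containing_file/stuck_file /tmp/stuck_file.partial

 # On Hadoop 2.7+, force lease recovery so the namenode closes the file
 hdfs debug recoverLease -path /logs/raw/directory_containing_file/stuck_file -retries 3

 # Last resort: move the corrupt file to /lost+found (or use -delete to drop it entirely)
 hadoop fsck /logs/raw/directory_containing_file/stuck_file -move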

Edit: The root cause turned out to be Scribe (or, more specifically, the DFSClient used by Scribe) hanging when trying to write to HDFS. We manually patched the source of our Hadoop cluster with HADOOP-6099 and HDFS-278, rebuilt the binaries, and restarted the cluster with the new version. We have had no issues in the two months we have been running the new version.

answered Oct 03 '22 by jwegan