
How to fix corrupt HDFS FIles

Tags:

hadoop

hdfs

How does someone fix an HDFS that's corrupt? I looked on the Apache/Hadoop website and it points to the fsck command, which doesn't actually fix anything. Hopefully someone who has run into this problem before can tell me how to fix it.

Unlike a traditional fsck utility for native file systems, this command does not correct the errors it detects. Normally NameNode automatically corrects most of the recoverable failures.

When I ran bin/hadoop fsck / -delete, it listed the files that were corrupt or had missing blocks. How do I make them not corrupt? This is on a practice machine, so I COULD blow everything away, but when we go live I won't be able to "fix" it by blowing everything away, so I'm trying to figure it out now.
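For reference, an overview of the damage can be had with something like the following (hdfs dfsadmin -report is optional; it just shows which datanodes are up alongside the fsck summary):

  hdfs fsck /
  hdfs dfsadmin -report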

asked Oct 06 '13 by Classified

People also ask

How do I fix a corrupted block in HDFS?

To identify "corrupt" or "missing" blocks, run 'hdfs fsck /path/to/file' from the command line. Other tools also exist. HDFS will attempt to recover the situation automatically. By default there are three replicas of every block in the cluster.
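For example, a cluster-wide listing of affected files, or the block layout of one suspect file, can be obtained with (the file path is a placeholder):

  hdfs fsck / -list-corruptfileblocks
  hdfs fsck /path/to/file -files -blocks -locations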

What happens if the block on HDFS is corrupted?

A corrupted block means that HDFS cannot find a valid replica containing that block's data. Since replication factor is typically 3, and since the default replica placement logic spreads those replicas across different machines and racks, it's very unlikely to encounter corruption on typical files.
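As a quick sanity check (a sketch; the path is a placeholder), a file's current replication factor can be inspected and, if needed, raised:

  hdfs dfs -stat %r /path/to/file
  hdfs dfs -setrep -w 3 /path/to/file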

What is HDFS fsck?

HDFS fsck is used to check the health of the file system: to find missing files and over-replicated, under-replicated, and corrupted blocks. It can also list the blocks that make up a particular file, as sketched below.
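A minimal sketch of finding the blocks for one file (the path is a placeholder):

  hdfs fsck /path/to/file -files -blocks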


2 Answers

You can use

  hdfs fsck / 

to determine which files have problems. Look through the output for missing or corrupt blocks (ignore under-replicated blocks for now). This command is really verbose, especially on a large HDFS filesystem, so I normally get down to the meaningful output with

  hdfs fsck / | egrep -v '^\.+$' | grep -v eplica 

which ignores lines with nothing but dots and lines talking about replication.

Once you find a file that is corrupt, run

  hdfs fsck /path/to/corrupt/file -locations -blocks -files 

Use that output to determine where blocks might live. If the file is larger than your block size it might have multiple blocks.

You can use the reported block numbers to search the datanode and namenode logs for the machine or machines on which the blocks lived. Try looking for filesystem errors on those machines: missing mount points, a datanode that isn't running, a filesystem that was reformatted or reprovisioned. If you can find a problem that way and bring the block back online, the file will be healthy again.
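For example, something like this can help locate which machine last held a given block (the block ID and the log directory are illustrative; adjust them for your cluster):

  grep -r 'blk_1073741825' /var/log/hadoop-hdfs/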

Lather, rinse, and repeat until all files are healthy or you exhaust all alternatives looking for the blocks.

Once you determine what happened and you cannot recover any more blocks, just use the

  hdfs dfs -rm /path/to/file/with/permanently/missing/blocks 

command to get your HDFS filesystem back to healthy so you can start tracking new errors as they occur.
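After removing the unrecoverable files, a quick re-check (the egrep is just a convenience filter) should come back with no corrupt or missing blocks:

  hdfs fsck / | egrep -i 'corrupt|missing'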

answered by mobileAgent


If you just want to get your HDFS back to a normal state and don't worry much about the data, then:

This will list the files that have corrupt HDFS blocks:

hdfs fsck / -list-corruptfileblocks

This will delete the files with corrupted HDFS blocks:

hdfs fsck / -delete

Note that you might have to run these commands with sudo -u hdfs if you are not already the HDFS superuser (assuming "hdfs" is the name of that user).
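For example (a sketch, assuming the superuser account is literally named "hdfs"):

sudo -u hdfs hdfs fsck / -list-corruptfileblocks
sudo -u hdfs hdfs fsck / -delete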

answered by PradeepKumbhar