I am trying to implement a parallelized algorithm using Apache Hadoop, but I am running into an issue when transferring a file from the local file system to HDFS: a ChecksumException is thrown when reading from or transferring a file.
The strange thing is that some files are copied successfully while others are not (I tried with two files, one slightly bigger than the other, though both are small). Another observation I have made is that the Java FileSystem.getFileChecksum method returns null in all cases.
A bit of background on what I am trying to achieve: I am writing a file to HDFS so that I can use it as a distributed cache for the MapReduce job I have written. A rough sketch of that code is shown below.
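For reference, the copy from my Java code looks roughly like the following sketch (the paths mirror the ones in the terminal command further down, and the distributed-cache call assumes the org.apache.hadoop.filecache.DistributedCache API):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy the local file into HDFS; this is where the ChecksumException surfaces.
            Path local = new Path("/home/name/Desktop/dtlScaleData/attr.txt");
            Path remote = new Path("/tmp/hadoop-name/dfs/data/attr2.txt");
            fs.copyFromLocalFile(local, remote);

            // Register the HDFS copy as a distributed-cache file for the job.
            DistributedCache.addCacheFile(new URI(remote.toString()), conf);
        }
    }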
I have also tried the hadoop fs -copyFromLocal command from the terminal, and the result is exactly the same behaviour as when it is done through the Java code.
I have looked all over the web, including other questions here on Stack Overflow, but I haven't managed to solve the issue. Please be aware that I am still quite new to Hadoop, so any help is greatly appreciated.
I am attaching the stack trace below, which shows the exceptions being thrown. (In this case I have posted the stack trace resulting from the hadoop fs -copyFromLocal command run in the terminal.)
name@ubuntu:~/Desktop/hadoop2$ bin/hadoop fs -copyFromLocal ~/Desktop/dtlScaleData/attr.txt /tmp/hadoop-name/dfs/data/attr2.txt
13/03/15 15:02:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/15 15:02:51 INFO fs.FSInputChecker: Found checksum error: b[0, 0]=
org.apache.hadoop.fs.ChecksumException: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
    at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
copyFromLocal: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
You are probably hitting the bug described in HADOOP-7199. What happens is that when you download a file with copyToLocal, it also writes a .crc file in the same directory. If you then modify your file and try to copyFromLocal, Hadoop computes the checksum of your new file, compares it against the stale local .crc file, and fails with a non-descriptive error message.
To fix it, check whether you have this .crc file; if you do, just remove it and try again.
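If you prefer to handle this programmatically, here is a minimal Java sketch of that workaround; it assumes the stale checksum sits next to the source file under the .<filename>.crc naming convention the local checksum filesystem uses, and the paths are just the ones from the question:

    import java.io.File;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyWithoutStaleCrc {
        public static void main(String[] args) throws IOException {
            File local = new File("/home/name/Desktop/dtlScaleData/attr.txt");

            // The local checksum filesystem keeps the checksum in a hidden
            // sibling file named ".<filename>.crc" in the same directory.
            File crc = new File(local.getParentFile(), "." + local.getName() + ".crc");
            if (crc.exists()) {
                // Drop the stale checksum so the copy recomputes it from the current contents.
                crc.delete();
            }

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path(local.getAbsolutePath()),
                                 new Path("/tmp/hadoop-name/dfs/data/attr2.txt"));
        }
    }

Alternatively, calling FileSystem.getLocal(conf).setVerifyChecksum(false) before the copy should skip local checksum verification altogether, but removing the stale .crc file is the cleaner fix.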
I faced the same problem and solved it by removing the .crc files.