
Checksum exception when reading from or copying to HDFS in Apache Hadoop


I am trying to implement a parallelized algorithm using Apache Hadoop, but I am running into issues when transferring a file from the local file system to HDFS: a checksum exception is thrown when reading from or transferring the file.

The strange thing is that some files are copied successfully while others are not (I tried with two files, one slightly bigger than the other, though both are small). Another observation I have made is that the Java FileSystem.getFileChecksum method returns null in all cases.
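For reference, this is roughly how I am checking the checksum (a minimal sketch; ChecksumProbe is just a throwaway class name, and the path is the file from the stack trace below):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumProbe {
        public static void main(String[] args) throws Exception {
            FileSystem local = FileSystem.getLocal(new Configuration());
            // This prints "null" for every file I try
            System.out.println(local.getFileChecksum(
                    new Path("/home/name/Desktop/dtlScaleData/attr.txt")));
        }
    }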

Some background on what I am trying to achieve: I am writing a file to HDFS so that it can be used via the distributed cache by the MapReduce job I have written.
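The relevant Java code is essentially the standard copy-then-cache sequence; here is a trimmed-down sketch (the class name and job wiring are placeholders, and the paths match the command below):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CacheFileLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy the local file into HDFS; this is where the exception is thrown
            Path dst = new Path("/tmp/hadoop-name/dfs/data/attr2.txt");
            fs.copyFromLocalFile(new Path("/home/name/Desktop/dtlScaleData/attr.txt"), dst);

            // ...so that the MapReduce job can pick it up from the distributed cache
            DistributedCache.addCacheFile(dst.toUri(), conf);
        }
    }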

I have also tried the hadoop fs -copyFromLocal command from the terminal, and the result is exactly the same behaviour as when it is done through the Java code.

I have looked all over the web, including other questions here on Stack Overflow, but I haven't managed to solve the issue. Please be aware that I am still quite new to Hadoop, so any help is greatly appreciated.

I am attaching the stack trace below, which shows the exceptions being thrown. (In this case I have posted the stack trace resulting from running hadoop fs -copyFromLocal from the terminal.)

name@ubuntu:~/Desktop/hadoop2$ bin/hadoop fs -copyFromLocal ~/Desktop/dtlScaleData/attr.txt /tmp/hadoop-name/dfs/data/attr2.txt
13/03/15 15:02:51 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/15 15:02:51 INFO fs.FSInputChecker: Found checksum error: b[0, 0]=
org.apache.hadoop.fs.ChecksumException: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:219)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:176)
        at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
        at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
copyFromLocal: Checksum error: /home/name/Desktop/dtlScaleData/attr.txt at 0
asked Mar 15 '13 by lvella

2 Answers

You are probably hitting the bug described in HADOOP-7199. When you download a file with copyToLocal, Hadoop also writes a .crc file into the same directory, so if you then modify the file and try to copyFromLocal it again, it computes a checksum of your new file, compares it against the stale local .crc file, and fails with a non-descriptive error message.

To fix it, check whether you have this .crc file; if you do, just remove it and try again.
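For example, with the paths from the question, the checksum sidecar is a hidden file named .attr.txt.crc sitting next to attr.txt. Here is a minimal sketch of removing it from Java (the class name is just illustrative; deleting the file from a shell works equally well):

    import java.io.File;

    public class RemoveStaleCrc {
        public static void main(String[] args) {
            // ChecksumFileSystem stores the checksum for attr.txt as a
            // hidden sibling file named .attr.txt.crc in the same directory
            File crc = new File("/home/name/Desktop/dtlScaleData", ".attr.txt.crc");
            if (crc.exists() && crc.delete()) {
                System.out.println("Removed stale checksum file: " + crc);
            }
        }
    }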

answered by Charles Menguy


I faced the same problem; it was solved by removing the .crc files.

answered by Narendra Parmar