
Compare HDFS Checksum to Local File System Checksum

Tags: md5, checksum, hdfs

I am trying to write a simple script to verify the HDFS and local filesystem checksums.

On HDFS I get:

[m@x01tbipapp3a ~]$ hadoop fs -checksum /user/m/file.txt
/user/m/file.txt  MD5-of-0MD5-of-512CRC32C        **000002000000000000000000755ca25bd89d1a2d64990a68dedb5514**

On the local file system I get:

[m@x01tbipapp3a ~]$ cksum file.txt
**3802590149 26276247** file.txt
[m@x01tbipapp3a ~]$ md5sum file.txt
**c1aae0db584d72402d5bcf5cbc29134c**  file.txt

How do I compare them? I tried converting the HDFS checksum from hex to decimal to see whether it matches the cksum output, but it does not.

Is there a way to compare the 2 checksums using any algorithm?

thanks

myloginid asked May 27 '15 03:05




2 Answers

This is a workaround rather than a solution: run the same checksum tool on both copies of the data.

Local file checksum: cksum test.txt

HDFS file checksum: hadoop fs -cat /user/test/test.txt > tmp.txt && cksum tmp.txt

You can compare them.
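The same workaround can be scripted without a temporary file. The following is a minimal Python sketch, assuming the hadoop CLI is on the PATH; the function names are hypothetical and chosen only for illustration. It streams the output of hadoop fs -cat through MD5 and compares the digest with the MD5 of the local file, which is the value md5sum would print.

```python
import hashlib
import subprocess


def md5_of_stream(stream, chunk_size=1 << 20):
    """Return the hex MD5 of everything read from a binary stream."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_local_file(path):
    """Hash a local file in chunks; the result matches md5sum's digest."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def md5_of_command_output(argv):
    """Run argv and hash its stdout without holding it all in memory."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        checksum = md5_of_stream(proc.stdout)
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited with status {proc.returncode}")
    return checksum


def hdfs_matches_local(hdfs_path, local_path):
    """Compare file contents by MD5 (assumes `hadoop` is on PATH)."""
    remote = md5_of_command_output(["hadoop", "fs", "-cat", hdfs_path])
    return remote == md5_of_local_file(local_path)
```

Calling hdfs_matches_local("/user/test/test.txt", "test.txt") would return True when both copies have identical contents. Note that this reads the whole HDFS file over the network, so it compares contents rather than reusing the checksum HDFS already stores.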

Hope it helps.

jeetendra rawal answered Oct 03 '22 14:10


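I was also confused because the MD5 did not match. It turns out the Hadoop checksum is not a plain MD5 of the file contents: it is an MD5 of MD5s of CRC32C checksums, which is what the algorithm name MD5-of-0MD5-of-512CRC32C in the output indicates :-) Neither cksum nor md5sum computes that, so their values cannot be compared with it directly.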

see this

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201508.mbox/%3CCAMm20=5K+f3ArVtoo9qMSesjgd_opdcvnGiDTkd3jpn7SHkysg@mail.gmail.com%3E

and this

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201103.mbox/%[email protected]%3E
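To see why converting the HDFS value from hex to decimal cannot work, it helps to split the string into its parts. The sketch below assumes the layout that, as far as I understand it, Hadoop uses for this checksum type: a 4-byte big-endian bytes-per-CRC value, an 8-byte CRCs-per-block value, then the 16-byte MD5 digest. The names HdfsFileChecksum and parse_hdfs_checksum are hypothetical, introduced only for this example.

```python
import struct
from typing import NamedTuple


class HdfsFileChecksum(NamedTuple):
    bytes_per_crc: int   # chunk size each CRC32C covers (512 in the question)
    crc_per_block: int   # the number shown after "MD5-of-" in the algorithm name
    md5_hex: str         # MD5 over the per-block MD5s, not over the file bytes


def parse_hdfs_checksum(hex_string):
    """Split the hex value printed by `hadoop fs -checksum` into fields.

    Assumed layout: 4-byte int, 8-byte long, 16-byte MD5 (28 bytes total).
    """
    raw = bytes.fromhex(hex_string)
    if len(raw) != 28:
        raise ValueError(f"expected 28 bytes, got {len(raw)}")
    bytes_per_crc, crc_per_block = struct.unpack(">iq", raw[:12])
    return HdfsFileChecksum(bytes_per_crc, crc_per_block, raw[12:].hex())
```

For the value in the question, this gives bytes_per_crc 512 and crc_per_block 0, matching the name MD5-of-0MD5-of-512CRC32C, with 755ca25bd89d1a2d64990a68dedb5514 as the digest. Reproducing that digest locally would require computing CRC32C over each 512-byte chunk with the same block size as the cluster, which the Python standard library does not provide, so it is not attempted here.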

r2d2 answered Oct 03 '22 14:10