I am trying to check the consistency of a file after copying it to HDFS using the Hadoop API - DFSClient.getFileChecksum().
I am getting the following output for the code below:
Null
HDFS : null
Local : null
Can anyone point out the error or mistake? Here is the code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class fileCheckSum {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        // TODO Auto-generated method stub
        Configuration conf = new Configuration();
        FileSystem hadoopFS = FileSystem.get(conf);
        // Path hdfsPath = new Path("/derby.log");
        LocalFileSystem localFS = LocalFileSystem.getLocal(conf);
        // Path localPath = new Path("file:///home/ubuntu/derby.log");
        // System.out.println("HDFS PATH : " + hdfsPath.getName());
        // System.out.println("Local PATH : " + localPath.getName());

        FileChecksum hdfsChecksum = hadoopFS.getFileChecksum(new Path("/derby.log"));
        FileChecksum localChecksum = localFS.getFileChecksum(new Path("file:///home/ubuntu/derby.log"));

        if (null != hdfsChecksum || null != localChecksum) {
            System.out.println("HDFS Checksum : " + hdfsChecksum.toString() + "\t" + hdfsChecksum.getLength());
            System.out.println("Local Checksum : " + localChecksum.toString() + "\t" + localChecksum.getLength());
            if (hdfsChecksum.toString().equals(localChecksum.toString())) {
                System.out.println("Equal");
            } else {
                System.out.println("UnEqual");
            }
        } else {
            System.out.println("Null");
            System.out.println("HDFS : " + hdfsChecksum);
            System.out.println("Local : " + localChecksum);
        }
    }
}
A checksum is a small-sized datum derived from a block of digital data for the purpose of detecting errors. HDFS computes checksums for each data block and stores them in a separate hidden file in the same HDFS namespace.
A checksum (also sometimes referred to as a hash) is an alphanumeric value, typically rendered as a hexadecimal string, that for practical purposes uniquely identifies the contents of a file. Checksums are often used to verify the integrity of files downloaded from an external source, such as an installation file.
To produce a checksum, you run a program that puts the file through an algorithm. Typical algorithms used for this include MD5, SHA-1, SHA-256, and SHA-512. These algorithms use a cryptographic hash function that takes an input and generates a fixed-length alphanumeric string regardless of the size of the file.
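For illustration, here is a minimal standalone Java sketch of that process (it has nothing to do with the Hadoop API; the local path is just borrowed from the question), streaming a file through java.security.MessageDigest to produce a SHA-256 checksum:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LocalChecksumSketch {
    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        // SHA-256 is one of the algorithms mentioned above; "MD5" or "SHA-1" work the same way.
        MessageDigest digest = MessageDigest.getInstance("SHA-256");

        // Stream the file through the digest in chunks so large files fit in memory.
        try (InputStream in = Files.newInputStream(Paths.get("/home/ubuntu/derby.log"))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Render the fixed-length digest as a hex string, regardless of the file's size.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        System.out.println("SHA-256 : " + hex);
    }
}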
Since you aren't setting a remote address on the conf and are essentially using the same default configuration for both, hadoopFS and localFS both point to an instance of LocalFileSystem.

getFileChecksum isn't implemented for LocalFileSystem and returns null. It does work for DistributedFileSystem, though: if your conf points to a distributed cluster, FileSystem.get(conf) returns an instance of DistributedFileSystem, which returns an MD5 of MD5s of CRC32 checksums of chunks of size bytes.per.checksum. That value depends on the block size and on the cluster-wide bytes.per.checksum setting, which is why these two parameters are also encoded in the returned checksum's algorithm name: MD5-of-xxxMD5-of-yyyCRC32, where xxx is the number of CRC checksums per block and yyy is the bytes.per.checksum parameter.
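As a minimal sketch of that setup (the NameNode host and port are placeholders for your cluster, and fs.defaultFS is the modern name of the address setting; older releases use fs.default.name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsChecksumSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder address: point the client at your NameNode so that
        // FileSystem.get(conf) returns a DistributedFileSystem, not a LocalFileSystem.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        FileChecksum checksum = fs.getFileChecksum(new Path("/derby.log"));

        if (checksum != null) {
            // toString() includes the algorithm name described above,
            // e.g. MD5-of-xxxMD5-of-yyyCRC32, followed by the digest.
            System.out.println("HDFS Checksum : " + checksum + "\t" + checksum.getLength());
        }
    }
}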
getFileChecksum isn't designed to be comparable across filesystems. Although it's possible to simulate the distributed checksum locally, or to hand-craft MapReduce jobs that calculate equivalents of local hashes, I suggest relying on Hadoop's own integrity checks, which happen whenever a file is written to or read from Hadoop.
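A minimal sketch of relying on those built-in checks (again with a placeholder cluster address): the client ships CRC checksums alongside the data while writing and verifies them while reading, so copying the file in and reading it back exercises the integrity check, with corruption surfacing as an org.apache.hadoop.fs.ChecksumException rather than as bad data.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifiedCopySketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address

        FileSystem fs = FileSystem.get(conf);
        Path local = new Path("file:///home/ubuntu/derby.log");
        Path remote = new Path("/derby.log");

        // Checksums are computed and sent along with the data during the write.
        fs.copyFromLocalFile(local, remote);

        // Reading the file back verifies each chunk's CRC on the client side;
        // a mismatch throws ChecksumException instead of returning corrupt bytes.
        byte[] buffer = new byte[8192];
        try (FSDataInputStream in = fs.open(remote)) {
            while (in.read(buffer) != -1) {
                // Discard the bytes; the read itself performs the verification.
            }
        }
        System.out.println("Copy and read-back completed without checksum errors.");
    }
}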