Do we need to verify checksums after we move files to Hadoop (HDFS) from a Linux server through WebHDFS?
I would like to make sure the files on HDFS are not corrupted after they are copied. Is checking the checksum necessary?
I read that the client computes a checksum before data is written to HDFS.
Can somebody help me understand how I can make sure that the source file on the Linux system is the same as the ingested file on HDFS when using WebHDFS?
A checksum is an indicator (usually in the form of a short string of letters and numbers) that lets you verify whether the original data has been modified during storage or transmission.
HDFS computes a checksum for each data block and stores the checksums in a separate hidden file in the same HDFS namespace.
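Since you are already going through WebHDFS, you can also ask for a file's checksum over the same REST interface with the GETFILECHECKSUM operation. A minimal sketch, assuming a Hadoop 3.x cluster where the NameNode HTTP port is 9870 (use 50070 on Hadoop 2.x); the host, path, and user name below are placeholders:

curl -L "http://namenode.example.com:9870/webhdfs/v1/user/hdfs/file1?op=GETFILECHECKSUM&user.name=hdfs"

The request is redirected to a DataNode, and the JSON response contains a FileChecksum object with the algorithm name (for example MD5-of-0MD5-of-512CRC32C) and the checksum bytes, which you can compare against a checksum computed the same way on the source side.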
I wrote a library with which you can calculate the checksum of a local file the same way Hadoop does it for HDFS files.
You can then compare the two checksums to cross-check: https://github.com/srch07/HDFSChecksumForLocalfile
The checksum of a file can be calculated using the hadoop fs command.
Usage: hadoop fs -checksum URI
Returns the checksum information of a file.
Example:
hadoop fs -checksum hdfs://nn1.example.com/file1
hadoop fs -checksum file:///path/in/linux/file1
Refer to the Hadoop documentation for more details.
So if you want to compare file1 on both Linux and HDFS, you can use the utility above.
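If both paths are reachable from the machine running the command, a small shell wrapper can do the comparison for you. A minimal sketch, reusing the two example paths above and assuming that the checksum value is the third whitespace-separated field of the -checksum output and that your Hadoop version can compute a comparable checksum for file:// paths:

SRC=file:///path/in/linux/file1
DST=hdfs://nn1.example.com/file1

# The third field of "hadoop fs -checksum" output is assumed to be the checksum value
src_sum=$(hadoop fs -checksum "$SRC" | awk '{print $3}')
dst_sum=$(hadoop fs -checksum "$DST" | awk '{print $3}')

if [ "$src_sum" = "$dst_sum" ]; then
    echo "OK: checksums match"
else
    echo "WARNING: checksums differ"
fi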