Compare checksums of files between two servers and report mismatches

I have to compare the checksums of all files in the /primary and /secondary folders on machineA with the files in the /bat/snap/ folder on the remote server machineB. The remote server has many other files in addition to the ones on machineA.

  • If any checksum does not match, I want to report every affected file on machineA with its full path and exit with a non-zero status code.
  • If everything matches, exit with status zero.

I wrote the following command (I'm not sure whether there is a better way to write it), which I run on machineA, but it's very slow. Is there any way to make it faster?

(cd /primary && find . -type f -exec md5sum {} +; cd /secondary && find . -type f -exec md5sum {} +) | ssh machineB '(cd /bat/snap/ && md5sum -c)'

Also, it prints file names like ./abc_monthly_1536_proc_7.data: OK. Is there any way to make it print the full path of the file on machineA instead?
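A minimal sketch that keeps the original md5sum -c approach but answers both sub-questions: it checks each local directory separately, so every reported ./name can be mapped back to /primary or /secondary, and it returns a non-zero status on any failure. It assumes GNU md5sum on machineB (for --quiet) and file names without newlines; REMOTE_SHELL, REMOTE_DIR and verify_dirs are hypothetical names introduced here, not part of the question.

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any stage fails (needed to see md5sum -c's status).
set -o pipefail

# Hypothetical settings; defaults taken from the question.
# REMOTE_SHELL is word-split on purpose: "ssh machineB" -> ssh machineB CMD.
REMOTE_SHELL=${REMOTE_SHELL:-ssh machineB}
REMOTE_DIR=${REMOTE_DIR:-/bat/snap}

# verify_dirs DIR...
# For each local DIR: hash its files with relative paths ("./name"), check the
# list on the remote side with md5sum -c, and print every failure with the
# full machineA path. Returns 1 if any file failed, 0 otherwise.
verify_dirs() {
  local dir rc=0
  for dir in "$@"; do
    # --quiet (GNU md5sum): print only non-OK lines such as "./x.data: FAILED".
    # sed maps the leading "./" back to the local directory, e.g. "/primary/".
    (cd "$dir" && find . -type f -exec md5sum {} +) |
      $REMOTE_SHELL "cd '$REMOTE_DIR' && md5sum -c --quiet" |
      sed "s|^\./|$dir/|" || rc=1
  done
  return "$rc"
}
```

Calling verify_dirs /primary /secondary as the last command of the script makes the script's exit status non-zero on any mismatch. A file that is missing on machineB is reported as FAILED open or read.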

Running ssh to the remote host for every file definitely isn't efficient. parallel could speed it up by processing several files concurrently, but the more efficient approach is probably to tweak the command so that it connects to machineB once and gets all the md5sums in one shot. Is that possible?
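One way to get all the md5sums from machineB in one shot is to send it the list of relative paths over a single ssh session, let it hash only those files while machineA hashes its own copies at the same time, and then compare the two listings locally. This is a sketch under stated assumptions: GNU find and xargs (-print0, -0, -r), GNU md5sum output (32 hex digits, two separator characters, then the path), and file names without newlines or backslashes. compare_md5_lists and check_dir are hypothetical helpers.

```shell
#!/usr/bin/env bash

# compare_md5_lists REMOTE_LIST LOCAL_LIST PREFIX
# Both lists are md5sum output ("<32 hex digits>  ./relative/path").
# Prints PREFIX/relative/path for every local file whose checksum differs
# from, or is missing in, the remote list. Returns 1 if anything was printed.
compare_md5_lists() {
  awk -v rfile="$1" -v prefix="$3" '
    # Remote list: remember checksum per path ($1 "" forces a string value,
    # so all-digit hashes are never compared as numbers).
    FILENAME == rfile { sum[substr($0, 35)] = $1 ""; next }
    # Local list: report mismatching or missing files.
    {
      path = substr($0, 35)
      if (!(path in sum) || sum[path] != ($1 "")) {
        print prefix "/" substr(path, 3)   # drop the leading "./"
        bad = 1
      }
    }
    END { exit bad }
  ' "$1" "$2"
}

# check_dir DIR: hash DIR locally and the same relative paths under
# /bat/snap/ on machineB at the same time, using one ssh session.
check_dir() {
  local dir=$1 tmp rc
  tmp=$(mktemp -d) || return 2
  # Remote side: send the file list once; machineB hashes only those files.
  # Missing files (and ssh errors) leave entries out of the remote list,
  # so they are reported as mismatches below.
  (cd "$dir" && find . -type f -print0) |
    ssh machineB 'cd /bat/snap/ && xargs -0 -r md5sum' > "$tmp/remote" 2>/dev/null &
  # Local side runs concurrently with the remote hashing.
  (cd "$dir" && find . -type f -exec md5sum {} +) > "$tmp/local"
  wait
  compare_md5_lists "$tmp/remote" "$tmp/local" "$dir"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

For example: rc=0; for d in /primary /secondary; do check_dir "$d" || rc=1; done; exit "$rc". Hashing only the paths that exist on machineA avoids reading the many extra files in /bat/snap/.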

asked Apr 27 '18 22:04 by user1950349




1 Answer

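If your primary goal is not to compute the checksums but to list the differences, a faster (and easier) way may be to run rsync with the --dry-run option. By default rsync decides that a file differs by comparing its size and modification time; add -c (--checksum) to compare the contents by checksum instead. Any files it lists differ, for example: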

MBP:~ jhartman$ rsync -avr --dry-run rsync-test 192.168.1.100:/tmp/; echo $?
building file list ... done
rsync-test/file1.txt

sent 172 bytes  received 26 bytes  396.00 bytes/sec
total size is 90  speedup is 0.45

Because of --dry-run, no files are changed on the target.
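To turn the rsync idea into the full-path report and exit code the question asks for, here is a sketch assuming rsync is installed on both machines. It uses -c so files are compared by content, and --out-format='%i %n' so each output line is rsync's itemized change string followed by the file name. rsync exits with 0 even when a dry run lists differences, so the exit status has to come from filtering the output. list_content_changes and rsync_mismatches are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Also propagate rsync's own failure status through the pipeline.
set -o pipefail

# list_content_changes PREFIX
# Reads "%i %n" lines (itemized change, space, file name) on stdin and prints
# PREFIX/name for each regular file whose content would be transferred
# (update type "<" or ">", file type "f"). Attribute-only lines such as
# ".f..t...... name" and directory lines are ignored.
# Returns 1 if at least one file was printed, 0 otherwise.
list_content_changes() {
  awk -v prefix="$1" '
    /^[<>]f/ { print prefix "/" substr($0, index($0, " ") + 1); found = 1 }
    END { exit found }
  '
}

# rsync_mismatches SRC DEST
# -a archive mode (already implies -r), -n dry run (nothing is changed),
# -c compare by checksum instead of size and modification time.
rsync_mismatches() {
  rsync -anc --out-format='%i %n' "$1/" "$2/" | list_content_changes "$1"
}
```

For example: rc=0; for d in /primary /secondary; do rsync_mismatches "$d" machineB:/bat/snap || rc=1; done; exit "$rc". Files that exist only in /bat/snap/ are not reported, since no --delete is given. Note that -c makes rsync read file contents on both machines, so it is slower than the default size-and-time check.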

I hope this helps, Jarek

answered Sep 30 '22 16:09 by Jarek