Let's say I have a 64-core server, and I need to compute the md5sum of every file in /mnt/data and store the results in a text file:
find /mnt/data -type f -exec md5sum {} \; > md5.txt
The problem with the above command is that only one process runs at any given time. I would like to harness the full power of my 64 cores. Ideally, I would like to make sure that 64 parallel md5 processes are running at any given time (but no more than 64).
Also, I need the output from all the processes to be stored in one file.
NOTE: I am not looking for a way to compute the md5sum of one file in parallel. I am looking for a way to compute 64 md5sums of 64 different files in parallel, for as long as there are files coming from find.
Now that we can get a list of all of our files, the next steps are: run the md5sum command on every file in that list, create a string that contains the file paths along with their hashes, and finally run md5sum on that string to obtain a single hash value.
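The three steps above can be sketched as a small shell function. This is a minimal sketch, not a standard tool: the name dir_md5 is hypothetical, and sorting the list by path is an added assumption so that the final hash does not depend on the order in which find returns files.

```shell
# dir_md5 DIR: print a single MD5 hash summarizing every regular file under DIR.
# (dir_md5 is a hypothetical helper name, not a standard command.)
dir_md5() (
    # Work from inside DIR so the listed paths are relative and the
    # result does not depend on where DIR is mounted.
    cd "$1" || exit 1
    # 1. md5sum every file, 2. build a stable "hash  path" list
    #    (sorted by path in the C locale), 3. md5sum that list.
    find . -type f -exec md5sum {} + \
        | LC_ALL=C sort -k2 \
        | md5sum \
        | cut -d' ' -f1
)
```

For example, `dir_md5 /mnt/data` prints one 32-character hash; changing, adding, removing, or renaming any file under the directory changes it.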
Yes. There are infinitely many binary files but only a finite number of MD5 hashes (each hash has a fixed size of 128 bits), so infinitely many files share the same hash.
In checksum spoofing, an adversary modifies the message body and then modifies the corresponding checksum so that the recipient's checksum calculation matches the checksum (created by the adversary) in the message. This prevents the recipient from realizing that a change occurred.
Checksums are calculated for files. Calculating the checksum for a directory requires recursively calculating the checksums for all the files in the directory. The -r option allows md5deep to recurse into sub-directories. The -l option enables displaying the relative path, instead of the default absolute path.
Use GNU parallel. You can find some more examples on how to implement it here.
find /mnt/data -type f | parallel -j 64 md5sum > md5.txt
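File names that contain newlines or quote characters can confuse a newline-delimited pipeline. A hedged variant, assuming GNU find and GNU parallel (not the similarly named tool from moreutils), passes names NUL-delimited instead. The function name md5_parallel is hypothetical, used here only for illustration.

```shell
# md5_parallel DIR [JOBS]: hash every regular file under DIR with GNU parallel.
# (md5_parallel is a hypothetical helper name.)
md5_parallel() (
    dir=$1
    jobs=${2:-64}   # at most this many md5sum processes at once
    # -print0 / -0 delimit file names with NUL bytes, so spaces, quotes,
    # and newlines in names are passed through intact.
    find "$dir" -type f -print0 | parallel -0 -j "$jobs" md5sum
)
```

Usage would look like `md5_parallel /mnt/data 64 > md5.txt`. GNU parallel groups each job's output by default, so lines from different md5sum processes do not get mixed together.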
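You can use xargs as well; it may be more widely available than GNU parallel on some distros.
-P controls the maximum number of processes running at once (64 here, to match the core count). -print0 and -0 pass file names NUL-delimited, so names containing spaces or quotes are handled correctly.
find /mnt/data -type f -print0 | xargs -0 -n1 -P64 md5sum > /tmp/result.txt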
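The xargs approach can also be wrapped in a small function. This is a sketch under a few assumptions: GNU findutils and coreutils are installed (for -print0, -0, -r, and nproc), and the name md5_xargs is hypothetical. The job count defaults to the number of available cores, and the final sort is an addition that makes the output order reproducible, since parallel jobs finish in arbitrary order.

```shell
# md5_xargs DIR [JOBS]: hash every regular file under DIR using xargs.
# (md5_xargs is a hypothetical helper name.)
md5_xargs() (
    dir=$1
    jobs=${2:-$(nproc)}   # default: one process per available core
    # -n 1 runs one md5sum per file, so each process emits a single short
    # line; -r skips running md5sum at all when find yields no files.
    find "$dir" -type f -print0 \
        | xargs -0 -r -n 1 -P "$jobs" md5sum \
        | LC_ALL=C sort -k2
)
```

Usage would look like `md5_xargs /mnt/data 64 > /tmp/result.txt`. Unlike GNU parallel, xargs does not group the output of concurrent jobs, which is why the sketch keeps one file per md5sum invocation rather than batching many files per process.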