When calculating the md5 sum of large files, I see a single cpu core jump to 100% for however long it takes, leaving all other cores idle.
My rudimentary understanding of md5 is that the entire process is strictly sequential: each value depends on all the data read before it, so there is nothing we can do to make it multi-threaded. Is this true?
Or is there a way to break the files into sections, calculate <something> over multiple parts using multi-cores, and then combine those <something> values into the final md5?
The library we're using to calculate the md5sum is http://libmd5-rfc.sourceforge.net/ but I'd switch to a different one if it was possible to break the md5sum across multiple cores so it completes faster.
(Note: changing to something other than md5 is not the question, nor can it be done because of the other closed systems to which this interfaces. Nor is this question about the safety of using md5.)
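No, you cannot break it apart at the file level. MD5 maintains a running state as it passes through the data: the input is processed in 64-byte blocks, and each block is mixed into the state left by the block before it. A later section of the file therefore cannot be hashed until everything before it has been processed, and there is no intermediate value that can be computed per section and combined afterwards into the same md5.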
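To make the state dependence concrete, here is a minimal sketch in Python using the standard `hashlib` module rather than libmd5-rfc (that substitution, the helper name `md5_streaming`, and the sample data are assumptions made for illustration). It shows that feeding the data in pieces, in order, gives the same digest as hashing it in one go, while feeding the same pieces in a different order does not.

```python
import hashlib


def md5_streaming(chunks):
    """Feed chunks into one MD5 state, in the order given (hypothetical helper)."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)  # each update continues from the state left by the previous one
    return h.hexdigest()


# Sample data: 256 KiB made of distinct 4-byte counters, so no two chunks are equal.
data = b"".join(i.to_bytes(4, "big") for i in range(65536))
chunk_size = 64 * 1024
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

whole = hashlib.md5(data).hexdigest()        # one call over the full buffer
streamed = md5_streaming(chunks)             # same bytes, fed in order
reordered = md5_streaming(reversed(chunks))  # same chunks, fed in a different order

print("whole     :", whole)
print("streamed  :", streamed, "(matches)" if streamed == whole else "(differs)")
print("reordered :", reordered, "(matches)" if reordered == whole else "(differs)")
```

Because each chunk has to be fed into the state produced by the previous one, the chunks of a single file cannot be handed to separate cores. What can still run in parallel is work around the hash, for example hashing several different files at once, one per core, or reading the next buffer from disk while the current one is being hashed.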