Which checksum algorithm can you recommend in the following use case?
I want to generate checksums of small JPEG files (~8 kB each) to check if the content changed. Using the filesystem's date modified is unfortunately not an option.
The checksum need not be cryptographically strong, but it should reliably detect changes of any size.
The second criterion is speed, since it should be possible to process at least hundreds of images per second (on a modern CPU).
The calculation will be done on a server with several clients. The clients send the images over Gigabit TCP to the server. So there's no disk I/O as bottleneck.
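For context, here is a minimal sketch of the kind of server-side receive-and-hash loop I have in mind (single connection, simple length-prefixed framing; the framing and function name are illustrative, and MD5 is just a placeholder for whatever algorithm gets recommended):

```python
import hashlib
import socket

def handle_client(conn: socket.socket) -> None:
    # Receive length-prefixed image frames and checksum each one in memory.
    while True:
        header = conn.recv(4)
        if not header:
            break
        size = int.from_bytes(header, "big")
        image = b""
        while len(image) < size:
            chunk = conn.recv(size - len(image))
            if not chunk:
                return
            image += chunk
        # Placeholder: whichever algorithm ends up being recommended.
        checksum = hashlib.md5(image).hexdigest()
        print(checksum)
```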
For years MD5 was the de facto standard checksum: fast, and considered secure until its collision weaknesses were found. Although xxHash is becoming more widely used (and is considerably faster for non-cryptographic use), many companies still require MD5 checksums for data integrity.
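As a rough illustration, here is a small Python benchmark comparing MD5 (from the standard library's hashlib) with xxHash on an 8 kB buffer; it assumes the third-party xxhash package is installed and skips that part otherwise. The buffer contents and iteration count are arbitrary:

```python
import hashlib
import os
import time

try:
    import xxhash  # third-party: pip install xxhash
except ImportError:
    xxhash = None

data = os.urandom(8 * 1024)  # stand-in for one ~8 kB JPEG
N = 100_000                  # arbitrary iteration count

start = time.perf_counter()
for _ in range(N):
    hashlib.md5(data).hexdigest()
print(f"MD5:    {N / (time.perf_counter() - start):,.0f} digests/s")

if xxhash is not None:
    start = time.perf_counter()
    for _ in range(N):
        xxhash.xxh64(data).hexdigest()
    print(f"xxHash: {N / (time.perf_counter() - start):,.0f} digests/s")
```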
The SHA family of algorithms is published by the National Institute of Standards and Technology. SHA-1 produces a 160-bit digest and is typically the fastest of the family, followed by the 256-bit SHA-256 and the 512-bit SHA-512.
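All of these are available through Python's hashlib; a quick sketch that prints each algorithm's digest size (the sample input is arbitrary):

```python
import hashlib

data = b"example image bytes"  # arbitrary sample input
for name in ("sha1", "sha256", "sha512"):
    digest = hashlib.new(name, data).hexdigest()
    # Each hex character encodes 4 bits of the digest.
    print(f"{name}: {len(digest) * 4}-bit digest, e.g. {digest[:16]}...")
```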
A checksum is a technique used to verify the integrity of received data, i.e., to detect whether an error occurred in transmission. The sender runs the data through an algorithm to compute its checksum and sends the checksum along with the data; the receiver recomputes it and compares the two values.
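A minimal sketch of that sender/receiver pattern, using MD5 from hashlib purely as an example; the function names are illustrative:

```python
import hashlib

def send(payload: bytes) -> tuple[bytes, str]:
    # Sender: compute the checksum and transmit it alongside the data.
    return payload, hashlib.md5(payload).hexdigest()

def receive(payload: bytes, checksum: str) -> bool:
    # Receiver: recompute the checksum and compare it with the one received.
    return hashlib.md5(payload).hexdigest() == checksum

data, digest = send(b"\xff\xd8\xff\xe0 fake jpeg bytes")
assert receive(data, digest)                 # intact transmission
assert not receive(data + b"\x00", digest)   # corruption is detected
```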
MD5 is only suitable for detecting accidental data corruption, not for security applications, because collisions can be constructed deliberately. Strong cryptographic hash algorithms such as SHA-256 and SHA-512 can be used both where tampering must be detected and for plain integrity checks.
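For the use case in the question (detecting whether an image's content changed), the algorithm only determines the digest that gets stored and compared. A hedged sketch using SHA-256, with an in-memory dict standing in for whatever store the server actually uses:

```python
import hashlib

# In-memory store of last-seen digests, keyed by image identifier (illustrative).
known: dict[str, str] = {}

def has_changed(image_id: str, content: bytes) -> bool:
    """Return True if the image content differs from what was last seen."""
    digest = hashlib.sha256(content).hexdigest()
    changed = known.get(image_id) != digest
    known[image_id] = digest
    return changed

print(has_changed("cam01/frame.jpg", b"first version"))   # True: never seen before
print(has_changed("cam01/frame.jpg", b"first version"))   # False: unchanged
print(has_changed("cam01/frame.jpg", b"second version"))  # True: content changed
```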
If you have many small files, your bottleneck is going to be file I/O and probably not a checksum algorithm.
A list of hash functions (which can be thought of as checksums) can be found here.
Is there any reason you can't use the filesystem's date modified to determine if a file has changed? That would probably be faster.