Robust and fast checksum algorithm?

Which checksum algorithm can you recommend in the following use case?

I want to generate checksums of small JPEG files (~8 kB each) to check if the content changed. Using the filesystem's date modified is unfortunately not an option.
The checksum need not be cryptographically strong but it should robustly indicate changes of any size.

The second criterion is speed since it should be possible to process at least hundreds of images per second (on a modern CPU).

The calculation will be done on a server with several clients. The clients send the images to the server over Gigabit TCP, so disk I/O is not a bottleneck.

asked Sep 23 '08 18:09 by Benedikt Waldvogel



1 Answer

If you have many small files, your bottleneck is going to be file I/O and probably not a checksum algorithm.
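One way to check this on your own hardware is to time the checksum step in isolation. The following is only a rough sketch: the 8 kB random buffer stands in for a JPEG, `checksums_per_second` is a hypothetical helper name, and CRC-32 and SHA-256 are picked merely as examples of a cheap and a heavier function.

```python
import hashlib
import os
import time
import zlib


def checksums_per_second(func, payload, rounds=2000):
    """Call func(payload) `rounds` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(payload)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return rounds / max(elapsed, 1e-9)


# Assumption: 8 kB of random bytes approximates one small JPEG.
payload = os.urandom(8 * 1024)

crc_rate = checksums_per_second(zlib.crc32, payload)
sha_rate = checksums_per_second(lambda b: hashlib.sha256(b).digest(), payload)

print(f"CRC-32:  {crc_rate:,.0f} buffers/s")
print(f"SHA-256: {sha_rate:,.0f} buffers/s")
```

If both rates come out far above the few hundred images per second you need, the choice of algorithm is unlikely to matter much for throughput.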

A list of hash functions (which can be thought of as checksums) can be found here.
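As an illustration of using such a function for change detection, here is a minimal sketch built on CRC-32 from Python's standard `zlib` module. The `ChangeTracker` class and the key `"cam1.jpg"` are hypothetical names introduced for the example, and CRC-32 is just one candidate; a 32-bit value can collide, so a wider hash may be preferable if missed changes are costly.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


class ChangeTracker:
    """Remembers the last checksum seen per key (hypothetical helper)."""

    def __init__(self):
        self._last = {}

    def has_changed(self, key: str, data: bytes) -> bool:
        """True if data differs from the last data seen for key.

        The first time a key is seen counts as a change.
        """
        new = crc32_of(data)
        old = self._last.get(key)
        self._last[key] = new
        return old != new


tracker = ChangeTracker()
first = tracker.has_changed("cam1.jpg", b"image bytes v1")   # new key
second = tracker.has_changed("cam1.jpg", b"image bytes v1")  # same content
third = tracker.has_changed("cam1.jpg", b"image bytes v2")   # content differs
print(first, second, third)
```

Storing only the checksum per key keeps memory use small, at the cost of relying on the checksum alone to signal a change.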

Is there any reason you can't use the filesystem's date modified to determine if a file has changed? That would probably be faster.

answered Oct 02 '22 19:10 by luke