I wonder how reliable the Adler-32 checksum is compared to, e.g., MD5 checksums. Wikipedia says that Adler-32 is "much less reliable" than MD5, so I wonder how much less, and in what way.
More specifically, I'm wondering whether it is reliable enough as a consistency check for long-term archiving of (tar) files of 20 GB and larger.
Adler-32 is meant for quick checksumming: it has a small (32-bit) output space and a very simple algorithm. Its collision rate is low enough for casual integrity checks, but nowhere near low enough for it to be considered secure.
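As a rough illustration of how small that output space is in practice (a hypothetical sketch, not part of the original answer): hashing random short inputs with Python's zlib.adler32 turns up collisions after only a modest number of samples, because for short messages the two internal sums cover far less than 32 bits.

```python
import os
import zlib

# Hash random 8-byte messages until two different inputs produce the
# same Adler-32 value. For inputs this short the checksum's effective
# range is much smaller than 32 bits, so a collision appears quickly.
seen = {}
for i in range(200_000):
    data = os.urandom(8)
    h = zlib.adler32(data)
    if h in seen and seen[h] != data:
        print(f"collision after {i} samples: {seen[h].hex()} vs {data.hex()}")
        break
    seen[h] = data
```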
The SHA family of algorithms is published by the National Institute of Standards and Technology (NIST). SHA-1 produces a 160-bit digest and is the fastest of the family, followed by the 256-bit and 512-bit variants, which trade some speed for longer digests.
An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the stream plus one, and B is the sum of the individual values of A from each step; both sums are taken modulo 65521, the largest prime smaller than 2^16. At the beginning of an Adler-32 run, A is initialized to 1 and B to 0.
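A minimal pure-Python sketch of that description, cross-checked against the standard library's zlib.adler32:

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2^16

def adler32(data: bytes) -> int:
    """A starts at 1 and accumulates the bytes; B starts at 0 and
    accumulates the running values of A; both modulo 65521.
    The result is B in the high 16 bits and A in the low 16 bits."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a

msg = b"Wikipedia"
assert adler32(msg) == zlib.adler32(msg)  # matches the C implementation
print(hex(adler32(msg)))                  # 0x11e60398
```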
To produce a checksum, you run the file through a program that implements a hash algorithm. Typical algorithms used for this include MD5, SHA-1, SHA-256, and SHA-512. These are cryptographic hash functions that take an input of any size and produce a fixed-length digest, usually displayed as a hexadecimal string.
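For files in the 20 GB+ range you would stream the file through the hash in fixed-size chunks rather than reading it into memory. A sketch (the file path is a placeholder), computing SHA-256 and Adler-32 in one pass:

```python
import hashlib
import zlib

def file_digests(path: str, chunk_size: int = 1 << 20):
    """Stream a potentially very large file through SHA-256 and
    Adler-32 in 1 MiB chunks, so memory use stays constant."""
    sha = hashlib.sha256()
    adler = 1  # Adler-32 initial value
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sha.update(chunk)
            adler = zlib.adler32(chunk, adler)
    return sha.hexdigest(), f"{adler:08x}"

# Usage (hypothetical archive name):
# sha256_hex, adler_hex = file_digests("archive.tar")
# print(sha256_hex, adler_hex)
```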
For details on the error-detection capabilities of the Adler-32 checksum, see, for example, "Revisiting Fletcher and Adler Checksums" (Maxino, 2006).
That paper analyses the Hamming distance provided by these two checksums and gives the residual (undetected) error rate for data words up to about 2^11 bits, which is obviously far smaller than your requirement of roughly 2^38 bits (a 20 GB file is already about 20 × 2^33 ≈ 2^37 bits).