I am looking for a checksum algorithm where for a large block of data the checksum is equal to the sum of checksums from all the smaller component blocks. Most of what I have found is from RFCs 1624/1141 which do provide this functionality. Does anyone have any experience with these checksumming techniques or a similar one?
A file will have different MD5, SHA-1, and SHA–256 checksums. If you only know the MD5 sum of an original file, you must calculate your copy's MD5 sum to check if it's a match.
A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.
To calculate the checksum of an API frame: Add all bytes of the packet, except the start delimiter 0x7E and the length (the second and third bytes). Keep only the lowest 8 bits from the result. Subtract this quantity from 0xFF.
The SHA family of algorithms is published by the National Institute of Standards and Technology. One algorithm, SHA-1, produces a 160-bit checksum and is the best-performing checksum, followed by the 256-bit and 512-bit versions.
If it's just a matter of quickly combining the checksums of the smaller blocks to get to the checksums of the larger message (not necessarily by a plain summation) you can do this with a CRC-type (or similar) algorithm.
The CRC-32 algorithm is as simple as this:
uint32_t update(uint32_t state, unsigned bit)
{
if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7;
else state = (state << 1);
return state;
}
Mathematically, the state represents a polynomial over the field GF2 that is always reduced modulo the generator polynomial. Given a new bit b
the old state is transformed into the new state like this
state --> (state * x^1 + b * x^32) mod G
where G is the generator polynomial and addition is done in GF2 (xor). This checksum is linear in the sense that you can write the message M
as a sum (xor) of messages A,B,C,... like this
10110010 00000000 00000000 = A = a 00000000 00000000
00000000 10010001 00000000 = B = 00000000 b 00000000
00000000 00000000 11000101 = C = 00000000 00000000 c
-------------------------------------------------------------
= 10110010 10010001 11000101 = M = a b c
with the following properties
M = A + B + C
checksum(M) = checksum(A) + checksum(B) + checksum(C)
Again, I mean the +
in GF2 which you can implement with a binary XOR.
Finally, it's possible to compute checksum(B)
based on checksum(b)
and the position of the subblock b
relative to B
. The simple part is leading zeros. Leading zeros don't affect the checksum at all. So checksum(0000xxxx)
is the same as checksum(xxxx)
. If you want to compute the checksum of a zero-padded (to the right -> trailing zeros) message given the checksum of the non-padded message it is a bit more complicated. But not that complicated:
zero_pad(old_check_sum, number_of_zeros)
:= ( old_check_sum * x^{number_of_zeros} ) mod G
= ( old_check_sum * (x^{number_of_zeros} mod G) ) mod G
So, getting the checksum of a zero-padded message is just a matter of multiplying the "checksum polynomial" of the non-padded message with some other polynomial (x^{number_of_zeros} mod G
) that only depends on the number of zeros you want to add. You could precompute this in a table or use the square-and-multiply algorithm to quickly compute this power.
Suggested reading: Painless Guide to CRC Error Detection Algorithms
I have only used Adler/Fletcher checksums which work as you describe.
There is a nice comparison of crypto++ hash/checksum implementations here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With