Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Incremental Checksums

Tags:

checksum

I am looking for a checksum algorithm where for a large block of data the checksum is equal to the sum of checksums from all the smaller component blocks. Most of what I have found is from RFCs 1624/1141 which do provide this functionality. Does anyone have any experience with these checksumming techniques or a similar one?

like image 759
Steve Severance Avatar asked Jul 23 '09 18:07

Steve Severance


People also ask

What are the different types of checksums?

A file will have different MD5, SHA-1, and SHA–256 checksums. If you only know the MD5 sum of an original file, you must calculate your copy's MD5 sum to check if it's a match.

What a checksum means?

A checksum is a value that represents the number of bits in a transmission message and is used by IT professionals to detect high-level errors within data transmissions. Prior to transmission, every piece of data or file can be assigned a checksum value after running a cryptographic hash function.

How checksums are calculated?

To calculate the checksum of an API frame: Add all bytes of the packet, except the start delimiter 0x7E and the length (the second and third bytes). Keep only the lowest 8 bits from the result. Subtract this quantity from 0xFF.

Which checksum is best?

The SHA family of algorithms is published by the National Institute of Standards and Technology. One algorithm, SHA-1, produces a 160-bit checksum and is the best-performing checksum, followed by the 256-bit and 512-bit versions.


2 Answers

If it's just a matter of quickly combining the checksums of the smaller blocks to get to the checksums of the larger message (not necessarily by a plain summation) you can do this with a CRC-type (or similar) algorithm.

The CRC-32 algorithm is as simple as this:

uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7;
    else                           state = (state << 1);
    return state;
}

Mathematically, the state represents a polynomial over the field GF2 that is always reduced modulo the generator polynomial. Given a new bit b the old state is transformed into the new state like this

state --> (state * x^1 + b * x^32) mod G

where G is the generator polynomial and addition is done in GF2 (xor). This checksum is linear in the sense that you can write the message M as a sum (xor) of messages A,B,C,... like this

  10110010 00000000 00000000 = A =    a     00000000 00000000
  00000000 10010001 00000000 = B = 00000000    b     00000000
  00000000 00000000 11000101 = C = 00000000 00000000    c
-------------------------------------------------------------
= 10110010 10010001 11000101 = M =    a        b        c

with the following properties

         M  =          A  +          B  +          C
checksum(M) = checksum(A) + checksum(B) + checksum(C)

Again, I mean the + in GF2 which you can implement with a binary XOR.

Finally, it's possible to compute checksum(B) based on checksum(b) and the position of the subblock b relative to B. The simple part is leading zeros. Leading zeros don't affect the checksum at all. So checksum(0000xxxx) is the same as checksum(xxxx). If you want to compute the checksum of a zero-padded (to the right -> trailing zeros) message given the checksum of the non-padded message it is a bit more complicated. But not that complicated:

zero_pad(old_check_sum, number_of_zeros)
  := ( old_check_sum *  x^{number_of_zeros}        ) mod G
   = ( old_check_sum * (x^{number_of_zeros} mod G) ) mod G

So, getting the checksum of a zero-padded message is just a matter of multiplying the "checksum polynomial" of the non-padded message with some other polynomial (x^{number_of_zeros} mod G) that only depends on the number of zeros you want to add. You could precompute this in a table or use the square-and-multiply algorithm to quickly compute this power.

Suggested reading: Painless Guide to CRC Error Detection Algorithms

like image 141
sellibitze Avatar answered Nov 09 '22 06:11

sellibitze


I have only used Adler/Fletcher checksums which work as you describe.

There is a nice comparison of crypto++ hash/checksum implementations here.

like image 41
Rob Elliott Avatar answered Nov 09 '22 06:11

Rob Elliott