I've seen 8-bit, 16-bit, and 32-bit CRCs.
At what point do I need to jump to a wider CRC?
My gut reaction is that it is based on the data length:
EDIT: Looking at the Wikipedia page about CRC and Lott's answer, here' what we have:
<64 bytes: 8-bit CRC
<16K bytes: 16-bit CRC
<512M bytes: 32-bit CRC
Cyclic Redundancy Check (CRC) - CRCs are similar in concept to checksums, but they use polynomial division to determine the value of the CRC, which is usually 16 or 32 bits in length. The good thing about CRC is that it is very accurate.
Using CRC-8, multiple messages longer than 64 bytes will have the same CRC checksum value. This is accurate and helpful if the CRC is being used to detect changes to a file. However, if it's being used as a digest to detect duplicates among files, then it's more complicated.
The most commonly used polynomial lengths are 9 bits (CRC-8), 17 bits (CRC-16), 33 bits (CRC-32), and 65 bits (CRC-64). A CRC is called an n-bit CRC when its check value is n-bits. For a given n, multiple CRCs are possible, each with a different polynomial.
It's not a research topic. It's really well understood: http://en.wikipedia.org/wiki/Cyclic_redundancy_check
The math is pretty simple. An 8-bit CRC boils all messages down to one of 256 values. If your message is more than a few bytes long, the possibility of multiple messages having the same hash value goes up higher and higher.
A 16-bit CRC, similarly, gives you one of the 65,536 available hash values. What are the odds of any two messages having one of these values?
A 32-bit CRC gives you about 4 billion available hash values.
From the wikipedia article: "maximal total blocklength is equal to 2**r − 1
". That's in bits. You don't need to do much research to see that 2**9 - 1
is 511 bits. Using CRC-8, multiple messages longer than 64 bytes will have the same CRC checksum value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With