I have a caching application that uses a CRC64 value to ensure data integrity. I'm thinking about adding an extra field, a timestamp, that would be passed around with the data between the various cache servers and compared to see if the data has changed.
However, this requires protocol changes. While that's not a huge deal, I already have a CRC64 that could be used as an indicator that something has changed.
Does anyone know the stats around two blocks of data producing the same CRC64? If not, how could I compute it or estimate its likelihood?
If you assume that CRC64 is 'perfect' (that is, its outputs are uniformly distributed over all 2^64 possible values), then the numbers are pretty reasonable:
For a 1% probability of collision, you need 6.1 × 10^8 entries. For a 50% probability of collision, you need 5.1 × 10^9 entries.
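These figures can be reproduced with the usual birthday-problem approximation. The following is only a rough sketch under the same "perfect CRC64" assumption; the function name `entries_for_collision_probability` is made up for illustration and is not part of any library.

```python
import math

def entries_for_collision_probability(p, bits=64):
    """Approximate how many uniformly random `bits`-bit values are needed
    before the chance of at least one collision reaches p (0 < p < 1).

    Uses the birthday approximation n ~ sqrt(2 * M * ln(1 / (1 - p))),
    where M = 2**bits. This assumes the checksum behaves like a uniform
    random function, which a real CRC only approximates.
    """
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

if __name__ == "__main__":
    for p in (0.01, 0.50):
        n = entries_for_collision_probability(p)
        print(f"p = {p:.2f}: about {n:.2e} entries")
```

Passing `bits=32` gives the corresponding estimate for a 32-bit checksum, which shows how much the extra 32 bits buy you.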
Of course, if the data is potentially supplied by malicious sources, then collisions in a hash as simple as crc64 can be generated easily, and collisions could be rampant. So whether or not you go this route depends on the source of input data and the potential ramifications of collisions.
The probability of any two given blocks colliding is 1/2^64, or 1 in about 1.8 × 10^19.
However, a collision becomes much more likely if you are interested in whether any two blocks collide out of a population of size N, because the number of pairs that could collide grows roughly as N^2/2.
For more information, see Birthday Problem on Wikipedia, which has formulas and approximations.
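As a rough guide, the standard birthday approximation for N blocks, assuming CRC64 values are uniformly distributed over M = 2^64 possibilities (an idealization, not a property guaranteed for real data), is:

```latex
p(N) \;\approx\; 1 - \exp\!\left(-\frac{N(N-1)}{2M}\right)
\;\approx\; \frac{N^{2}}{2M}
\quad \text{for } N \ll \sqrt{M}, \qquad M = 2^{64}.
```

For N = 2 this reduces to 1/M = 1/2^64, which matches the pairwise figure above.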