
How likely are two blocks of data to produce the same CRC64 value?

Tags:

crc

crc64

I have a caching application that uses a CRC64 value to ensure data integrity. I'm thinking about adding an extra field, a timestamp, to be passed around with the data between the various cache servers and compared to see whether the data has changed.

However, this requires protocol changes. While that's not a huge deal, I already have a CRC64 that could be used as an indicator that something has changed.

Does anyone know the stats around two blocks of data producing the same CRC64? If not, how could I compute or estimate its likelihood?

hookenz asked May 17 '11 01:05


2 Answers

If you assume that CRC64 is 'perfect' (that is, its outputs are spread uniformly over all 2^64 possible values), then the numbers are pretty reasonable:

For a 1% probability of collision, you need 6.1 × 10^8 entries. For a 50% probability of collision, you need 5.1 × 10^9 entries.
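As a rough sketch, these figures can be reproduced with the usual birthday-problem approximation n ≈ sqrt(2 · 2^64 · ln(1 / (1 − p))), again assuming CRC64 outputs are uniformly distributed. The function name below is made up for illustration.

```python
import math


def entries_for_collision_probability(p: float, bits: int = 64) -> float:
    """Approximate how many uniformly random `bits`-bit values are needed
    before the chance of at least one collision reaches p (0 < p < 1).

    Uses the birthday-problem approximation n ~ sqrt(2 * 2^bits * ln(1/(1-p))).
    """
    space = 2.0 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for small p.
    return math.sqrt(2.0 * space * -math.log1p(-p))


if __name__ == "__main__":
    for p in (0.01, 0.5):
        n = entries_for_collision_probability(p)
        print(f"p = {p:.0%}: about {n:.1e} entries")
```

This is only an approximation; it says nothing about inputs that are chosen deliberately to collide.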

Of course, if the data is potentially supplied by malicious sources, then collisions in a checksum as simple as CRC64 can be generated easily, and collisions could be rampant. So whether or not you go this route depends on the source of the input data and the potential ramifications of collisions.

sarnold answered Sep 21 '22 14:09


The probability of any two given blocks colliding is 1/2^64, or 1 in about 1.8 × 10^19.

However, a collision rapidly becomes more likely if you are interested in whether any two blocks out of a population of size N collide.

For more information, see Birthday Problem on Wikipedia, which has formulas and approximations.
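A minimal sketch of one such approximation, p ≈ 1 − exp(−N(N − 1) / (2 · 2^64)), is shown here. It assumes CRC64 values behave like uniformly random 64-bit numbers, and the function name is hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate probability that at least two of n uniformly random
    `bits`-bit values are equal (birthday-problem approximation)."""
    space = 2.0 ** bits
    # Number of distinct pairs is n * (n - 1) / 2; each pair collides
    # with probability 1 / space.
    expected_pairs = n * (n - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) and stays accurate when x is tiny.
    return -math.expm1(-expected_pairs)


if __name__ == "__main__":
    for n in (2, 10**6, 10**9, 5 * 10**9):
        print(f"N = {n:>13,}: p ~ {collision_probability(n):.3g}")
```

For N = 2 this reduces to the single-pair figure of 1/2^64, and for N in the billions the probability is no longer negligible.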

Greg Hewgill answered Sep 18 '22 14:09