I have a 10 character string key field in a database. I've used CRC32 to hash this field but I'm worry about duplicates. Could somebody show me the probability of collision in this situation?
p.s. my string field is unique in the database. If the number of string fields is 1 million, what is probability of collision ?
The birthday paradox specifically says that once you've seen roughly √(2k) items, there's a 50% chance of a collision, where k is the number of distinct possible outputs. In the case where the hash function hashes to an n-bit output, this means that you'll need roughly 2n/2 hashes before you get a collision.
If we hash M values and total possible hash values is T, then the expected number of collisions will be C = M * (M-1) / 2T.
The maximum number of collisions is equal to the number of items you hash. All items will be hashed to key 3.
SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.
Duplicate of Expected collisions for perfect 32bit crc
The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670
Found the image below from: http://preshing.com/20110504/hash-collision-probabilities
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With