Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Probability of collision when using a 32 bit hash

I have a 10 character string key field in a database. I've used CRC32 to hash this field but I'm worry about duplicates. Could somebody show me the probability of collision in this situation?

p.s. my string field is unique in the database. If the number of string fields is 1 million, what is probability of collision ?

like image 648
nguyenngoc101 Avatar asked Jan 08 '13 07:01

nguyenngoc101


People also ask

What is probability of collision of n bit hash?

The birthday paradox specifically says that once you've seen roughly √(2k) items, there's a 50% chance of a collision, where k is the number of distinct possible outputs. In the case where the hash function hashes to an n-bit output, this means that you'll need roughly 2n/2 hashes before you get a collision.

How do you find the probability of a hash collision?

If we hash M values and total possible hash values is T, then the expected number of collisions will be C = M * (M-1) / 2T.

How many collisions are possible in a hash function?

The maximum number of collisions is equal to the number of items you hash. All items will be hashed to key 3.

How likely is a collision in SHA256?

SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60. As you can see, the slower and longer the hash is, the more reliable it is.


1 Answers

Duplicate of Expected collisions for perfect 32bit crc

The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670

Found the image below from: http://preshing.com/20110504/hash-collision-probabilities

enter image description here

like image 117
Adam Morris Avatar answered Sep 20 '22 19:09

Adam Morris