
Are there circumstances where a hash algorithm can be guaranteed unique?

If I'm hashing similar, size-constrained data (Social Security numbers, for example) with a hash algorithm whose output is larger than the data (SHA-256, for example), will the hashes be guaranteed to be as unique as the original data?

matt asked Feb 19 '10 21:02



1 Answer

The probability of a hash collision has nothing to do with the size of the input string (except to the extent that it bounds how many distinct inputs you need to keep unique). Even hashing just 0 and 1 with an ideal hash algorithm (one whose outputs are uniformly distributed) can produce a collision, with probability 1/(2^bit-length). For SHA-256 that is 1/2^256, which is effectively zero, so uniqueness is not strictly guaranteed, only overwhelmingly likely.

Hash collisions are a birthday paradox problem. For a 256-bit hash with uniformly distributed outputs, the probability of at least one collision among the inputs depends only on the number of inputs and is:

  • 1 - (2^256)! / ((2^256)^inputcount * (2^256 - inputcount)!)

or, as others have said, basically zero for any reasonable number of inputs.
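
To put a number on "basically zero", here is a minimal Python sketch of the birthday calculation. The function names and the `space` parameter (the number of possible hash values) are my own, and it assumes the hash output is uniformly distributed, which SHA-256 is only believed to approximate. The factorial formula cannot be evaluated directly for 2^256, so the first function uses the standard approximation 1 - exp(-n(n-1)/(2N)); the second computes the exact product and is only practical for small input counts.

```python
import math


def approx_collision_probability(input_count: int, space: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2N)).

    Assumes hash outputs are uniform over `space` possible values.
    """
    pairs = input_count * (input_count - 1) // 2  # number of input pairs
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-pairs / space)


def exact_collision_probability(input_count: int, space: int) -> float:
    """Exact birthday probability, 1 - N! / (N^n * (N-n)!), as a product.

    Loops once per input, so use it only for small input counts.
    """
    p_all_unique = 1.0
    for i in range(input_count):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


if __name__ == "__main__":
    sha256_space = 2 ** 256
    # Every possible 9-digit Social Security number: 10^9 inputs
    print(approx_collision_probability(10 ** 9, sha256_space))
    # Sanity check against the classic birthday problem (23 people, 365 days)
    print(exact_collision_probability(23, 365))
    print(approx_collision_probability(23, 365))
```

For all 10^9 possible nine-digit numbers hashed with a 256-bit hash, the result is on the order of 10^-60: not a guarantee of uniqueness, but far below any practical concern.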
Michael Mullany answered Oct 09 '22 10:10
