Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash Collision - what are the chances? [closed]

I have some code on my PHP powered site that creates a random hash (using sha1()) and I use it to match records in the database.

What are the chances of a collision? Should I generate the hash, then check first if it's in the database (I'd rather avoid an extra query) or automatically insert it, based on the probability that it probably won't collide with another.

like image 230
alex Avatar asked Nov 18 '08 05:11

alex


People also ask

How likely is hash collision?

The probability of just two hashes accidentally colliding is approximately: 1.47*10-29. SHA256: The slowest, usually 60% slower than md5, and the longest generated hash (32 bytes). The probability of just two hashes accidentally colliding is approximately: 4.3*10-60.

What happens after hash collision?

If a hash collision occurs, the table will be probed to move the record to an alternate cell that is stated as empty. There are different types of probing that take place when a hash collision happens and this method is implemented. Some types of probing are linear probing, double hashing, and quadratic probing.

How are hash collisions resolved?

Hash collision is resolved by open addressing with linear probing. Since CodeMonk and Hashing are hashed to the same index i.e. 2, store Hashing at 3 as the interval between successive probes is 1. There are no more than 20 elements in the data set. Hash function will return an integer from 0 to 19.

How long does it take to find a hash collision?

The birthday paradox specifically says that once you've seen roughly √(2k) items, there's a 50% chance of a collision, where k is the number of distinct possible outputs. In the case where the hash function hashes to an n-bit output, this means that you'll need roughly 2n/2 hashes before you get a collision.


Video Answer


2 Answers

If you assume that SHA-1 does a good job, you can conclude that there's a 1 in 2^160 chance that two given messages have the same hash (since SHA-1 produces a 160-bit hash).

2^160 is a ridiculously large number. It's roughly 10^48. Even if you have a million entries in your database, that's still a 1 in 10^42 chance that a new entry will share the same hash.

SHA-1 has proved to be fairly good, so I don't think you need to worry about collisions at all.

As a side note, use PHP's raw_output feature when you use SHA-1 as this will lead to a shorter string and hence will make your database operations a bit faster.

EDIT: To address the birthday paradox, a database with 10^18 (a million million million) entries has a chance of about 1 in 0.0000000000003 of a collision. Really not worth worrying about.

like image 137
Artelius Avatar answered Sep 29 '22 12:09

Artelius


Use a symmetric encryption scheme and a private server key to encrypt the ID (and other values) when you send them to the client and decrypt again on reception. Take care that your cryptographic function provides both confidentiality and integrity checks.

This allows you to use sensible values when talking to the DB without any collision, great security when talking to the client and reduces your probability to land on thedailyWTF by approximatly 2^160.

See also Pounding A Nail: Old Shoe or Glass Bottle?!

like image 40
David Schmitt Avatar answered Sep 29 '22 12:09

David Schmitt