Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it wrong to use a hash for a unique ID?

Tags:

php

hash

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible or use a purely numeric value so I am using:

sha1(uniqid(mt_rand(), true))

Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

like image 441
texelate Avatar asked May 03 '13 05:05

texelate


4 Answers

If you have 2 keys you will have a theoretical best case scenario of 1 in 2 ^ X probability of a collision, where X is the number of bits in your hashing algorithm. 'Best case' because the input usually will be ASCII which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so they will collide more often than the theoretical max in real life.

To answer your final question:

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

Yeah that's true-sorta. But you would have another problem of generating unique keys of that size. The easiest way is usually a checksum, so just choose a large enough digest that the collision space will be small enough for your comfort.

As @wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).

like image 88
Morten Jensen Avatar answered Oct 19 '22 19:10

Morten Jensen


How horrible would it be if two ended up the same? Murphy's Law applies - if a million to one, or even a 100,000:1 chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode if it happens then your design flaw must be addressed first. Then proceed with confidence.

Here is a question/answer of what the probabilities really are: Probability of SHA1 Collisions

like image 33
BrianH Avatar answered Oct 19 '22 18:10

BrianH


Use sha1(time()) in stead, then you remove the random possibility of a repeating hash for as long as time can be represented shorter than the sha1 hash. (likely longer than you fill find a working php parser ;))

like image 41
Sven Tore Avatar answered Oct 19 '22 19:10

Sven Tore


Computer random isn't actually random, you know? The only true random that you can obtain from a computer, supposing you are on a Unix environment is from /dev/random, but this is a blocking operation that depends on user interactions like moving a mouse or typing on keyboard. Reading from /dev/urandom is less safe, but it's probably better thang using just ASCII characters and gives you instantaneous response.

like image 33
Henrique Barcelos Avatar answered Oct 19 '22 18:10

Henrique Barcelos