Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

substr md5 collision

I need a 4-character hash. At the moment I am taking the first 4 characters of a md5() hash. I am hashing a string which is 80 characters long or less. Will this lead to collision? or, what is the chance of collision, assuming I'll hash less than 65,536 (164) different elements?

like image 937
Neel Basu Avatar asked Jan 13 '11 15:01

Neel Basu


1 Answers

Well, each character of md5 is a hex bit. That means it can have one of 16 possible values. So if you're only using the first 4 "hex-bits", that means you can have 16 * 16 * 16 * 16 or 16^4 or 65536 or 2^16 possibilities.

So, that means that the total available "space" for results is only 16 bits wide. Now, according to the Birthday Attack/Problem, there are the following chances for collision:

  • 50% chance -> 300 entries
  • 1% chance -> 36 entries
  • 0.0000001% chance -> 2 entries.

So there is quite a high chance for collisions.

Now, you say you need a 4 character hash. Depending on the exact requirements, you can do:

  • 4 hex-bits for 16^4 (65,536) possible values
  • 4 alpha bits for 26^4 (456,976) possible values
  • 4 alpha numeric bits for 36^4 (1,679,616) possible values
  • 4 ascii printable bits for about 93^4 (74,805,201) possible values (assuming ASCII 33 -> 126)
  • 4 full bytes for 256^4 (4,294,967,296) possible values.

Now, which you choose will depend on the actual use case. Does the hash need to be transmitted to a browser? How are you storing it, etc.

I'll give an example of each (In PHP, but should be easy to translate / see what's going on):

4 Hex-Bits:

$hash = substr(md5($data), 0, 4);

4 Alpha bits:

$hash = substr(base_convert(md5($data), 16, 26)0, 4);
$hash = str_replace(range(0, 9), range('S', 'Z'), $hash);

4 Alpha Numeric bits:

$hash = substr(base_convert(md5($data), 16, 36), 0, 4);

4 Printable Assci Bits:

$hash = hash('md5', $data, true); // We want the raw bytes
$out = '';
for ($i = 0; $i < 4; $i++) {
    $out .= chr((ord($hash[$i]) % 93) + 33);
}

4 full bytes:

$hash = substr(hash('md5', $data, true), 0, 4); // We want the raw bytes
like image 72
ircmaxell Avatar answered Oct 02 '22 01:10

ircmaxell