I need a 4-character hash. At the moment I am taking the first 4 characters of a md5()
hash. I am hashing a string which is 80 characters long or less. Will this lead to collision? or, what is the chance of collision, assuming I'll hash less than 65,536 (164) different elements?
Well, each character of md5
is a hex bit. That means it can have one of 16 possible values. So if you're only using the first 4 "hex-bits", that means you can have 16 * 16 * 16 * 16
or 16^4
or 65536 or 2^16
possibilities.
So, that means that the total available "space" for results is only 16 bits wide. Now, according to the Birthday Attack/Problem, there are the following chances for collision:
50%
chance -> 300
entries1%
chance -> 36
entries0.0000001%
chance -> 2
entries.So there is quite a high chance for collisions.
Now, you say you need a 4 character hash. Depending on the exact requirements, you can do:
16^4
(65,536) possible values 26^4
(456,976) possible values 36^4
(1,679,616) possible values 93^4
(74,805,201) possible values (assuming ASCII 33 -> 126)256^4
(4,294,967,296) possible values.Now, which you choose will depend on the actual use case. Does the hash need to be transmitted to a browser? How are you storing it, etc.
I'll give an example of each (In PHP, but should be easy to translate / see what's going on):
4 Hex-Bits:
$hash = substr(md5($data), 0, 4);
4 Alpha bits:
$hash = substr(base_convert(md5($data), 16, 26)0, 4);
$hash = str_replace(range(0, 9), range('S', 'Z'), $hash);
4 Alpha Numeric bits:
$hash = substr(base_convert(md5($data), 16, 36), 0, 4);
4 Printable Assci Bits:
$hash = hash('md5', $data, true); // We want the raw bytes
$out = '';
for ($i = 0; $i < 4; $i++) {
$out .= chr((ord($hash[$i]) % 93) + 33);
}
4 full bytes:
$hash = substr(hash('md5', $data, true), 0, 4); // We want the raw bytes
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With