My goal is to generate a short Hash string of 6 characters (possibly containing characters [A-Z][a-z][0-9]) for a string which is 42 case-insensitive alphanumeric characters in length. Uniqueness is the key requirement. Security or performance is not so important.
Is there a specific algorithm which will give this result or should I stick to truncating a MD5 Hash or a SHA-1 Hash (Like in this question)? If so, what is the probability of a collision?
#3 If you have an 8 bit hash output, how many possible hashes are there? There are 28 possibles hashes.
SHA-1 is a hash function designed by the N.S.A. ("SHA" = "Secure Hash Algorithm"). It returns a 160-bit hexadecimal string which is 40 characters long. It was used for secure encryption from 1996 to 2010, largely as a replacement for MD5, but now it is used mostly for data integrity.
In cryptography, SHA-1 (Secure Hash Algorithm 1) is a cryptographically broken but still widely used hash function which takes an input and produces a 160-bit (20-byte) hash value known as a message digest – typically rendered as 40 hexadecimal digits.
The hashing functions return a 128-bit, 160-bit, 256-bit, or 512-bit hash of the input data, depending on the algorithm selected.
Your best bet would be the truncating well-known hash function (MD5 or SHA-family) because these algorithms have statistically good uniform distributions of the hash values (and also using full hash and not just 6 chars there).
Now some calculations for probability of collision
- Number of letters in English alphabet: 26 - Add capitals: 26 - Add numerics: 10 -------------- In total you get 26 + 26 + 10 = 62 characters. Now you have 6 places, which gives you 62^6 possible combinations. That is 56.800.235.584 ~ 57 billion combinations. This is a space of possible hash values - N. -------------- To compute collisions let's use the formula Pcollision = K^2 / 2N Which is a very rough approximation of collision probability
Now let's see the result table for a number of items in a table - K
# items | Probability of collision --------------------------------------- 10 | 1.7 * 10^-9 100 | 1.7 * 10^-7 1K | 1.7 * 10^-5 10K | 1.7 * 10^-3 100K | 0.17
This formula can only be used for small K, but it shows that given 100K entries in the hash table you would roughly have 17% chance of collision.
Collision probability
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With