I want to use a one-way hashing algorithm (no collisions) to convert 32-bit integers to 36-bit integers.
Can anyone explain how to do that?
"One-way" means, that it is "hard" to figure out what x did give the hash-result h(x). Since the term "hard" is not sharply defined, there is also no sharp definition of what is a one-way function.
"No collision" means, that h(x) is different from h(y) for every pair of x and y where x is different from y. This is a sharp definition, but it is hard to prove if h(x) realy is a one-way-funcion. You must compare the hash-results of every pair of two 32-bit-numbers and test if they are different.
The best way to do this is to calculate all posible h(x) and store them, together with their x, in an array. Then sort the array by h(x), then walk through this list and test if two neighbours have the same h(x). If you don't find identic neighbours, your hash-function is free of collisions.
BUT: If you really can do this, your function can't really be a one-way-function, because the list you just generated to prove collision-freenes is a very fast lookup-table that lets you find the x for each h(x) in a searchtime of log(n). This might even be faster than calculation h(x) from x.
So lets figure out, how long this really takes
A 32-bit-integer is a number between 0 and 4294967295. Lets say it takes 0.1 ms to calculate h(x) from x. Depending on the hash-algorithm this is realistic even on a cheap notebook. So in 1 second you get 10,000 hash-numbers and in one day 864,000,000 numbers. It takes you just 5 days to calculate all posible numbers and to store them on disc.
Each entry has 4 byte for the 32-bit-number and 5 byte for the 36-bit-hash. Makes 9 byte. So the complete table has 38,654,705,664 byte. This is 38 GB. You can store this on every low-budged notebook. Sorting of this table needs some minutes, that doesn't count against the 5 days we needed for calculation.
So building this table on a second hand 200$-notebook is absolutely no problem. Once you have it, its very easy to prove if its really collision-free, but by building this table you did also prove that it is NOT a one-way function!
So what is the best solution?
After step 1 the list will contain 6,25% of collisions (about 268.4 million collisions). At each iteration you reduce the number of collisions to its 16th part. It will take aproximately 8 iterations to eliminate all collisons.
This 38 GB-Table is now you super-fast absolutely collision-free hash-function. And it is as one-way as any 32-to-36-bit-hashfunction can be. Meaning: There can be no other collision-free hash-function where it is harder to find x for a given h(x).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With