Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A one-way hash (no collisions) taking 32-bit integers to 36-bit integers?

I want to use a one-way hashing algorithm (no collisions) to convert 32-bit integers to 36-bit integers.

Can anyone explain how to do that?

like image 328
user1548832 Avatar asked Dec 20 '22 18:12

user1548832


1 Answers

"One-way" means, that it is "hard" to figure out what x did give the hash-result h(x). Since the term "hard" is not sharply defined, there is also no sharp definition of what is a one-way function.

"No collision" means, that h(x) is different from h(y) for every pair of x and y where x is different from y. This is a sharp definition, but it is hard to prove if h(x) realy is a one-way-funcion. You must compare the hash-results of every pair of two 32-bit-numbers and test if they are different.

The best way to do this is to calculate all posible h(x) and store them, together with their x, in an array. Then sort the array by h(x), then walk through this list and test if two neighbours have the same h(x). If you don't find identic neighbours, your hash-function is free of collisions.

BUT: If you really can do this, your function can't really be a one-way-function, because the list you just generated to prove collision-freenes is a very fast lookup-table that lets you find the x for each h(x) in a searchtime of log(n). This might even be faster than calculation h(x) from x.

So lets figure out, how long this really takes

A 32-bit-integer is a number between 0 and 4294967295. Lets say it takes 0.1 ms to calculate h(x) from x. Depending on the hash-algorithm this is realistic even on a cheap notebook. So in 1 second you get 10,000 hash-numbers and in one day 864,000,000 numbers. It takes you just 5 days to calculate all posible numbers and to store them on disc.

Each entry has 4 byte for the 32-bit-number and 5 byte for the 36-bit-hash. Makes 9 byte. So the complete table has 38,654,705,664 byte. This is 38 GB. You can store this on every low-budged notebook. Sorting of this table needs some minutes, that doesn't count against the 5 days we needed for calculation.

So building this table on a second hand 200$-notebook is absolutely no problem. Once you have it, its very easy to prove if its really collision-free, but by building this table you did also prove that it is NOT a one-way function!

So what is the best solution?

  1. Generate a list of 4,294,967,296 random 36-bit numbers, add to every entry a 32-bit-number that is the entries line-number (starting with 0).
  2. Sort the list.
  3. reset the did-change-a-number-flag
  4. walk through the list. Compare the actual entry with the previous entry. If they are different, go to step 7.
  5. replace the 36-bit-number by a new random 36-bit-number
  6. set the did-change-a-number-flag
  7. If you reached the end of the list: is the flag set? If so, go to step 2.
  8. sort the list by the 32-bit-number (the previous line number)

After step 1 the list will contain 6,25% of collisions (about 268.4 million collisions). At each iteration you reduce the number of collisions to its 16th part. It will take aproximately 8 iterations to eliminate all collisons.

This 38 GB-Table is now you super-fast absolutely collision-free hash-function. And it is as one-way as any 32-to-36-bit-hashfunction can be. Meaning: There can be no other collision-free hash-function where it is harder to find x for a given h(x).

like image 155
Hubert Schölnast Avatar answered May 11 '23 01:05

Hubert Schölnast