Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unique int to int hash

Tags:

hash

I'm curious as to whether or not there's some simple and/or well known hash method with the following properties:

  1. It transforms a 32-bit int into another 32-bit int
  2. No two non-equal inputs produce the same output
  3. It shouldn't be immediately obvious from looking at the output that two inputs were similar (in terms of difference and bitmask), meaning that hash(a) and hash(a+1) should have vastly different outputs, as should hash(a) and hash(a & 0x100000). (This rules out simply XORing with a random value.)

While such systems must obviously exist in theory, are there any in practice?

like image 472
BambooleanLogic Avatar asked Dec 27 '22 04:12

BambooleanLogic


1 Answers

There are many in practice!

A trivial solution would be to multiply the input by any odd number, and taking the bottom 32 bits of the result. That is:

y = (x * YOUR_ODD_NUMBER) & 0xffffffff;

This does have some weaknesses, though. It always maps zero to zero, if you choose a small number like 3 then the mapping will be fairly obvious (similarly, if you choose a large number like 0xffffffff you'll get another obvious mapping), and the least significant bit is not changed. Generally the low bits can affect higher bits, but the high bits cannot affect lower bits.

Another approach is to exclusive-or the number with a shifted version of itself several times:

x ^= x >> YOUR_FIRST_SHIFT;
x ^= x << YOUR_SECOND_SHIFT;
y = x ^ (x >> YOUR_THIRD_SHIFT);

You can stack up as many of these trivial operations as you like to try to hide the weaknesses of individual stages. Even if one operation on its own isn't very good, it can contribute usefully in a more complex chain of operations. For example, exclusive-or with some constant would avoid the problem of mapping zero to zero by multiplication alone, and the shift and exclusive-or technique allows the low bits to be affected by the high bits.

If you look at PRNGs, you'll find a lot of them have a period nearly the same size as their state. They achieve this by permuting their state in the way you have specified -- through a 1:1 mapping where the relationship between one state and the next isn't too obvious -- then they present some of that state (or all of it) as a pseudo-random number. Some PRNGs and hashes also finish with a tempering stage, where they perform another one of these mappings to hide some of their own weaknesses.

If you run your hash function in a loop, feeding y back into x on each iteration, then you have a PRNG, and you can test the randomness of it with tools like dieharder.

Not all PRNGs have the ideally-long-period property, and that property is not necessary for a good hash function, but some PRNG algorithms can be a useful source of ideas for operations to perform and they come with comprehensive analysis.

like image 146
sh1 Avatar answered Feb 26 '23 06:02

sh1