Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Evenly distributed hash function

I need a hash function that takes a few (eg. 2 or 3) unsigned integers as input, and returns a floating point value between -1 and +1.

The collection of these returned values must be evenly distributed. A sequence of outputs from the function must appear to be a random sequence, even if the input numbers are sequential. Also the faster the better, i'm calling it a LOT of times.

I hope this isn't too much to ask :S...

like image 621
Hannesh Avatar asked Feb 26 '23 03:02

Hannesh


2 Answers

Murmurhash is a very good (strong) and fast hash function which has had some serious testing done on it.

http://sites.google.com/site/murmurhash/

While it is not dedicated to integers per se, it can be quickly adjusted to do so. I have such an alternate formulation which might be more convenient for you if your words are not sequently laid out in memory:

#define MURMURHASH2A_R 24
#define MURMURHASH2A_MULTIPLIER 0x5bd1e995
#define MURMURHASH2A_SEED 2166136261U  // No seed suggested, so using FNV32_OFFSET_BASIS
#define murmurhash2a_init(h) do { h = MURMURHASH2A_SEED; } while (0)
#define murmurhash2a_update(h,word)                     \
do {                                                    \
  u_int mmh2ak = (word) * MURMURHASH2A_MULTIPLIER;      \
  mmh2ak ^= mmh2ak >> MURMURHASH2A_R;                   \
  mmh2ak *= MURMURHASH2A_MULTIPLIER;                    \
  h *= MURMURHASH2A_MULTIPLIER;                         \
  h ^= mmh2ak;                                          \
 } while (0)
#define murmurhash2a_final(h)                   \
do {                                            \
  h ^= h >> 13;                                 \
  h *= MURMURHASH2A_MULTIPLIER;                 \
  h ^= h >> 15;                                 \
 } while (0)

u_int hash;
murmurhash2a_init(hash);
murmurhash2a_update(hash,firstint);
murmurhash2a_update(hash,secondint);
[...]
murmurhash2a_final(hash);

Obviously this is returning 0-2^32-1. There is a 64 bit version on the murmurhash site. Conversion of integer to a float over a range is left as an excercise (in division) for the reader.

like image 134
Seth Robertson Avatar answered Mar 03 '23 11:03

Seth Robertson


You can employ standard scheme for such tasks: (a0 + Q*a1 + Q^2*a2 + Q^3*a3 + ...) % M where M is a very large prime number and Q is coefficient of your choice.
Once you have random enough hash in range [0, M), converting it to floating point number [-1, 1] is trivial.

Or you can remove % M and allow integer overflow to happen, although I'm not sure how secure it is (from 'evenly distributed' perspective).

A sequence of outputs from the function must appear to be a random sequence, even if the input numbers are sequential.
For this you can instead of ai use ai*ai in the expression. Anyway, here's the simple implementation in Java.

double hash(int... a) {
    int Q = 433494437;
    int result = 0;
    for (int n : a) {
        result = result * Q + n * n;
    }
    result *= Q;
    return (double) result / Integer.MIN_VALUE;
}

Output does look random even for consecutive numbers. You can also use 64-bit integer for more precision.

like image 24
Nikita Rybak Avatar answered Mar 03 '23 11:03

Nikita Rybak