Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

hashing a small number to a random looking 64 bit integer

I am looking for a hash-function which operates on a small integer (say in the range 0...1000) and outputs a 64 bit int.

The result-set should look like a random distribution of 64 bit ints: a uniform distribution with no linear correlation between the results.

I was hoping for a function that only takes a few CPU-cycles to execute. (the code will be in C++).

I considered multiplying the input by a big prime number and taking the modulo 2**64 (something like a linear congruent generator), but there are obvious dependencies between the outputs (in the lower bits).

Googling did not show up anything, but I am probably using wrong search terms.

Does such a function exist?


Some Background-info:

I want to avoid using a big persistent table with pseudo random numbers in an algorithm, and calculate random-looking numbers on the fly.

Security is not an issue.

like image 222
mirk Avatar asked Dec 14 '11 17:12

mirk


People also ask

How do you hash an integer?

The most commonly used method for hashing integers is called modular hashing: we choose the array size M to be prime, and, for any positive integer key k, compute the remainder when dividing k by M. This function is very easy to compute (k % M, in Java), and is effective in dispersing the keys evenly between 0 and M-1.

What is random hashing?

A technique for randomizing the input to a cryptographic hash function. Source(s): NIST SP 800-106. A process by which the input to a hash function is randomized before being processed by the hash function. Source(s):

What is 32bit hash?

The 32-bit long hash value is a hexadecimal number of 8 characters. MD4 is a Message Digest Algorithm developed by Ronald L. Rivest from RSA Data Security, Inc. Currently it's considered insecure, but it's very fast on 32-bit mashines and it's used for calculating EDonkey 2000 hashes in the EDonkey p2p network.

Can a random number generator be used as a hash function?

If you use a random function instead of the hash function, you'll always end with all bits set to 1 (like in the case L≫1), and you cannot use it to approximate the number of distinct elements.


2 Answers

I tested the 64-bit finalizer of MurmurHash3 (suggested by @aix and this SO post). This gives zero if the input is zero, so I increased the input parameter by 1 first:

typedef unsigned long long uint64;

inline uint64 fasthash(uint64 i)
{
  i += 1ULL;
  i ^= i >> 33ULL;
  i *= 0xff51afd7ed558ccdULL;
  i ^= i >> 33ULL;
  i *= 0xc4ceb9fe1a85ec53ULL;
  i ^= i >> 33ULL;
  return i;
}

Here the input argument i is a small integer, for example an element of {0, 1, ..., 1000}. The output looks random:

i       fasthash(i) decimal:    fasthash(i) hex:
0       12994781566227106604    0xB456BCFC34C2CB2C
1       4233148493373801447     0x3ABF2A20650683E7
2       815575690806614222      0x0B5181C509F8D8CE
3       5156626420896634997     0x47900468A8F01875
...     ...                     ...

There is no linear correlation between subsequent elements of the series:

fasthash autocorrelation

The range of both axes is 0..2^64-1

like image 61
kol Avatar answered Sep 29 '22 08:09

kol


Why not use an existing hash function, such as MurmurHash3 with a 64-bit finalizer? According to the author, the function takes tens of CPU cycles per key on current Intel hardware.

like image 37
NPE Avatar answered Sep 29 '22 07:09

NPE