Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Minimal perfect hash function

Tags:

perfect-hash

I have many integers in range [0; 2^63-1]. There is only 10^8 integers, however. There is no duplicates. Full list is known at compile-time but it is just unique random numbers. These numbers never changes.
To store one integer explicitly, 8 bytes required, and there is associated 1-byte values, so explicit storing requires about 860 MB.
So I want to find minimal perfect hash function to map each of 10^8 integers from [0;2^63-1] to [0;10^8-1]. I should find this function only once, data never changes, and function can be complicated. But it should be minimal, perfect, and calculating should be fast. How I can do this better? Maybe it is possible to find and use some subsequences if they happens?
Thanks.

like image 489
tin_coder Avatar asked Jul 19 '11 06:07

tin_coder


People also ask

Are perfect hash functions possible?

A perfect hash function can be constructed that maps each of the keys to a distinct integer, with no collisions. These functions only work with the specific set of keys for which they were constructed. Passing an unknown key will result a false match or even crash. A minimal perfect hash function goes one step further.

What is a perfect hash table?

Perfect hashing is defined as a model of hashing in which any set of n elements can be stored in a hash table of equal size and can have lookups performed in constant time. It was specifically invented and discussed by Fredman, Komlos and Szemeredi (1984) and has therefore been nicknamed as "FKS Hashing".

What is perfect hash function Java?

A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. In mathematical terms, it is an injective function. Perfect hash functions may be used to implement a lookup table with constant worst-case access time.

Which is the best hash function?

Probably the one most commonly used is SHA-256, which the National Institute of Standards and Technology (NIST) recommends using instead of MD5 or SHA-1. The SHA-256 algorithm returns hash value of 256-bits, or 64 hexadecimal digits.


2 Answers

Let your computer do the work for you:

http://www.gnu.org/software/gperf/

Quote: "GNU gperf is a perfect hash function generator. For a given list of strings, it produces a hash function and hash table, in form of C or C++ code, for looking up a value depending on the input string. The hash function is perfect, which means that the hash table has no collisions, and the hash table lookup needs a single string comparison only. "

like image 86
Mario Avatar answered Sep 20 '22 20:09

Mario


I'm working on an algorithm and Java implementation that needs less than 1.6 bits per key.

Previously, I have implemented a minimal perfect hash function tool in Java that needs less than 2.0 bits per key.

Other algorithms are implemented in CMPH. For example CHD needs about 2.06 bits per key by default. It can be configured to use less space, but generation is then slower.

like image 24
Thomas Mueller Avatar answered Sep 20 '22 20:09

Thomas Mueller