Is it possible to create a Minimal Perfect Hash function without a separate lookup table for a small (<64) set of keys?

Q: Which function can be used with minimal perfect hashing?

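A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers, usually the numbers from 0 to n − 1 or from 1 to n. A more formal way of expressing this is: Let j and k be elements of some finite set S. Then f is a minimal perfect hash function if and only if f(j) = f(k) implies j = k, and there exists an integer a such that the range of f is a..a + |S| − 1.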

Q: Are perfect hash functions possible?

A perfect hash function can be constructed that maps each of the keys to a distinct integer, with no collisions. These functions only work with the specific set of keys for which they were constructed. Passing an unknown key will result in a false match or even a crash. A minimal perfect hash function goes one step further.

Q: What is a perfect hash in a hash table?

Definition: A hash function that maps each different key to a distinct integer. Usually all possible keys must be known beforehand. A hash table that uses a perfect hash has no collisions. Formal Definition: A function f is perfect for a set of keys K iff ∀ j, k ∈ K f(j) = f(k) → j = k.
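The two definitions can be checked mechanically for a small key set. The following is a minimal sketch, assuming at most 64 keys so that a single 64-bit word can serve as the "slot used" bitmap; `is_minimal_perfect`, `div10` and `mod4` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*hash_fn)(uint64_t key);

/* Hypothetical helper: returns true if hash() maps the n keys to n distinct
 * values in the range 0..n-1, i.e. if it is a minimal perfect hash for
 * exactly this key set. Assumes n <= 64. */
static bool is_minimal_perfect(hash_fn hash, const uint64_t *keys, size_t n) {
    uint64_t used = 0; /* bit s is set once slot s has been taken */
    if (n > 64) return false;
    for (size_t i = 0; i < n; i++) {
        uint64_t slot = hash(keys[i]);
        if (slot >= n) return false;          /* outside 0..n-1: not minimal */
        if ((used >> slot) & 1) return false; /* collision: not perfect */
        used |= UINT64_C(1) << slot;
    }
    return true;
}

/* Two toy hash functions to try on the keys 0, 10, 20, 30. */
static uint64_t div10(uint64_t key) { return key / 10; }
static uint64_t mod4(uint64_t key) { return key % 4; }
```

For the keys 0, 10, 20, 30, `div10` passes the check (it yields 0, 1, 2, 3), while `mod4` fails because 0 and 20 both map to slot 0.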

Q: What are the requirements of a good hash function?

A good hash function has these main characteristics: 1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values.

Tags:

c

algorithm

hash

perfect-hash

I recently read the article "Throw away the keys: Easy, Minimal Perfect Hashing", which describes how to generate a minimal perfect hash table for a known set of keys.

The article seems to assume that you need an intermediate table. Is there any other, simpler way to generate such a function if we assume that the set of keys is small (i.e. < 64)?

In my case, I want to map a set of thread IDs to a unique block of data within an array. The threads are started before the hash function is generated. The exact number of threads varies from run to run, but both the thread IDs and the thread count stay fixed for the lifetime of the program:

unsigned int *thread_ids;
unsigned int thread_count;
struct {
    /* Some thread specific data */
} *ThreadData;

int start_threads(void) {
    /* Code which starts the threads and allocates the ThreadData. */
}

int f(unsigned int thread_id) {
    /* return unique index into ThreadData */
}

int main(void) {
    thread_count = 64; /* This number will be small, e.g. < 64 */
    start_threads();
    ThreadData[f(thread_ids[0])]; /* look up the data block of the first thread */
}


asked Apr 24 '19 07:04

Anton Lahti

1 Answer

You could build a perfect hash as follows, using a brute-force search. For 64 entries, the target array needs at least 512 slots, otherwise the search won't find an index within a reasonable time. Note that this makes the hash perfect but not minimal: the 64 keys map to distinct slots within a range of 512.

The perfect hash function is then murmur64(x + perfectHashIndex) & (TARGET_SIZE - 1):
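A rough estimate shows why the target size matters so much for the brute-force search. This is only a sketch under the assumption that the mixing function behaves like a uniformly random function; n is the number of keys and m the number of slots, and the exponential form is the usual birthday-problem approximation, which is only reasonable while n is much smaller than m.

```latex
% Probability that a single candidate index produces no collision:
P(n, m) = \prod_{i=0}^{n-1} \left(1 - \frac{i}{m}\right)
        \approx \exp\!\left(-\frac{n(n-1)}{2m}\right)

% n = 64 keys, m = 512 slots:
P(64, 512) \approx \exp\!\left(-\frac{64 \cdot 63}{2 \cdot 512}\right)
           = e^{-3.94} \approx 0.02

% n = 64 keys, m = 64 slots (a minimal perfect hash), exact product:
P(64, 64) = \frac{64!}{64^{64}} \approx 3 \times 10^{-27}
```

Under that assumption, roughly one candidate index in fifty works for 512 slots, so a search limited to 1000 candidates is very likely to succeed. For a minimal perfect hash over 64 keys, the same search would need on the order of 10^26 candidates, which is why a single brute-forced index is not enough in that case.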

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// 64-bit finalizer (fmix64) from MurmurHash3; mixes all bits of h
static uint64_t murmur64(uint64_t h) {
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h *= UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h;
}

// must be a power of 2
#define TARGET_SIZE 512
// returned by findPerfectHashIndex when no index works
#define NOT_FOUND UINT64_MAX

// Tries the indexes 0..999 until one maps every key to a distinct slot.
static uint64_t findPerfectHashIndex(const uint64_t *array, size_t size) {
    uint64_t used[TARGET_SIZE / 64]; // one bit per slot
    for (uint64_t index = 0; index < 1000; index++) {
        size_t i;
        memset(used, 0, sizeof used);
        for (i = 0; i < size; i++) {
            uint64_t x = murmur64(array[i] + index) & (TARGET_SIZE - 1);
            if ((used[x >> 6] >> (x & 63)) & 1) {
                break; // collision: try the next index
            }
            // UINT64_C(1) keeps the shift 64 bits wide on every platform
            used[x >> 6] |= UINT64_C(1) << (x & 63);
        }
        if (i == size) {
            return index; // every key landed in its own slot
        }
    }
    return NOT_FOUND;
}

int main(void) {
    enum { SIZE = 64 };
    uint64_t ids[SIZE];
    for (int i = 0; i < SIZE; i++) ids[i] = 10 * (uint64_t)i;
    uint64_t perfectHashIndex = findPerfectHashIndex(ids, SIZE);
    if (perfectHashIndex == NOT_FOUND) {
        printf("perfectHashIndex not found\n");
    } else {
        printf("perfectHashIndex = %" PRIu64 "\n", perfectHashIndex);
        for (int i = 0; i < SIZE; i++) {
            printf("  x[%d] = %" PRIu64 ", murmur64(x + perfectHashIndex) & (TARGET_SIZE - 1) = %" PRIu64 "\n",
                i, ids[i], murmur64(ids[i] + perfectHashIndex) & (TARGET_SIZE - 1));
        }
    }
    return 0;
}
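To connect this back to the question, the index found by the search could be used to look up per-thread data directly. The following is a minimal sketch, not part of the original answer: it assumes the data array is simply allocated with TARGET_SIZE entries (sparse, most of them unused) so that no intermediate table is needed. The names `slot_for`, `find_seed`, `init_thread_lookup`, `data_for`, `struct thread_data` and `NOT_FOUND` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TARGET_SIZE 512        /* power of 2, as in the answer */
#define NOT_FOUND   UINT64_MAX /* hypothetical "no seed found" marker */

/* Same mixing function as in the answer. */
static uint64_t murmur64(uint64_t h) {
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h *= UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h;
}

/* Slot of a thread ID for a given seed; always < TARGET_SIZE. */
static size_t slot_for(uint64_t thread_id, uint64_t seed) {
    return (size_t)(murmur64(thread_id + seed) & (TARGET_SIZE - 1));
}

/* Brute-force search over the seeds 0..999, as in the answer. */
static uint64_t find_seed(const uint64_t *ids, size_t count) {
    for (uint64_t seed = 0; seed < 1000; seed++) {
        uint64_t used[TARGET_SIZE / 64] = {0}; /* one bit per slot */
        size_t i;
        for (i = 0; i < count; i++) {
            size_t s = slot_for(ids[i], seed);
            if ((used[s >> 6] >> (s & 63)) & 1) break; /* collision */
            used[s >> 6] |= UINT64_C(1) << (s & 63);
        }
        if (i == count) return seed;
    }
    return NOT_FOUND;
}

/* Hypothetical per-thread payload; the question leaves its contents open. */
struct thread_data {
    uint64_t counter;
};

/* Sparse storage: one entry per possible slot, only `count` are ever used. */
static struct thread_data thread_data[TARGET_SIZE];
static uint64_t thread_seed = NOT_FOUND;

/* Call once after all threads have started. Returns 0 on success, -1 if
 * no seed was found within the search limit. */
static int init_thread_lookup(const uint64_t *ids, size_t count) {
    thread_seed = find_seed(ids, count);
    return thread_seed == NOT_FOUND ? -1 : 0;
}

/* Only meaningful for IDs that were passed to init_thread_lookup. */
static struct thread_data *data_for(uint64_t thread_id) {
    return &thread_data[slot_for(thread_id, thread_seed)];
}
```

The trade-off in this sketch is memory: 512 entries are reserved although at most 64 are used. An unknown thread ID still yields a valid pointer, either to an unused entry or to another thread's entry, so callers must only pass registered IDs.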


answered Sep 28 '22 12:09

Thomas Mueller
