C: Generating hash keys for large data sets?

Tags: c, key, hash

I am currently playing around with hashing and key generation, trying to write my own hash key generator.

At the moment I have a list of ~90,000 strings (each a single, distinct word). I was wondering what the best way to generate keys (numeric keys, not string keys) would be.

Currently, I compute each key with a calculation based on the ASCII value of the word's last character.

As a result, about 50% of the words get a key that clashes with another word's key.
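Why a last-character key clashes so often: whatever arithmetic is applied, the key depends on a single byte, so it can take at most 256 distinct values (about 26 for lowercase words). With 90,000 words, the pigeonhole principle forces heavy sharing. A minimal sketch, where `last_char_key` is a hypothetical stand-in for the calculation described above (the exact formula isn't given):

```c
#include <string.h>

/* Hypothetical stand-in for a key derived only from the last character.
   The multiplier is arbitrary; the point is that the result depends on one
   byte, so at most 256 distinct keys are possible. */
unsigned last_char_key(const char *word) {
    size_t len = strlen(word);
    if (len == 0) {
        return 0;
    }
    return (unsigned char)word[len - 1] * 7u; /* arbitrary calculation */
}
```

For example, "cat" and "bat" get the same key, as does every other word ending in "t".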

I then use quadratic probing to find free slots in the table for the remaining words.
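A minimal sketch of open addressing with quadratic probing, for reference. Assumptions: the fixed 16-slot table, the names `insert_word` and `find_word`, and the choice of the djb2 hash are all illustrative, not the asker's code. With a power-of-two table size, the triangular probe offsets 0, 1, 3, 6, 10, ... are guaranteed to visit every slot, which plain i*i offsets are not.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16u /* must be a power of two for this probe sequence */

/* djb2 string hash (Dan Bernstein): h = h*33 + c, seeded with 5381. */
static unsigned long djb2(const char *s) {
    unsigned long h = 5381;
    while (*s) {
        h = h * 33 + (unsigned char)*s++;
    }
    return h;
}

/* Hypothetical open-addressing table holding caller-owned strings. */
static const char *table[TABLE_SIZE];

/* Quadratic probing with triangular offsets i*(i+1)/2. With a power-of-two
   size this visits every slot exactly once. Returns the slot used, or -1
   if the table is full. */
int insert_word(const char *word) {
    size_t start = djb2(word) & (TABLE_SIZE - 1);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (start + i * (i + 1) / 2) & (TABLE_SIZE - 1);
        if (table[slot] == NULL) {
            table[slot] = word;
            return (int)slot;
        }
        if (strcmp(table[slot], word) == 0) {
            return (int)slot; /* already present */
        }
    }
    return -1;
}

/* Follows the same probe sequence; an empty slot means the word is absent
   (valid because this sketch never deletes entries). */
int find_word(const char *word) {
    size_t start = djb2(word) & (TABLE_SIZE - 1);
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t slot = (start + i * (i + 1) / 2) & (TABLE_SIZE - 1);
        if (table[slot] == NULL) {
            return -1;
        }
        if (strcmp(table[slot], word) == 0) {
            return (int)slot;
        }
    }
    return -1;
}
```

Probing only resolves clashes after the fact; the fewer initial clashes the hash produces, the shorter these probe sequences stay.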

My question, as above: what is generally the best way to generate keys for 90,000 different words? I know that the larger the data set, the more likely clashes become, but how would you suggest minimising them?
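Besides the hash itself, the number of clashes depends on how full the table is. A common rule of thumb for open addressing (an assumption here, not a hard limit) is to keep the load factor, words divided by slots, at or below about 0.5. A sketch of a hypothetical `table_capacity` helper that picks the smallest power-of-two size meeting that bound:

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of two with n / capacity <= 1/2.
   The 1/2 bound is a rule of thumb, not a requirement. */
size_t table_capacity(size_t n) {
    size_t capacity = 1;
    while (capacity < 2 * n) {
        capacity <<= 1;
    }
    return capacity;
}
```

For ~90,000 words, `table_capacity(90000)` returns 262144 (2^18), a load factor of about 0.34.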

Edit: I don't need cryptographic strength; it just needs to be fast.

Thanks.

asked Feb 17 '23 12:02 by user2013417



1 Answer

You can "borrow" Java's implementation of String's hashCode*:

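#include <stdint.h>

/* Java's String.hashCode recurrence: h = 31*h + c.
   Unsigned arithmetic wraps modulo 2^32; overflowing a signed int
   would be undefined behavior in C. */
uint32_t hashCode(const char* s) {
    uint32_t h = 0;
    while (*s) {
        h = 31*h + (unsigned char)*s++;
    }
    return h;
}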

Because every character contributes to the result, this function spreads words across keys far better than a key based on a single character, and it is among the most widely used string hash functions.
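To turn the hash into a table position, reduce it modulo the table size. The sketch below also counts how many words land in an already-occupied bucket, which is a quick way to compare hash functions and table sizes on the real word list. `java_hash`, `bucket_index`, and `count_collisions` are hypothetical names; `java_hash` uses the same h = 31*h + c recurrence with unsigned 32-bit arithmetic, so overflow wraps instead of being undefined.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* h = 31*h + c over each byte, wrapping modulo 2^32. */
static uint32_t java_hash(const char *s) {
    uint32_t h = 0;
    while (*s) {
        h = 31u * h + (unsigned char)*s++;
    }
    return h;
}

/* Map a word to one of table_size buckets. */
size_t bucket_index(const char *word, size_t table_size) {
    return java_hash(word) % table_size;
}

/* Count words whose bucket was already taken by an earlier word, i.e. how
   many would need probing. Returns (size_t)-1 if allocation fails. */
size_t count_collisions(const char *const *words, size_t n, size_t table_size) {
    unsigned char *used = calloc(table_size, 1);
    if (used == NULL) {
        return (size_t)-1;
    }
    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        size_t b = bucket_index(words[i], table_size);
        if (used[b]) {
            collisions++;
        } else {
            used[b] = 1;
        }
    }
    free(used);
    return collisions;
}
```

For example, the hash of "abc" is 96354 (the same value Java's `"abc".hashCode()` returns), so `bucket_index("abc", 1000)` is 354. No hash avoids clashes entirely: "Aa" and "BB" both hash to 2112, so `count_collisions` reports 1 for that pair.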

* which, as it turns out, Java in turn "borrowed" from Kernighan & Ritchie's book on C programming.

answered Mar 01 '23 17:03 by Sergey Kalinichenko
