C: Generating hash keys for large data sets?

Question

I am currently playing around with hashing and key generation, trying to write my own hash key generator.

At the moment I have a list of ~90000 strings (each a single, distinct word). What would be the best way to generate keys for them (numeric keys, not string keys)?

Currently I compute the key from the value of each word's last ASCII character.

The result is that about 50% of the words generate a key that clashes with another word's key.

I then use quadratic probing to find space in the table for the words that clash.
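For readers unfamiliar with the probing step mentioned here, it might look roughly like the sketch below. The table layout (an array of string pointers with NULL marking a free slot) and the name `insert_quadratic` are assumptions made for illustration, not the asker's actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical open-addressing insert using quadratic probing.
 * `table` is an array of `size` string pointers; NULL means "free".
 * Returns the slot used, or (size_t)-1 if no free slot was found
 * within `size` probes.
 */
static size_t insert_quadratic(const char **table, size_t size,
                               unsigned int key, const char *word) {
    assert(size > 0);
    size_t slot = key % size;  /* home slot */
    size_t step = 1;           /* gap between consecutive squares: 1, 3, 5, ... */
    for (size_t i = 0; i < size; i++) {
        if (table[slot] == NULL) {
            table[slot] = word;
            return slot;
        }
        /* Visits home + 1, home + 4, home + 9, ... (all modulo size). */
        slot = (slot + step) % size;
        step = (step + 2) % size;
    }
    return (size_t)-1;
}
```

Quadratic probing is not guaranteed to visit every slot. With a prime table size and a table that is at most half full, a free slot is always found, which is one reason a prime size is a common choice.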

My question, as above: what is generally the best way to generate keys for 90000 different words? I know that the larger the data set, the more likely clashes become, but how would you suggest minimising them?

Edit: Also - I don't care about cryptography, it just needs to be fast.

Thanks.

Sergey Kalinichenko · Accepted Answer

You can "borrow" Java's implementation of String's hashCode (*):

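/* Unsigned arithmetic: overflow wraps around, whereas signed overflow is undefined in C. */
unsigned int hashCode(const char *s) {
    unsigned int h = 0;
    while (*s) {
        /* Read each character as unsigned char so bytes above 127 are never negative. */
        h = 31 * h + (unsigned char)*s++;
    }
    return h;
}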

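This function mixes every character of the string into the key rather than just one, so it spreads typical words reasonably well. It is also cheap (one multiply and one add per character) and is among the most widely used hash functions out there.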
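To use such a hash with a table, the value still has to be folded into the range of valid slots. The sketch below shows one way to do that and to measure how many words clash. The names `hash_string`, `bucket_index` and `count_clashes` are hypothetical, and taking the remainder by the table size is an assumed choice, not something the answer prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Multiply-by-31 string hash, written with unsigned arithmetic. */
static unsigned int hash_string(const char *s) {
    unsigned int h = 0;
    while (*s) {
        h = 31u * h + (unsigned char)*s++;
    }
    return h;
}

/* Hypothetical helper: fold a hash value into the range [0, table_size). */
static size_t bucket_index(const char *word, size_t table_size) {
    assert(table_size > 0);
    return hash_string(word) % table_size;
}

/*
 * Hypothetical helper: count how many of the n words land on a slot that
 * an earlier word already claimed. Returns (size_t)-1 if allocation fails.
 */
static size_t count_clashes(const char *const *words, size_t n,
                            size_t table_size) {
    assert(table_size > 0);
    unsigned char *used = calloc(table_size, 1);
    size_t clashes = 0;
    if (used == NULL) {
        return (size_t)-1;
    }
    for (size_t i = 0; i < n; i++) {
        size_t slot = bucket_index(words[i], table_size);
        if (used[slot]) {
            clashes++;
        } else {
            used[slot] = 1;
        }
    }
    free(used);
    return clashes;
}
```

Running `count_clashes` over the full word list with a few candidate table sizes gives a quick way to compare hash functions on the real data. Some clashes are expected with any hash once the table fills up, so the table is usually made comfortably larger than the number of words.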

(*) As it turns out, Java in turn "borrowed" it from Kernighan & Ritchie's book on C programming.


Tags: c, key, hash

user2013417
