
Implement a hash table

Tags: c, hashtable

I'm trying to create an efficient look-up table in C.

I have an integer as the key and a variable-length char* as the value.

I've looked at uthash, but this requires a fixed-length char array as the value. If I make the array big, then I'm using too much memory.

#include "uthash.h"

struct my_struct {
    int key;
    char value[10];             // fixed size: every entry reserves 10 bytes
    UT_hash_handle hh;          // makes this structure hashable
};

Has anyone got any pointers? Any insight greatly appreciated.


Thanks everyone for the answers. I've gone with uthash and defined my own custom struct to accommodate my data.
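The question's final struct isn't shown. One possible shape for it, sketched here with hypothetical names (`struct entry`, `entry_new`) and assuming a C99 compiler, keeps the value in a flexible array member so each entry is allocated at exactly the size its string needs. In a uthash version, the `UT_hash_handle hh;` member would sit before `value`, which has to stay the last member; treat that combination as an assumption to check against the uthash documentation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type: the value is stored inline in a C99 flexible
   array member, so there is no fixed upper size to choose. (Assumption:
   in a uthash version, UT_hash_handle hh; would go before value.) */
struct entry {
    int key;
    char value[];  /* flexible array member: must be the last member */
};

/* Allocate an entry big enough for the key plus a copy of s (with '\0').
   Returns NULL if malloc fails; the caller releases it with free(). */
struct entry *entry_new(int key, const char *s) {
    size_t len = strlen(s) + 1;
    struct entry *e = malloc(sizeof *e + len);
    if (e == NULL) {
        return NULL;
    }
    e->key = key;
    memcpy(e->value, s, len);
    return e;
}
```

A single `free(e)` releases both the key and the string, since they live in one allocation.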

asked Jul 27 '11 13:07 by Eamorr


3 Answers

You first have to think about your collision strategy:

  1. Will you probe other slots, in effect using multiple hash functions (open addressing)?
  2. Or will you store a container, such as a linked list, in each slot of the hash table (separate chaining)?

We'll pick 1.

Then you have to choose a well-distributed hash function. For this example, we'll pick simple linear probing:

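/* Linear probing: start at key mod max, move one slot per try.
   Unsigned arithmetic keeps negative keys inside [0, max). */
int hash_fun(int key, int try, int max) {
    unsigned home = (unsigned)key % (unsigned)max;
    return (int)((home + (unsigned)try) % (unsigned)max);
}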
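Written as a formula, the probe sequence for a non-negative key k in a table of m = max slots is as follows (a worked illustration; k = 13 and m = 8 are arbitrary choices):

```latex
h(k, i) = \bigl((k \bmod m) + i\bigr) \bmod m, \qquad i = 0, 1, \ldots, m - 1
% Example: k = 13, m = 8, so k \bmod m = 5 and
% h(13, 0), h(13, 1), h(13, 2), h(13, 3) = 5, 6, 7, 0
```

Every slot is examined exactly once within m tries. Keys that share a home slot (13, 21 and 29 all start at slot 5 when m = 8) end up in consecutive slots.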

If you need something better, have a look at the mid-square method.
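A minimal sketch of the mid-square idea, assuming 32-bit keys and a table of 2^r slots: square the key into 64 bits and keep the middle r bits, which depend on all bits of the key. The name `mid_square` and the parameter `r` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical mid-square hash: square the 32-bit key into 64 bits and
   keep the middle r bits, giving an index into a table of 2^r slots.
   Assumes 1 <= r <= 32. */
uint32_t mid_square(int key, unsigned r) {
    uint64_t k = (uint32_t)key;     /* well-defined for negative keys too */
    uint64_t sq = k * k;            /* fits: (2^32 - 1)^2 < 2^64 */
    unsigned shift = (64 - r) / 2;  /* centre the r-bit window */
    return (uint32_t)((sq >> shift) & ((UINT64_C(1) << r) - 1));
}
```

To use it with linear probing, max would have to be 2^r and the probe could be, for example, `(mid_square(key, r) + try) % max`.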

Next, you have to decide how to represent the hash table itself.

struct hash_table {
    int max;                     // number of slots
    int number_of_elements;      // slots currently in use
    struct my_struct **elements; // max pointers; 0 (NULL) marks an empty slot
};
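The insert and retrieve code relies on every slot starting out empty (0). A minimal constructor and destructor could look like this; `hash_table_create` and `hash_table_free` are hypothetical helpers, not part of the answer, and the sketch assumes max > 0.

```c
#include <stdlib.h>

struct my_struct;  /* element type from the question; only pointers are stored */

struct hash_table {
    int max;                      /* number of slots */
    int number_of_elements;       /* slots currently in use */
    struct my_struct **elements;  /* max pointers; NULL marks an empty slot */
};

/* Hypothetical constructor: calloc zeroes the slot array, and all-bits-zero
   is a null pointer on common platforms, so every slot starts out empty.
   Returns NULL on allocation failure. */
struct hash_table *hash_table_create(int max) {
    struct hash_table *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->elements = calloc((size_t)max, sizeof *t->elements);
    if (t->elements == NULL) {
        free(t);
        return NULL;
    }
    t->max = max;
    t->number_of_elements = 0;
    return t;
}

/* Frees the table itself; the elements remain owned by the caller. */
void hash_table_free(struct hash_table *t) {
    if (t != NULL) {
        free(t->elements);
        free(t);
    }
}
```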

Then we have to define how to insert and retrieve elements.

int hash_insert(struct my_struct *data, struct hash_table *hash_table) {
    int try, hash;
    if(hash_table->number_of_elements >= hash_table->max) {
        return 0; // FULL
    }
    for(try = 0; try < hash_table->max; try++) { // visits every slot at most once
        hash = hash_fun(data->key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) { // empty cell
            hash_table->elements[hash] = data;
            hash_table->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *hash_table) {
    int try, hash;
    for(try = 0; try < hash_table->max; try++) { // bounded, so a full table can't loop forever
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            return hash_table->elements[hash];
        }
    }
    return 0;
}

And lastly, a method to remove an element:

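int hash_delete(int key, struct hash_table *hash_table) {
    int try, i, j, k, max = hash_table->max;
    for(try = 0; try < max; try++) {
        i = hash_fun(key, try, max);
        if(hash_table->elements[i] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[i]->key == key) {
            hash_table->number_of_elements--;
            hash_table->elements[i] = 0;
            // Backward shift: pull later members of the probe run into the
            // gap, so lookups don't stop early at the emptied slot.
            for(j = (i + 1) % max; hash_table->elements[j] != 0; j = (j + 1) % max) {
                k = hash_fun(hash_table->elements[j]->key, 0, max); // home slot
                if(i <= j ? (i < k && k <= j) : (i < k || k <= j)) {
                    continue; // home is in (i, j]: it can stay
                }
                hash_table->elements[i] = hash_table->elements[j];
                hash_table->elements[j] = 0;
                i = j;
            }
            return 1; // Success
        }
    }
    return 0;
}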
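Putting the pieces together, here is a self-contained sketch (an assembly for testing, not the answer's exact code). It assumes a simplified `my_struct` holding only the key, bounds every probe loop by max so a lookup on a full table terminates, and on deletion moves later members of the probe run back into the emptied slot (the backward-shift deletion Knuth describes for linear probing). Without that step, clearing a slot would make any key that had probed past it unreachable.

```c
#include <stddef.h>

/* Simplified element for this sketch: only the key matters here. */
struct my_struct {
    int key;
};

struct hash_table {
    int max;                      /* number of slots */
    int number_of_elements;       /* slots in use */
    struct my_struct **elements;  /* NULL marks an empty slot */
};

/* Linear probing; the home slot is reduced as unsigned. */
int hash_fun(int key, int try, int max) {
    unsigned home = (unsigned)key % (unsigned)max;
    return (int)((home + (unsigned)try) % (unsigned)max);
}

int hash_insert(struct my_struct *data, struct hash_table *t) {
    if (t->number_of_elements >= t->max) {
        return 0;  /* full */
    }
    for (int try = 0; try < t->max; try++) {
        int h = hash_fun(data->key, try, t->max);
        if (t->elements[h] == NULL) {
            t->elements[h] = data;
            t->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *t) {
    for (int try = 0; try < t->max; try++) {  /* terminates on a full table */
        int h = hash_fun(key, try, t->max);
        if (t->elements[h] == NULL) {
            return NULL;  /* an empty slot ends the probe sequence */
        }
        if (t->elements[h]->key == key) {
            return t->elements[h];
        }
    }
    return NULL;
}

int hash_delete(int key, struct hash_table *t) {
    for (int try = 0; try < t->max; try++) {
        int i = hash_fun(key, try, t->max);
        if (t->elements[i] == NULL) {
            return 0;  /* not found */
        }
        if (t->elements[i]->key == key) {
            t->elements[i] = NULL;
            t->number_of_elements--;
            /* Backward shift: move each later member of the run into the
               gap unless its home slot lies cyclically in (i, j]. */
            for (int j = (i + 1) % t->max; t->elements[j] != NULL;
                 j = (j + 1) % t->max) {
                int k = hash_fun(t->elements[j]->key, 0, t->max);
                int stays = (i <= j) ? (i < k && k <= j) : (i < k || k <= j);
                if (!stays) {
                    t->elements[i] = t->elements[j];
                    t->elements[j] = NULL;
                    i = j;
                }
            }
            return 1;
        }
    }
    return 0;
}
```

For example, with max = 8 the keys 5, 13 and 21 all start at slot 5. After deleting 5, the keys 13 and 21 shift back to slots 5 and 6 and both remain retrievable.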
answered Oct 22 '22 15:10 by marc


Declare the value field as void *value.

This way you can have any type of data as the value, but the responsibility for allocating and freeing it will be delegated to the client code.
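A minimal sketch of that idea, using hypothetical names: the element stores only a `void *`, and a client-side helper, `set_string_value`, allocates exactly as many bytes as each string needs. It uses `malloc` plus `memcpy` rather than `strdup`, which is POSIX rather than standard C before C23.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element with an untyped value: the table only stores the
   pointer, and the client code owns the memory it points to. */
struct my_struct {
    int key;
    void *value;
};

/* Client-side helper (hypothetical): copy a string of any length into
   freshly allocated memory and attach it to the element.
   Returns 0 if malloc fails, leaving the old value in place. */
int set_string_value(struct my_struct *e, const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return 0;
    }
    memcpy(copy, s, len);
    free(e->value);  /* release any previous value; free(NULL) is a no-op */
    e->value = copy;
    return 1;
}
```

Before discarding an element, the client would call `free(e->value)`.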

answered Oct 22 '22 16:10 by Blagovest Buyukliev


It really depends on the distribution of your key field. For example, if it's a unique value always between 0 and 255 inclusive, just use key % 256 to select the bucket and you have a perfect hash.
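For keys that really are unique and within 0..255, key % 256 is just the key itself, so the table can be a plain array indexed by the key (direct addressing), with no collisions to handle. A minimal sketch with hypothetical names (`small_table`, `small_put`, `small_get`), storing borrowed string pointers:

```c
#include <stddef.h>

/* Hypothetical direct-address table for keys known to be unique and in
   0..255: the key itself is the slot index, so there are no collisions. */
struct small_table {
    const char *values[256];  /* NULL means "no value for this key" */
};

/* Returns 0 if the key is outside 0..255. */
int small_put(struct small_table *t, int key, const char *value) {
    if (key < 0 || key > 255) {
        return 0;
    }
    t->values[key] = value;
    return 1;
}

const char *small_get(const struct small_table *t, int key) {
    return (key < 0 || key > 255) ? NULL : t->values[key];
}
```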

If it's evenly distributed across all possible int values, any function that gives an evenly distributed hash value will do (such as the aforementioned key % 256, computed on the key cast to unsigned so that negative keys don't produce a negative index), albeit with multiple values in each bucket.
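When several keys share a bucket, one common way to hold them is a linked list per bucket (separate chaining, the "containers inside the hashtable" strategy from the first answer). A minimal sketch with 256 buckets; `struct node`, `chained_table`, `chained_put`, `chained_get`, `chained_free` and `NBUCKETS` are hypothetical names, and the values are caller-owned strings:

```c
#include <stdlib.h>

#define NBUCKETS 256

/* Hypothetical chained entry: keys that land in the same bucket are
   kept in a singly linked list. */
struct node {
    int key;
    char *value;  /* caller-owned string */
    struct node *next;
};

struct chained_table {
    struct node *buckets[NBUCKETS];
};

static unsigned bucket_of(int key) {
    return (unsigned)key % NBUCKETS;  /* unsigned: no negative index */
}

/* Insert or update; returns 0 only if malloc fails. */
int chained_put(struct chained_table *t, int key, char *value) {
    unsigned b = bucket_of(key);
    for (struct node *n = t->buckets[b]; n != NULL; n = n->next) {
        if (n->key == key) {
            n->value = value;
            return 1;
        }
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return 0;
    }
    n->key = key;
    n->value = value;
    n->next = t->buckets[b];  /* push onto the front of the chain */
    t->buckets[b] = n;
    return 1;
}

char *chained_get(const struct chained_table *t, int key) {
    for (struct node *n = t->buckets[bucket_of(key)]; n != NULL; n = n->next) {
        if (n->key == key) {
            return n->value;
        }
    }
    return NULL;
}

/* Frees the nodes only; the values stay owned by the caller. */
void chained_free(struct chained_table *t) {
    for (int b = 0; b < NBUCKETS; b++) {
        struct node *n = t->buckets[b];
        while (n != NULL) {
            struct node *next = n->next;
            free(n);
            n = next;
        }
        t->buckets[b] = NULL;
    }
}
```

Here 1, 257 and -255 all fall into bucket 1, and lookups walk that one short list.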

Without knowing the distribution, it's a little hard to talk about efficient hashes.

answered Oct 22 '22 14:10 by paxdiablo