Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

64bit array operation by C/C++

I have an efficiency critical application, where I need such an array-type data structure A. Its keys are 0, 1, 2,..., and its values are uint64_t distinct values. I need two constant operations:

1. Given i, return A[i];
2. Given val, return i such that A[i] == val

I prefer not to use hash table. Because I tried GLib GHashTable, it took around 20 mins to load 60 million values into the hash table (If I remove the insertion statement, it took only around 6 seconds). The time is not acceptable for my application. Or maybe somebody recommend other hash table libraries? I tried uthash.c, it crashed immediately.

I also tried SDArray, but it seems not the right one.

Does anybody know any data structure that would fulfill my requirements? Or any efficient hash table implementations? I prefer using C/C++.

Thanks.

like image 285
Joy Avatar asked Apr 03 '13 09:04

Joy


1 Answers

In general, you need two hash tables for this task. As you know, hash tables give you a key look-up in expected constant time. Searching for a value requires iterating through the whole data structure, since information about the values isn't encoded in the hash look-up table.

Use two hash tables: One for key-value and one (reversed) for value-key look-up. In your particular case, the forward search can be done using a vector as long as your keys are "sequential". But this doesn't change the requirement for a data structure enabling fast reverse look-up.

Regarding the hash table implementation: In C++11, you have the new standard container std::unordererd_map available.

An implementation might look like this (of course this is tweakable, like introducing const-correctness, calling by reference etc.):

std::unordered_map<K,T> kvMap; // hash table for forward search
std::unordered_map<T,K> vkMap; // hash table for backward search

void insert(std::pair<K,T> item) {
    kvMap.insert(item);
    vkMap.insert(std::make_pair(item.second, item.first));
}

// expected O(1)
T valueForKey(K key) {
    return kvMap[key];
}

// expected O(1)
K keyForValue(T value) {
    return vkMap[value];
}

A clean C++11 implementation should "wrap" around the key-value hash map, so you have the "standard" interface in your wrapper class. Always keep the reverse map in sync with your forward map.

Regarding the creation performance: In most implementations, there is a way to tell the data structure how much elements are going to be inserted, called "reserve". For hash tables, this is a huge performance benefit, as dynamically resizing the data structure (which happens during insertions every now and then) completely re-structures the whole hash table, as it changes the hash function itself.

like image 97
leemes Avatar answered Sep 26 '22 00:09

leemes