I have an efficiency critical application, where I need such an array-type data structure A
. Its keys are 0, 1, 2,...
, and its values are uint64_t
distinct values. I need two constant operations:
1. Given i, return A[i];
2. Given val, return i such that A[i] == val
I prefer not to use hash table. Because I tried GLib GHashTable, it took around 20 mins to load 60 million values into the hash table (If I remove the insertion statement, it took only around 6 seconds). The time is not acceptable for my application. Or maybe somebody recommend other hash table libraries? I tried uthash.c, it crashed immediately.
I also tried SDArray, but it seems not the right one.
Does anybody know any data structure that would fulfill my requirements? Or any efficient hash table implementations? I prefer using C/C++.
Thanks.
In general, you need two hash tables for this task. As you know, hash tables give you a key look-up in expected constant time. Searching for a value requires iterating through the whole data structure, since information about the values isn't encoded in the hash look-up table.
Use two hash tables: One for key-value and one (reversed) for value-key look-up. In your particular case, the forward search can be done using a vector as long as your keys are "sequential". But this doesn't change the requirement for a data structure enabling fast reverse look-up.
Regarding the hash table implementation: In C++11, you have the new standard container std::unordererd_map
available.
An implementation might look like this (of course this is tweakable, like introducing const-correctness, calling by reference etc.):
std::unordered_map<K,T> kvMap; // hash table for forward search
std::unordered_map<T,K> vkMap; // hash table for backward search
void insert(std::pair<K,T> item) {
kvMap.insert(item);
vkMap.insert(std::make_pair(item.second, item.first));
}
// expected O(1)
T valueForKey(K key) {
return kvMap[key];
}
// expected O(1)
K keyForValue(T value) {
return vkMap[value];
}
A clean C++11 implementation should "wrap" around the key-value hash map, so you have the "standard" interface in your wrapper class. Always keep the reverse map in sync with your forward map.
Regarding the creation performance: In most implementations, there is a way to tell the data structure how much elements are going to be inserted, called "reserve". For hash tables, this is a huge performance benefit, as dynamically resizing the data structure (which happens during insertions every now and then) completely re-structures the whole hash table, as it changes the hash function itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With