
Hash table optimized for full iteration + key replacement

I have a hash table where the vast majority of accesses at run-time follow one of the following patterns:

  • Iterate through all key/value pairs. (The speed of this operation is critical.)
  • Modify keys in bulk, i.e. remove a key/value pair and add another with the same value but a different key, detecting duplicate keys and combining values if necessary. This is done in a loop affecting many thousands of keys, with no other operations intervening. (A sketch of this pass follows the list.)
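
For concreteness, the bulk rekey pass looks something like the sketch below. This is a hypothetical illustration on top of std::unordered_map; the Key/Value typedefs, the combine policy, and rekey_all are stand-ins, not my actual code:

    #include <cstdint>
    #include <unordered_map>
    #include <utility>

    // Stand-ins: the real key is 13 bytes padded to 16; uint64_t keeps
    // the sketch self-contained.
    using Key = std::uint64_t;
    using Value = std::uint64_t;

    // Placeholder policy for merging the values of duplicate keys.
    Value combine(Value a, Value b) { return a + b; }

    // Bulk pass: every pair is removed and re-inserted under a new key,
    // combining values when two old keys map to the same new key.
    template <class Map, class Rekey>
    void rekey_all(Map& table, Rekey new_key) {
        Map next;
        next.reserve(table.size());
        for (auto& [k, v] : table) {
            auto [it, inserted] = next.emplace(new_key(k), v);
            if (!inserted) it->second = combine(it->second, v);
        }
        table = std::move(next);
    }

Rebuilding into a second table like this sidesteps deletion markers entirely, but it transiently doubles memory use, which conflicts with the memory goal below; an in-place scheme is what I am after.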

I would also like it to consume as little memory as possible.

Other standard operations must be available, though they are used less frequently, e.g.

  • Insert a new key/value pair
  • Given a key, look up the corresponding value
  • Change the value associated with an existing key

Of course all "standard" hash table implementations, including the standard libraries of most high-level languages, have all of these capabilities. What I am looking for is an implementation that is optimized for the operations in the first list.

Issues with common implementations:

  • Most hash table implementations use separate chaining (i.e. a linked list for each bucket). This works, but I am hoping for something that occupies less memory and has better locality of reference. Note: my keys are small (13 bytes each, padded to 16 bytes).
  • Most open addressing schemes have a major disadvantage for my application: keys are removed and replaced in large groups, which leaves deletion markers that inflate the effective load factor and force the table to be rebuilt frequently. (See the sketch after this list.)
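
To make the deletion-marker problem concrete, here is a minimal linear-probing erase (illustrative types, not my actual code):

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <vector>

    enum class SlotState : std::uint8_t { Empty, Filled, Tombstone };

    struct Slot {
        std::uint64_t key = 0;
        std::uint64_t value = 0;
        SlotState state = SlotState::Empty;
    };

    // Erase under plain linear probing: the slot cannot simply be set
    // back to Empty (that would cut the probe chains of keys stored
    // beyond it), so it becomes a Tombstone.
    bool erase(std::vector<Slot>& slots, std::uint64_t key) {
        std::size_t home = std::hash<std::uint64_t>{}(key) % slots.size();
        std::size_t i = home;
        do {
            if (slots[i].state == SlotState::Empty) return false; // chain ends
            if (slots[i].state == SlotState::Filled && slots[i].key == key) {
                slots[i].state = SlotState::Tombstone;
                return true;
            }
            i = (i + 1) % slots.size();
        } while (i != home);
        return false;
    }

Each rekey step is an erase plus an insert, so a bulk pass over N keys can leave up to N tombstones behind, and the effective load factor (filled slots plus tombstones) keeps climbing until the table is rebuilt.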

Schemes that work, but are less than ideal:

  • Separate chaining with an array (instead of a linked list) per bucket:
    Poor locality of reference, resulting from memory fragmentation as the small arrays are reallocated many times.
  • Linear probing/quadratic probing/double hashing (with or without Brent's variation):
    The table quickly fills up with deletion markers.
  • Cuckoo hashing:
    Only works below a ~50% load factor, and I want a high load factor to save memory and speed up iteration.

Is there a specialized hashing scheme that would work well for this case?


Note: I have a good hash function that works well with both power-of-2 and prime table sizes, and can be used for double hashing, so this shouldn't be an issue.

asked May 12 '11 by finnw

People also ask

Why are hash tables efficient?

Hash tables are useful because they are fast. The theoretical average running time for find, insert, and erase is the optimal O(1), meaning that no matter how big the hash table gets, the average number of steps needed to perform those operations is bounded by a constant.

What is an optimal hash table size?

But a good general rule of thumb is: the hash table should be an array with a length about 1.3 times the maximum number of keys that will actually be in the table, and the size of the hash table array should be a prime number.
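
That rule of thumb translates directly into a few lines (a sketch; simple trial division is fine at these sizes):

    #include <cstddef>

    // Smallest prime >= ~1.3 * max_keys, per the rule of thumb above.
    bool is_prime(std::size_t n) {
        if (n < 2) return false;
        for (std::size_t d = 2; d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    std::size_t table_size_for(std::size_t max_keys) {
        std::size_t n = max_keys + (max_keys * 3) / 10; // ~1.3x
        while (!is_prime(n)) ++n;
        return n;
    }

For example, table_size_for(1000) rounds 1300 up to the prime 1301.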

How can I make a hash table faster?

The trick is to use Robin Hood hashing with an upper limit on the number of probes. If an element has to be more than X positions away from its ideal position, you grow the table and hope that with a bigger table every element can be close to where it wants to be.
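
A minimal sketch of that insert path (illustrative key/value types; deletion and the actual grow/rehash step are omitted):

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <utility>
    #include <vector>

    struct Slot {
        std::uint64_t key = 0;
        std::uint64_t value = 0;
        std::int32_t dist = -1; // probe distance from home; -1 = empty
    };

    class RobinHoodTable {
    public:
        explicit RobinHoodTable(std::size_t capacity) : slots_(capacity) {}

        // Returns false when an element would exceed MAX_PROBES;
        // the caller should grow the table and re-insert everything.
        bool insert(std::uint64_t key, std::uint64_t value) {
            std::size_t i = std::hash<std::uint64_t>{}(key) % slots_.size();
            std::int32_t dist = 0;
            while (true) {
                if (dist > MAX_PROBES) return false; // trigger a grow
                Slot& s = slots_[i];
                if (s.dist < 0) {               // empty slot: claim it
                    s = Slot{key, value, dist};
                    return true;
                }
                if (s.key == key) {             // duplicate key: overwrite
                    s.value = value;
                    return true;
                }
                if (s.dist < dist) {            // "rich" slot: steal it and
                    std::swap(key, s.key);      // keep probing with the
                    std::swap(value, s.value);  // displaced element
                    std::swap(dist, s.dist);
                }
                i = (i + 1) % slots_.size();
                ++dist;
            }
        }

    private:
        static constexpr std::int32_t MAX_PROBES = 8; // the "X" above
        std::vector<Slot> slots_;
    };

The swap is the "Robin Hood" step: a probing element that is far from home steals the slot of an element that is close to home, which keeps the variance of probe lengths low and makes the MAX_PROBES cap practical.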

How efficient is a hash table?

As for the most memory-efficient data structure for associations: the hash table with the best memory efficiency is simply the one with the highest load factor (it can even exceed 100% memory efficiency by using key compression with compact hashing). A hash table like that still provides O(1) lookups, just very slow ones.


1 Answer

Would Extendible Hashing help? Iterating through the keys by walking the 'directory' should be fast. Not sure if the "modify key for value" operation is any better with this scheme or not.
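
For illustration, walking the directory might look like the sketch below. This assumes the directory is indexed by the top global_depth bits of the hash, so all entries pointing at one bucket are contiguous; the types are hypothetical:

    #include <cstddef>
    #include <cstdint>
    #include <utility>
    #include <vector>

    struct Bucket {
        std::uint32_t local_depth = 0;
        std::vector<std::pair<std::uint64_t, std::uint64_t>> entries;
    };

    // A bucket with local depth d is pointed to by 2^(global_depth - d)
    // consecutive directory slots; the stride skips the repeats so each
    // bucket is visited exactly once.
    void for_each_pair(const std::vector<Bucket*>& directory,
                       std::uint32_t global_depth,
                       void (*visit)(std::uint64_t key, std::uint64_t value)) {
        std::size_t i = 0;
        while (i < directory.size()) {
            const Bucket* b = directory[i];
            for (const auto& [k, v] : b->entries) visit(k, v);
            i += std::size_t{1} << (global_depth - b->local_depth);
        }
    }

Since the buckets themselves are small arrays, iteration within each bucket is sequential, which should suit the locality requirement.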

answered Oct 23 '22 by AShelly