I started reading about implementing various data structures a few days back, and got around to hash tables and got stuck on a specific point.
My understanding of how a hash table is implemented: A key K is passed to a hash function H which returns a hashed version of K, HK. HK should probably be at least a uint32_t to account for collisions, we have an array of size X, the item is stored at the index HK of this array.. but, wouldn't this require a pre-allocated array of length uint32_t atleast (or whatever the return value of H is)? assuming that we don't store the data itself within that array, and instead store a ptr to the data, then we'd need an array of ptr_t of length uint32_t.. this seems quite wasteful, on 64bit that would mean memory usage of: 2^32 * 8 = 34359738368 bytes or ~32GB just for the array of ptrs to the data, which is obviously not how its actually implemented in real life..
So what am I missing?
It depends upon the implementation. There are three basic ways this is done:
1) Small hashes are used. So instead of using a 32-bit hash, say, an 8-bit hash is used.
2) Multiple levels of hashing are used. So, for example, a 12-bit hash may determine which "bucket" an entry goes in, but a collision only occurs if the full 32-bit hash matches. Each bucket is stored in a linked list or similar structure. (Perhaps one optimized for searching for the full 32-bit hash within it.)
3) Sparse arrays are used. These are data structures that don't need to actually store blank spaces for unfilled slots. (In practice, it could be something entirely different such as a tree, but it acts like a sparse array with efficient searching.)
You should construct your hash table so it can be extended. There are some methods to do that. Read this it will be helpful. In this example a linked list is used. And you also need to extend your table if there are no empty values anymore. You'll get following problem: if you extend your map, your H function can return new HK values for old K keys. So you must think how to solve this issue. One of the methods is to reload all values when table was extended. It's normal if you extend it not often.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With