What is the more efficient approach for using hashmaps?
A) use multiple smaller hashmaps, or
B) store all objects in one giant hashmap?
(Assume that the hashing algorithm for the keys is fairly efficient, resulting in few collisions)
CLARIFICATION: Option A implies segregation by primary key -- i.e. no additional lookup is necessary to determine which actual hashmap to use. (For example, if the lookup keys are alphanumeric, Hashmap 1 stores the A's, Hashmap 2 stores the B's, and so on.)
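For concreteness, here is a rough Java sketch of the two layouts being compared (class and field names are purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class HashmapOptions {
    // Option A (hypothetical sketch): one small map per leading character of the
    // key, so the key itself selects the map and no extra lookup is needed.
    static final Map<Character, Map<String, Object>> SEGREGATED = new HashMap<>();

    static Object lookupA(String key) {
        Map<String, Object> bucket = SEGREGATED.get(Character.toUpperCase(key.charAt(0)));
        return bucket == null ? null : bucket.get(key);
    }

    // Option B: a single map holding every entry, keyed directly.
    static final Map<String, Object> GIANT = new HashMap<>();

    static Object lookupB(String key) {
        return GIANT.get(key);
    }
}
```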
Definitely B. The advantage of hash tables is that the average number of comparisons per lookup is independent of the number of entries, provided the load factor stays bounded.
If you split your map into N smaller hashmaps, you will have to search half of them on average for each lookup. If the smaller hashmaps have the same load factor that the larger map would have had, each probe into a small map costs about as much as a single lookup in the big map would, so you increase the total number of comparisons by a factor of approximately N/2.
And if the smaller hashmaps have a smaller load factor, you are wasting memory.
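A minimal sketch of what that lookup pattern looks like when nothing about the key tells you which small map to check (illustrative Java, names hypothetical):

```java
import java.util.List;
import java.util.Map;

public class RandomSplitLookup {
    // The keys were spread across N small maps with no rule linking a key to a
    // particular map, so a lookup has to probe the maps one by one. On average
    // roughly N/2 maps are touched before the key turns up, and each probe costs
    // about as much as one lookup in a single big map would.
    static Object lookup(List<Map<String, Object>> maps, String key) {
        for (Map<String, Object> m : maps) {
            Object value = m.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // probed every map; key is absent
    }
}
```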
All that is assuming you distribute the keys randomly between the smaller hashmaps. If you distribute them according to some function of the key (e.g. a string prefix) then what you have created is a trie, which is efficient for some applications (e.g. auto-complete in web forms.)
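To illustrate the prefix case, here is a hedged sketch of bucketing by the first character of the key; it behaves like a one-level trie, which is what makes prefix queries such as auto-complete cheap (all names here are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PrefixBuckets {
    // Bucketing entries by the first character of the key: every candidate for
    // an auto-complete query beginning with a given letter sits in one bucket
    // and can be scanned without touching the rest of the data.
    private final Map<Character, Map<String, String>> buckets = new HashMap<>();

    void put(String key, String value) {
        buckets.computeIfAbsent(key.charAt(0), c -> new HashMap<>()).put(key, value);
    }

    Set<String> candidatesFor(char firstLetter) {
        Map<String, String> bucket = buckets.get(firstLetter);
        return bucket == null ? Set.of() : bucket.keySet();
    }
}
```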
Are these maps used in logically distinct places? For instance, I wouldn't have one map containing users, cached query results, loggers, etc., just because you happen to know the keys won't clash. However, I equally wouldn't split up a single logical map into multiple maps.
Keep one hashmap for each logical mapping from key to value.
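A small sketch of that rule of thumb, assuming hypothetical User and QueryResult types:

```java
import java.util.HashMap;
import java.util.Map;

public class SessionState {
    // One map per logical mapping: the maps are separate because users and
    // cached query results are different kinds of things, not because a
    // single map would be too large.
    private final Map<String, User> usersById = new HashMap<>();
    private final Map<String, QueryResult> cachedResultsByQuery = new HashMap<>();

    // Folding both into one Map<String, Object> would force casts at every
    // call site and rely on key naming conventions to keep entries apart.
    record User(String name) {}
    record QueryResult(String payload) {}
}
```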