
Efficient Hashmap Use

Which is the more efficient approach to using hashmaps?

A) Use multiple smaller hashmaps, or

B) store all objects in one giant hashmap?

(Assume that the hashing algorithm for the keys is fairly efficient, resulting in few collisions)

CLARIFICATION: Option A implies segregation by primary key -- i.e. no additional lookup is necessary to determine which hashmap to use. (For example, if the lookup keys are alphanumeric, Hashmap 1 stores the A's, Hashmap 2 stores the B's, and so on.)
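To make the clarification concrete, here is a minimal sketch of what keyed segregation might look like. The class name `SegregatedMap` and the choice of first-character routing are illustrative assumptions, not anything from the question itself:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Option A's segregation: the key's first character
// selects the sub-map directly, so no extra lookup is needed to find it.
class SegregatedMap {
    private final Map<Character, Map<String, Integer>> buckets = new HashMap<>();

    void put(String key, int value) {
        // Route to the sub-map for this key's first letter, creating it lazily.
        buckets.computeIfAbsent(Character.toUpperCase(key.charAt(0)),
                                c -> new HashMap<>())
               .put(key, value);
    }

    Integer get(String key) {
        Map<String, Integer> m = buckets.get(Character.toUpperCase(key.charAt(0)));
        return m == null ? null : m.get(key);
    }
}
```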

asked Aug 01 '09 by Tony the Pony

2 Answers

Definitely B. The advantage of hash tables is that the average number of comparisons per lookup is independent of the size.

If you split your map into N smaller hashmaps, you will have to search half of them on average for each lookup. If the smaller hashmaps have the same load factor that the larger map would have had, you will increase the total number of comparisons by a factor of approximately N/2.

And if the smaller hashmaps have a smaller load factor, you are wasting memory.

All that is assuming you distribute the keys randomly between the smaller hashmaps. If you distribute them according to some function of the key (e.g. a string prefix) then what you have created is a trie, which is efficient for some applications (e.g. auto-complete in web forms.)
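To illustrate the cost this answer describes when keys are distributed randomly (not by a key function): with no way to know which of the N maps holds a key, a lookup has to probe them one by one. The helper below is only a sketch to make that probing explicit; the names are invented for illustration:

```java
import java.util.List;
import java.util.Map;

// Sketch: if keys are spread randomly across N maps, each lookup must
// probe the maps in turn -- about N/2 probes on average for a present key,
// and all N probes for an absent one.
class SplitLookup {
    static Integer find(List<Map<String, Integer>> maps, String key) {
        for (Map<String, Integer> m : maps) {  // up to N probes
            Integer v = m.get(key);
            if (v != null) {
                return v;
            }
        }
        return null;  // checked every map without finding the key
    }
}
```

A single `HashMap` would answer the same query with one probe, which is the factor-of-roughly-N/2 difference the answer describes.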

answered Sep 28 '22 by finnw

Are these maps used in logically distinct places? For instance, I wouldn't have one map containing users, cached query results, loggers etc, just because you happen to know the keys won't clash. However, I equally wouldn't split up a single map into multiple maps.

Keep one hashmap for each logical mapping from key to value.
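A small sketch of this advice (the field names and value types here are invented examples): keeping one typed map per logical mapping lets the compiler enforce the separation, which a single `Map<String, Object>` cannot:

```java
import java.util.HashMap;
import java.util.Map;

// One map per logical mapping, each with its own key/value types,
// instead of one Map<String, Object> mixing users, cache entries, etc.
class Registry {
    final Map<String, String> usersById = new HashMap<>();   // user id -> display name
    final Map<String, Integer> cacheHits = new HashMap<>();  // query -> hit count
}
```

With separate maps, a user lookup can never accidentally return a cached query count; with one combined map, that mistake would only surface as a `ClassCastException` at runtime.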

answered Sep 28 '22 by Jon Skeet