Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How large should a hashtable be initialized related to the entries count?

Is there an optimal size for a hashtable related to the entry count?

So for entries = n is there an optimal (or recommended) size s for the hashtable which depends on n? Lets say 2n (double the entries count) or some other value?

Is it depending on the internal structure (hash function, bucket size, etc.)? Please provide some evidence when claiming something.

like image 762
MicSim Avatar asked May 20 '11 12:05

MicSim


People also ask

What should the size of my hash table be?

But a good general “rule of thumb” is: The hash table should be an array with length about 1.3 times the maximum number of keys that will actually be in the table, and. Size of hash table array should be a prime number.

What is the initial capacity of Hashtable?

Constructs a new, empty hashtable with a default initial capacity (11) and load factor (0.75).

Why should the size of a hash table be prime?

They famously are only divisible by 1 and themselves. Thus, choosing to set your hash table length to a large prime number will greatly reduce the occurrence of collisions.

What does bucket entries mean in the context of a Hashtable?

A bucket is simply a fast-access location (like an array index) that is the the result of the hash function. The idea with hashing is to turn a complex input value into a different value which can be used to rapidly extract or store data.


2 Answers

The ratio between the size of the table and the number of entries is called the load factor of a hash table.

The load factor crucially determines the expected runtime behaviour. For the usual bounds (i.e. expected time O(1) on all operations) to apply, it has to be smaller than 1.

In practice, the remark by Pete Wilson applies: one tries to keep the load factor close to 1 in order not to waste space; a prime number size for the table is often used to improve the collision characteristics of the hash function – but other strategies exist.

like image 55
Konrad Rudolph Avatar answered Sep 21 '22 17:09

Konrad Rudolph


In java, with class HashTable, the default load factor (.75) offers a good tradeoff between time and space costs.

A higher load factor value decreases the space requirements and increases the odds of a collision. A collision increases the amount of time needed to perform a get() and put(...).

A lower load factor value increases disk/memory space requirements, causing lots of reserved space that is permanently unused. The increased number of bins decreases the odds of a collision.

So a load factor of (.75) means the HashTable bins are 75% full. If you have 75 elements to store, the number of bins in your HashTable should be 100.

Therefore, answering your question, given N as the number of items to store in your HashTable, the size of your HashTable should be about (1.33 * n). Other circumstances may make different load factors faster in some situations.

http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html

like image 22
Eric Leschinski Avatar answered Sep 21 '22 17:09

Eric Leschinski