What is the difference between Load factor and Space utilization in a Hashtable? Please, someone explain!
Load factor
Definition:
The load factor of a Hashtable is the ratio of elements to buckets. Smaller load factors cause faster average lookup times at the cost of increased memory consumption. The default load factor of 1.0 generally provides the best balance between speed and size.
In other words, too small load factor
will lead to faster access to the elements (while finding a given element, or iterating, ...) of the HashTable but also requires more memory usage.
In the contrary, high load factor will be slower (in average), with less memory usage.
A bucket
holds a certain number of items.
Sometimes each location in the table is a bucket which will hold a fixed number of items, all of which hashed to this same location. This speeds up lookups because there is probably no need to go look at another location.
n/prime
, where n
is the number of items in the table and prime
is the size of the table. Thus a load factor of 1 means that the table is full.
Here is an example of benchmark (here realized in the conditions of a large prime
number):
load --- successful lookup --- --- unsuccessful lookup ---
factor linear double linear double
------------------------------------------------------------------------
0.50 1.50 1.39 2.50 2.00
0.75 2.50 1.85 8.50 4.00
0.90 5.50 2.56 50.50 10.00
0.95 10.50 3.15 200.50 20.00
Table source.
1+lf/2
, where lf
is the load factor. Because each table location holds a linked list, which can contain a number of items, the load factor can be greater than 1, whereas 1 is the maximum possible in an ordinary hash table.
Space utilization
The idea is that we store records of data in the hash table. Each record has a key
field and an associated data
field. The record is stored in a location that is based on its key. The function that produces this location for each given key is called a hash function
.
Let's suppose that each key field contains an integer and the data field a string (array of characters type of string). One possible hash function is hash(key) = key % prime
.
Definition:
The space utilization would be the ratio of the number of full used buckets (relatively to the load factor) to the total number of buckets reserved in the hash table.
For technical reasons, a prime number of buckets works better, which (modulus the number of filly used buckets) can consists in a waste of memory.
Conclusion : Rather than having to proceed through a linear search, or a binary search, a hash table will usually complete a lookup after just one comparison! Sometimes, however, two comparisons (or even more) are needed. A hash table thus delivers (almost) the ideal lookup time. The trade-off is that to get this great lookup time memory space is wasted.
As you can see, I am no expert, and I'm getting information while writing this, so any comment is welcome to make this more accurate or less... well... wrong...
I switched it in Community Wiki mode (Feel free to improve)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With