Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between Space utilization and Load factor in hashtable

What is the difference between Load factor and Space utilization in a Hashtable? Please, someone explain!

like image 773
Student56 Avatar asked Nov 12 '22 02:11

Student56


1 Answers

Load factor

Definition:

The load factor of a Hashtable is the ratio of elements to buckets. Smaller load factors cause faster average lookup times at the cost of increased memory consumption. The default load factor of 1.0 generally provides the best balance between speed and size.

In other words, too small load factor will lead to faster access to the elements (while finding a given element, or iterating, ...) of the HashTable but also requires more memory usage.

In the contrary, high load factor will be slower (in average), with less memory usage.

A bucket holds a certain number of items.

Sometimes each location in the table is a bucket which will hold a fixed number of items, all of which hashed to this same location. This speeds up lookups because there is probably no need to go look at another location.

  • Linear probing as well as double hashing : The load factor is defined as n/prime, where n is the number of items in the table and prime is the size of the table. Thus a load factor of 1 means that the table is full.

Here is an example of benchmark (here realized in the conditions of a large prime number):

load        --- successful lookup ---        --- unsuccessful lookup ---
factor         linear       double              linear         double
------------------------------------------------------------------------
0.50           1.50         1.39                2.50           2.00
0.75           2.50         1.85                8.50           4.00
0.90           5.50         2.56               50.50          10.00 
0.95          10.50         3.15              200.50          20.00

Table source.

  • Some hash tables use other collision-resolution schemes : For example, in separate chaining, where items that hash to the same location are stored in a linked list, lookup time is measured by the number of list nodes that have to be examined. For a successful search, this number is 1+lf/2, where lf is the load factor. Because each table location holds a linked list, which can contain a number of items, the load factor can be greater than 1, whereas 1 is the maximum possible in an ordinary hash table.


Space utilization

The idea is that we store records of data in the hash table. Each record has a key field and an associated data field. The record is stored in a location that is based on its key. The function that produces this location for each given key is called a hash function.

Let's suppose that each key field contains an integer and the data field a string (array of characters type of string). One possible hash function is hash(key) = key % prime.

Definition:

The space utilization would be the ratio of the number of full used buckets (relatively to the load factor) to the total number of buckets reserved in the hash table.

For technical reasons, a prime number of buckets works better, which (modulus the number of filly used buckets) can consists in a waste of memory.



Conclusion : Rather than having to proceed through a linear search, or a binary search, a hash table will usually complete a lookup after just one comparison! Sometimes, however, two comparisons (or even more) are needed. A hash table thus delivers (almost) the ideal lookup time. The trade-off is that to get this great lookup time memory space is wasted.


As you can see, I am no expert, and I'm getting information while writing this, so any comment is welcome to make this more accurate or less... well... wrong...

         I switched it in Community Wiki mode (Feel free to improve)
like image 111
14 revs, 2 users 99% Avatar answered Dec 09 '22 10:12

14 revs, 2 users 99%