Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashtable Implementation

I was recently asked 'how would you implement a hastable'. I know the hashing algorithm is critical as the less collisions the better WRT performance, but what algorithm/data structure should be employed to deliver amortized constant time {O(1)} for insert/delete/lookups?

like image 704
Myles McDonnell Avatar asked Feb 23 '12 08:02

Myles McDonnell


People also ask

How is Java Hashtable implemented?

The Hashtable class implements a hash table, which maps keys to values. Any non-null object can be used as a key or as a value. To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method. It is similar to HashMap, but is synchronized.

How is a hash map implemented?

Hashmap uses the array of Nodes(named as table), where Node has fields like the key, value (and much more). Here the Node is represented by class HashMapEntry. Basically, HashMap has an array where the key-value data is stored. It calculates the index in the array where the Node can be placed and it is placed there.

What is hashing and how it can be implemented?

Hashing is the process of transforming any given key or a string of characters into another value. This is usually represented by a shorter, fixed-length value or key that represents and makes it easier to find or employ the original string. The most popular use for hashing is the implementation of hash tables.


1 Answers

Hash tables have two main possibilities:

  1. Open Addressing, which is a simple array [dynamic array actualy if you can let your table grow on the fly]. Once a conflict has met - you need to use a second hash function to find the next entree that the element will be mapped to. Note this solution has some troubles [which can be solved] when your hash table also allows deletions. [Special mark for "deleted" entrees]
  2. Chaining - in this solution, each entree in the array is a linked list - containig all elements hashed to this entree. In here - all elements mapped to a certain value are in the list.

The important part about hash tables [in both solutions] in order to allow armotorized O(1) insert/del/look up - is allocating a bigger table and rehashing once a pre defined load factor was reached.

EDIT: complexity analsis:
Assume a load factor of p for some p < 1.

  1. The probability of "collision" in each access is p Thus the mean of array accesses is: Sigma(i * p^(i-1) * (1-p)) for each i in [1,n] This gives you: Sigma(i * p^(i-1) * (1-p)) = (1-p) * Sigma(i * p^(i-1)) <= (1-p) * 1/(p-1)^2 = 1-p = CONST. [have a look at the correctness of Sigma(i * p^(i-1)) < 1/(p-1)^2 in wolfram alpha]. Thus resulting on average a constant number of accesses to the array. Also: You might need to rehash all elements after the load factor was reached, resulting in O(n) accesses to the array. This results in n * O(1) ops [adding n elements] and 1 * O(n) op [rehashing], so you get: (n * O(1) + 1 * O(n)) / n = O(1) armotorized time.
  2. Very similar to (1), but the analysis is done on list accesses. Each operation requires exactly one array accesses, and a variant number of list accesses - with the same analysis as before.
like image 156
amit Avatar answered Nov 03 '22 00:11

amit