I'm trying to understand open addressing in hash tables but there is one question which isn't answered in my literature. It concerns the deletion of elements in such a hash table if quadratic probing is used. Then the removed element is replaced by a sentinel element. The get() operation then knows that it has to go further and the add() method would overwrite the first sentinel it finds. But what happens if I want to add an element with a key that is already in the hash table but behind a sentinel in a probing path? Instead of overwriting the value of the instance with the same key which is already in the table, the add() method would overwrite the sentinel. And then we have multiple elements with the same key in the hash table. I see that as a problem since it costs memory and also since removing the element from the hash table would merely remove the first of them, so that the element could still be found in the table (i.e. it is not removed).
So it seems that it is necessary to search the whole probing path for the key of the element one wants to insert before replacing a sentinel element. Am I overlooking something? How is this problem handled in practice?
But what happens if I want to add an element with a key that is already in the hash table but behind a sentinel in a probing path? Instead of overwriting the value of the instance with the same key which is already in the table, the add() method would overwrite the sentinel.
add()
has to check every element after the sentinel(s) in the probing path till it finds an empty element, as you pointed out later. If it could not find the new element in the probing path and there are sentinel elements on it, it can use the first sentinel slot to store the new element.
There is a hash table implementation on http://www.algolist.net/Data_structures/Hash_table/Open_addressing (HashMap.java). Its put()
method does exactly this. (The collision resolution is linear probing in the referenced snippet but I don't think it's an important difference from the point of view of the algorithm.)
After a lot of remove operations there could be too many sentinel elements in the table. A solution for this would be to rebuild the hash table (i.e. rehash everything) occasionally (based on the number of items and the number of sentinel elements). This operation would eliminate the sentinel elements.
Another approach is eliminating the sentinel (DELETED) element from the probing path when you remove an element. Practically, you don't have sentinel elements in the table in this case; there are only FREE and OCCUPIED slots. It could be expensive.
So it seems that it is necessary to search the whole probing path for the key of the element one wants to insert before replacing a sentinel element.
Yes, it is. You have to search until you find an empty element.
How is this problem handled in practice?
I don't know too much about real life hash table implementations. I suppose plenty of them are available on the internet in open source projects. I've just checked the Hashtable
and HashMap
classes in Java. Both use chaining instead of open addressing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With