I have read about hashtable and open adrdessing. If you want to insert the keys: 18,32,44 in an hashtable with size 13:
18 gets index 5 (18 modulus 13 = 5)
32 gets index 6 (32 modulus 13 = 6)
44 gets index 5 (44 modulus 13 = 5)
You'll get a collision because there are already something on index 5.
If you use linear probing you'll do hashfunction = (key+i) modulus N
where i = 0,1,2..
until you find an empty place in the hashtable. Then 44 will get be inserted at index 7.
What if you delete 32, and then you want to delete 44. You start by looking at hashfunction(44)=5
- that was not 44, then hashfunction(44 + 1) = 6
- that is empty. Then you might think that 44 is gone. How do you mark a place in the hashtable, that the place is not really empty, but does not contain a key, and that you should keep looking for 44 at the next index?
If you then need to insert another key at index 6 then the key just overwrites the "mark" in the hashtable.
What could you use to mark an index - saying here has been an key, but has been deleted - so you continue to look at next index? You can't just write null or 0 because then either you think the key has been deleted (null) or that an key with value 0 has overwritten 44.
One way to handle hash tables using open addressing is to use state marks: EMPTY
, OCCUPIED
and DELETED
. Note that there's an important distinction between EMPTY
, which means the position has never been used and DELETED
, which means it was used but got deleted.
When a value gets removed, the slot is marked as DELETED
, not EMPTY
. When you try to retrieve a value, you'll probe until you find a slot that's mark EMPTY
; eg: you consider DELETED
slots to be the same as OCCUPIED
. Note that insertion can ignore this distinction - you can insert into a DELETED
or EMPTY
slot.
The question is tagged Java, which is a bit misleading because Java (or at least Oracle's implementation of it) does not use open addressing. Open addressing gets specially problematic when the load factor gets high, which causes hash collisions to occur much more often:
As you can see, there's a dramatic performance drop near the 0.7 mark. Most hashtables get resized once their load factor gets past a certain constant factor. Java for example doubles the size of its HashMap
when the load factor gets past 0.75.
It seems like you are trying to implement your own hash table (in contrast to using the Hashtable or HashMap included in java), so it's more a data structure question than a java question.
That being said, implementing a hash table with open addressing (such as linear probing) is not very efficient when it comes to removing elements. The normal solution is to "pull up" all elements that are in the wrong slot so there won't be any spaces in the probing.
There is some pseudo code describing this quite well at wikipedia:
http://en.wikipedia.org/wiki/Open_addressing
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With