Handling hash collisions when using linear probing

Question

I have read about hashtable and open adrdessing. If you want to insert the keys: 18,32,44 in an hashtable with size 13:

18 gets index 5 (18 modulus 13 = 5)
32 gets index 6 (32 modulus 13 = 6)
44 gets index 5 (44 modulus 13 = 5)

You'll get a collision because there are already something on index 5.

If you use linear probing you'll do hashfunction = (key+i) modulus N where i = 0,1,2.. until you find an empty place in the hashtable. Then 44 will get be inserted at index 7.

What if you delete 32, and then you want to delete 44. You start by looking at hashfunction(44)=5 - that was not 44, then hashfunction(44 + 1) = 6 - that is empty. Then you might think that 44 is gone. How do you mark a place in the hashtable, that the place is not really empty, but does not contain a key, and that you should keep looking for 44 at the next index?

If you then need to insert another key at index 6 then the key just overwrites the "mark" in the hashtable.

What could you use to mark an index - saying here has been an key, but has been deleted - so you continue to look at next index? You can't just write null or 0 because then either you think the key has been deleted (null) or that an key with value 0 has overwritten 44.

NullUserException · Accepted Answer

One way to handle hash tables using open addressing is to use state marks: EMPTY, OCCUPIED and DELETED. Note that there's an important distinction between EMPTY, which means the position has never been used and DELETED, which means it was used but got deleted.

When a value gets removed, the slot is marked as DELETED, not EMPTY. When you try to retrieve a value, you'll probe until you find a slot that's mark EMPTY; eg: you consider DELETED slots to be the same as OCCUPIED. Note that insertion can ignore this distinction - you can insert into a DELETED or EMPTY slot.

The question is tagged Java, which is a bit misleading because Java (or at least Oracle's implementation of it) does not use open addressing. Open addressing gets specially problematic when the load factor gets high, which causes hash collisions to occur much more often:

enter image description here

As you can see, there's a dramatic performance drop near the 0.7 mark. Most hashtables get resized once their load factor gets past a certain constant factor. Java for example doubles the size of its HashMap when the load factor gets past 0.75.

Albin · Answer

It seems like you are trying to implement your own hash table (in contrast to using the Hashtable or HashMap included in java), so it's more a data structure question than a java question.

That being said, implementing a hash table with open addressing (such as linear probing) is not very efficient when it comes to removing elements. The normal solution is to "pull up" all elements that are in the wrong slot so there won't be any spaces in the probing.

There is some pseudo code describing this quite well at wikipedia:

http://en.wikipedia.org/wiki/Open_addressing

Handling hash collisions when using linear probing

Tags:

java

hashtable

Anne

2 Answers

NullUserException

Albin

Recent Activity

Donate For Us

Handling hash collisions when using linear probing

Tags:

java

hashtable

Anne

2 Answers

NullUserException

Albin

Related questions

Recent Activity

Donate For Us