Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do hashtables use a linked list over an array for the bucket?

When a value is hashed to the same value, it gets added to a linked list referenced by the hash value. Why do implementations of hashtables use a linkedlist over an array as the bucket?

Is it because an array has a predetermined size when it's initialized so you would need to resize when too many elements are added to the bucket?

like image 657
user1099123 Avatar asked Nov 28 '12 00:11

user1099123


2 Answers

Yes: generally, it's because an array has a predetermined size. There's no requirement that you use a linked list or an array for the buckets; some crafty implementations use another hash table, which then uses a linked list or array for its buckets!

If you use an array, the hash table has a predetermined size for each of the array elements. Every possible bucket is allocated, and your hash table can be pretty big. That might be OK if you have lots of memory, or if you expect a very full hash table. You can mitigate this by holding a pointer to the array and allocating it as-needed.

An array can be indexed, so you might keep the array sorted. Then, if it becomes large, you can do a binary search to find the key you wanted.

If you use a linked list, you have to walk the linked list to find match you want linearly. That's not too efficient, but it minimizes memory usage.

As with all data structure problems, you have to consider what access patterns you'll have and how you'll use and fill the structure; what are the tradeoffs that you want to win, and which are the ones you don't care so much about?

like image 99
MikeB Avatar answered Oct 19 '22 22:10

MikeB


They don’t.

It is an over generalization to claim that “implementations of hashtables” use a linked list. Java does. Many other languages don’t. For example, Python uses open hashing, see the answers on to this questions, How are Python's Built In Dictionaries Implemented

Generally, designers of general purpose APIs are facing a very tough choice as they do not know the use case of their users. There are different implementation choices with different trade-offs, for example if you only ever add elements but never remove, different choices apply than for a hashmap which is mutated often. Et cetera.

like image 37
akuhn Avatar answered Oct 19 '22 22:10

akuhn