Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why use an array to implement a "list" instead of a hash table?

Consider an array versus a hashtable where the keys are just the integral indexes of a list.

Their average-case insertion, lookup, and removal big-O bounds are all O(1) constant time. I understand that you may get some low-level wins in cache locality with an array, and there is a marginal (mostly-constant) overhead to the hashtable operations, but hashtables get you sparseness for free, which in some applications is a big win.

What other significant (or small) contrasts am I missing?

Context: I get into a discussion about this sometimes when interviewing programming candidates. Usually the context is "how would you implement the Javascript array type inside of the JS VM?" For densely-packed data, I side with a native array, but I want to have better reasoning than the intuition that "it seems less like overkill".

like image 797
Ben Zotto Avatar asked Feb 27 '23 14:02

Ben Zotto


2 Answers

An array is jsut a special case of a hash table where the hash function is very simple

f(x) := x;

and the modulo used is the same as the data word size (and thus the array size).

If you do not allow non unique values you do not need the "next" pointers and voila, we have an array.

Because of the absence of a complex hash function and modulo calculation, this is very fast, but only applicable when the array can be kept small (very large arrays with lot's of empty places waste memory resources and might trigger nasty things like swapping/trashing to disk).

like image 93
Ritsaert Hornstra Avatar answered Mar 10 '23 09:03

Ritsaert Hornstra


When you look at it from the angle of someone who wants to implement the pseudo array behaviour of Javascript, you are right that a hashtable is the better way to do this, esp. since Javascript arrays have no fixed length and need to be able to accomodate entries at any index. Arrays in Javascript just look like arrays but behave more like hashtables.

But in a language that is a bit closer to the machine the performance and space benefits of using a real array for data that can be stored efficiently in an array are quite noteworthy, especially since the benefits of using hashtables for this are pretty limited to sparse arrays which is not what you would or should use an array for. This is actually better done with hashtables that have integer keys.

Insertion, lookup, and removal is also O(1) for arrays in all cases but has a much lower constant O than hashtables (it's not just because of cache locality). And arrays need less space per entry. If you would like to remove and insert entries in a way that the following entries change their index accordingly, it would be O(n) with n beeing the number of entries that need to be moved, but it would also be O(n) for hashtables to do this and again with a much higher constant overhead. This is the kind of operation for which you better use a linked list. Also growing an array is less costly than growing a hashtable that might need to rehash all of it's entries.

All of the different collection types have their specific advantages and disadvantages. That's why there are so many of them in use.

like image 32
x4u Avatar answered Mar 10 '23 10:03

x4u