Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Run time to insert n elements into an empty hash table

People say it takes amortized O(1) to put into a hash table. Therefore, putting n elements must be O(n). That's not true for large n, however, since as an answerer said, "All you need to satisfy expected amortized O(1) is to expand the table and rehash everything with a new random hash function any time there is a collision."

So: what is the average running-time of inserting n elements into a hash table? I realize this is probably implementation-dependent, so mention what type of implementation you're talking about.

For example, if there are (log n) equally spaced collisions, and each collision takes O(k) to resolve, where k is the current size of the hashtable, then you'd have this recurrence relation:

T(n) = T(n/2) + n/2 + n/2

(that is, you take the time to insert n/2 elements, then you have a collision, taking n/2 to resolve, then you do the remaining n/2 inserts without a collision). This still ends up being O(n), so yay. But is this reasonable?

like image 729
Claudiu Avatar asked Dec 22 '22 11:12

Claudiu


2 Answers

It completely depends on how inefficient your rehashing is. Specifically, if you can properly estimate the expected size of your hashtable the second time, your runtime still approaches O(n). Effectively, you have to specify how inefficient your rehash size calculation is before you can determine the expected order.

like image 154
Paul Sonier Avatar answered Jan 28 '23 10:01

Paul Sonier


People say it takes amortized O(1) to put into a hash table.

From a theoretical standpoint, it is expected amortized O(1).

Hash tables are fundamentally a randomized data structure, in the same sense that quicksort is a randomized algorithm. You need to generate your hash functions with some randomness, or else there exist pathological inputs which are not O(1).

You can achieve expected amortized O(1) using dynamic perfect hashing:

The naive idea I originally posted was to rehash with a new random hash function on every collision. (See also perfect hash functions) The problem with this is that this requires O(n^2) space, from birthday paradox.

The solution is to have two hash tables, with the second table for collisions; resolve collisions on that second table by rebuilding it. That table will have O(\sqrt{n}) elements, so would grow to O(n) size.

In practice you often just use a fixed hash function because you can assume (or don't care if) your input is pathological, much like you often quicksort without prerandomizing the input.

like image 31
Captain Segfault Avatar answered Jan 28 '23 10:01

Captain Segfault