People say it takes amortized O(1) to put into a hash table. Therefore, putting n elements must be O(n). That's not true for large n, however, since as an answerer said, "All you need to satisfy expected amortized O(1) is to expand the table and rehash everything with a new random hash function any time there is a collision."
So: what is the average running-time of inserting n elements into a hash table? I realize this is probably implementation-dependent, so mention what type of implementation you're talking about.
For example, if there are (log n) equally spaced collisions, and each collision takes O(k) to resolve, where k is the current size of the hashtable, then you'd have this recurrence relation:
T(n) = T(n/2) + n/2 + n/2
(that is, you take the time to insert n/2 elements, then you have a collision, taking n/2 to resolve, then you do the remaining n/2 inserts without a collision). This still ends up being O(n), so yay. But is this reasonable?
It completely depends on how inefficient your rehashing is. Specifically, if you can properly estimate the expected size of your hashtable the second time, your runtime still approaches O(n). Effectively, you have to specify how inefficient your rehash size calculation is before you can determine the expected order.
People say it takes amortized O(1) to put into a hash table.
From a theoretical standpoint, it is expected amortized O(1).
Hash tables are fundamentally a randomized data structure, in the same sense that quicksort is a randomized algorithm. You need to generate your hash functions with some randomness, or else there exist pathological inputs which are not O(1).
You can achieve expected amortized O(1) using dynamic perfect hashing:
The naive idea I originally posted was to rehash with a new random hash function on every collision. (See also perfect hash functions) The problem with this is that this requires O(n^2) space, from birthday paradox.
The solution is to have two hash tables, with the second table for collisions; resolve collisions on that second table by rebuilding it. That table will have O(\sqrt{n}) elements, so would grow to O(n) size.
In practice you often just use a fixed hash function because you can assume (or don't care if) your input is pathological, much like you often quicksort without prerandomizing the input.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With