 

O(1) hash look ups?

Tags: c#, .net, hash

I ran across an assertion that HashSet<T>.Contains() is an O(1) operation. This surprised me since every discussion of hashing I've encountered mentions the possibility of collisions, potentially leading to O(n) run time.

Being curious, I looked into the documentation for HashSet<T>.Contains and also Hashtable.Contains. The documentation for both methods makes the same claim.

When I look in Reflector, HashSet<T>.Contains() is implemented with a for loop, going through a list of slots containing values which have the same hash.
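For illustration, the loop amounts to something like the sketch below. This is a hand-written approximation of chained buckets, not the actual framework source; the buckets array and comparer parameter are stand-ins I've made up.

```csharp
using System.Collections.Generic;

static class BucketLookupSketch
{
    // Hypothetical chained-bucket lookup: hash the item, pick a bucket,
    // then linearly scan only the entries that landed in that bucket.
    public static bool Contains<T>(List<T>[] buckets, T item, IEqualityComparer<T> comparer)
    {
        int index = (comparer.GetHashCode(item) & 0x7FFFFFFF) % buckets.Length;
        List<T> chain = buckets[index];
        if (chain == null) return false;

        foreach (T candidate in chain)             // the "for loop over slots"
            if (comparer.Equals(candidate, item))  // full equality check, not just a hash match
                return true;

        return false;
    }
}
```

With a good hash function most chains hold zero or one entry, so the loop rarely iterates more than once; with a bad one, every entry can land in the same chain and the loop degenerates into a linear scan.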

Now admittedly, those same discussions of hashing have also mentioned that a good hashing algorithm avoids collisions, and under those circumstances the lookup will indeed be O(1). But my understanding of Big O notation is that it describes the worst-case run time, not the best case.

So is the O(1) claim incorrect? Or am I missing something?

asked Jul 21 '10 by ThatBlairGuy

4 Answers

But my understanding of Big O notation is that it describes the worst-case run time, not the best case.

Unfortunately, there is no "standard" for Big-O when describing algorithms. Often, it's used to describe the general or average case - not the worst case.

From Wikipedia:

...this notation is now frequently also used in the analysis of algorithms to describe an algorithm's usage of computational resources: the worst case or average case...

In this case, it's describing the typical case, given proper hashing. With proper hashing in place, the limiting behavior is constant as the size N grows, hence O(1).
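A quick, unscientific way to see this (a rough sketch, not a benchmark of the framework internals): time a fixed number of Contains calls against sets of very different sizes and note that the cost stays roughly flat rather than growing with N.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class LookupScaling
{
    static void Main()
    {
        // The set sizes span three orders of magnitude, but the time for
        // one million lookups should stay in the same ballpark for each.
        foreach (int n in new[] { 1000, 100000, 1000000 })
        {
            var set = new HashSet<int>();
            for (int i = 0; i < n; i++) set.Add(i);

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 1000000; i++)
                set.Contains(i % n);   // always a hit
            sw.Stop();

            Console.WriteLine("n = {0,8}: {1} ms for 1,000,000 lookups", n, sw.ElapsedMilliseconds);
        }
    }
}
```

The absolute numbers will wobble a bit with cache effects, but nothing like the linear growth you would see scanning a List<int> of the same sizes.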

answered by Reed Copsey


In general, it's O(1).

answered by SLaks


For a properly implemented hash table, look ups have amortized constant time complexity.

In practice, a single look up can be O(n) in case of collisions, as you say. However, if you perform a large number of look ups, the average time complexity per operation is constant.

Quoting Wikipedia:

Amortized analysis differs from average-case performance in that probability is not involved; amortized analysis guarantees the time per operation over worst-case performance.

The method requires knowledge of which series of operations are possible. This is most commonly the case with data structures, which have state that persists between operations. The basic idea is that a worst case operation can alter the state in such a way that the worst case cannot occur again for a long time, thus "amortizing" its cost.
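To make both cases concrete, here is a small, contrived demonstration (my own sketch, not anything from the answer above): a key type whose GetHashCode sends every value to the same bucket forces the O(n) collision behaviour, whereas the default int hash keeps lookups effectively constant.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Deliberately terrible key: every instance hashes to the same bucket,
// so each lookup has to walk the whole collision chain.
struct BadKey : IEquatable<BadKey>
{
    public readonly int Value;
    public BadKey(int value) { Value = value; }
    public bool Equals(BadKey other) { return Value == other.Value; }
    public override bool Equals(object obj) { return obj is BadKey && Equals((BadKey)obj); }
    public override int GetHashCode() { return 42; }   // all keys collide
}

class CollisionDemo
{
    static void Main()
    {
        var bad = new HashSet<BadKey>();
        for (int i = 0; i < 10000; i++) bad.Add(new BadKey(i));

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
            bad.Contains(new BadKey(-1));   // a miss must scan all 10,000 entries
        sw.Stop();
        Console.WriteLine("Single-bucket set: {0} ms", sw.ElapsedMilliseconds);

        var good = new HashSet<int>();
        for (int i = 0; i < 10000; i++) good.Add(i);

        sw.Restart();
        for (int i = 0; i < 1000; i++)
            good.Contains(-1);              // a miss checks an empty or tiny bucket
        sw.Stop();
        Console.WriteLine("Well-hashed set:   {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Populating the bad set is itself quadratic for the same reason, which is why the example keeps it to 10,000 entries.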

answered by Staffan


No, Big O does not define "worst case", it defines a limit. Hash-based lookups (with good hashing algorithms that provide efficient value distribution and a low collision rate) progress toward a constant value as the number of items increases (they will never actually reach that constant value, but that's the point of it being a limit).

answered by Adam Robinson