Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Determining the element that occurred the most in O(n) time and O(1) space

Tags:

java

c

algorithm

Let me start off by saying that this is not a homework question. I am trying to design a cache whose eviction policy depends on entries that occurred the most in the cache. In software terms, assume we have an array with different elements and we just want to find the element that occurred the most. For example: {1,2,2,5,7,3,2,3} should return 2. Since I am working with hardware, the naive O(n^2) solution would require a tremendous hardware overhead. The smarter solution of using a hash table works well for software because the hash table size can change but in hardware, I will have a fixed size hash table, probably not that big, so collisions will lead to wrong decisions. My question is, in software, can we solve the above problem in O(n) time complexity and O(1) space?

like image 228
Keeto Avatar asked Apr 24 '14 06:04

Keeto


1 Answers

There can't be an O(n) time, O(1) space solution, at least not for the generic case.

As amit points out, by solving this, we find the solution to the element distinctness problem (determining whether all the elements of a list are distinct), which has been proven to take Θ(n log n) time when not using elements to index the computer's memory. If we were to use elements to index the computer's memory, given an unbounded range of values, this requires at least Θ(n) space. Given the reduction of this problem to that one, the bounds for that problem enforces identical bounds on this problem.

However, practically speaking, the range would mostly be bounded, if for no other reason than the type one typically uses to store each element in has a fixed size (e.g. a 32-bit integer). If this is the case, this would allow for an O(n) time, O(1) space solution, albeit possibly too slow and using too much space due to the large constant factors involved (as the time and space complexity would depend on the range of values).

2 options:

  • Counting sort

    Keeping an array of the number of occurrences of each element (the array index being the element), outputting the most frequent.

    If you have a bounded range of values, this approach would be O(1) space (and O(n) time). But technically so would the hash table approach, so the constant factors here is presumably too large for this to be acceptable.

    Related options are radix sort (has an in-place variant, similar to quicksort) and bucket sort.

  • Quicksort

    Repeatedly partitioning the data based on a selected pivot (through swapping) and recursing on the partitions.

    After sorting we can just iterate through the array, keeping track of the maximum number of consecutive elements.

    This would take O(n log n) time and O(1) space.

like image 95
Bernhard Barker Avatar answered Oct 13 '22 21:10

Bernhard Barker