Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure that supports range based most frequently occuring element query

I'm looking for a data structure with which I can find the most frequently occuring number (among an array of numbers) in a given, variable range.

Let's consider the following 1 based array:

1 2 3 1 1 3 3 3 3 1 1 1 1

If I query the range (1,4), the data structure must retun 1, which occurs twice. Several other examples:

(1,13) = 1

(4,9) = 3

(2,2) = 2

(1,3) = 1 (all of 1,2,3 occur once, so return the first/smallest one. not so important at the moment)

I have searched, but could not find anything similar. I'm looking (ideally) a data structure with minimal space requirement, fast preprocessing, and/or query complexities.

Thanks in advance!

like image 520
kolistivra Avatar asked Nov 05 '22 08:11

kolistivra


1 Answers

Let N be the size of the array and M the number of different values in that array.

I'm considering two complexities : pre-processing and querying an interval of size n, each must be spacial and temporal.


Solution 1 :
  • Spacial : O(1) and O(M)
  • Temporal : O(1) and O(n + M)

No pre-processing, we look at all values of the interval and find the most frequent one.


Solution 2 :
  • Spacial : O(M*N) and O(1)
  • Temporal : O(M*N) and O(min(n,M))

For each position of the array, we have an accumulative array that gives us for each value x, how many times x is in the array before that position.

Given an interval we just need for each x to subtract 2 values to find the number of x in that interval. We iterate over each x and find the maximum value. If n < M we iterate over each value of the interval, otherwise we iterate over all possible values for x.


Solution 3 :
  • Spacial : O(N) and O(1)
  • Temporal : O(N) and O(min(n,M)*log(n))

For each value x build a binary heap of all the position in the array where x is present. The key in your heap is the position but you also store the total number of x between this position and the begin of the array.

Given an interval we just need for each x to subtract 2 values to find the number of x in that interval : in O(log(N)) we can ask the x's heap to find the two positions just before the start/end of the interval and substract the numbers. Basically it needs less space than a histogram but the query in now in O(log(N)).

like image 175
Loïc Février Avatar answered Nov 09 '22 06:11

Loïc Février