I'm looking for a data structure with which I can find the most frequently occuring number (among an array of numbers) in a given, variable range.
Let's consider the following 1 based array:
1 2 3 1 1 3 3 3 3 1 1 1 1
If I query the range (1,4), the data structure must retun 1, which occurs twice. Several other examples:
(1,13) = 1
(4,9) = 3
(2,2) = 2
(1,3) = 1 (all of 1,2,3 occur once, so return the first/smallest one. not so important at the moment)
I have searched, but could not find anything similar. I'm looking (ideally) a data structure with minimal space requirement, fast preprocessing, and/or query complexities.
Thanks in advance!
Let N be the size of the array and M the number of different values in that array.
I'm considering two complexities : pre-processing and querying an interval of size n, each must be spacial and temporal.
No pre-processing, we look at all values of the interval and find the most frequent one.
For each position of the array, we have an accumulative array that gives us for each value x, how many times x is in the array before that position.
Given an interval we just need for each x to subtract 2 values to find the number of x in that interval. We iterate over each x and find the maximum value. If n < M we iterate over each value of the interval, otherwise we iterate over all possible values for x.
For each value x build a binary heap of all the position in the array where x is present. The key in your heap is the position but you also store the total number of x between this position and the begin of the array.
Given an interval we just need for each x to subtract 2 values to find the number of x in that interval : in O(log(N)) we can ask the x's heap to find the two positions just before the start/end of the interval and substract the numbers. Basically it needs less space than a histogram but the query in now in O(log(N)).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With