I am trying to solve this algorithmic problem:
https://dunjudge.me/analysis/problems/469/
For convenience, I have summarized the problem statement below.
Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the longest subarray that contains a majority element.
A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.
Time limit: 1.5s
For example:
If the given array is [1, 2, 1, 2, 3, 2],
The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).
The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.
I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.
Using the example above, these buckets would be:
1: 0, 2
2: 1, 3, 5
3: 4
Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket. We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.
Edit:
I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).
Naive Approach: Follow the steps below to solve the problem using this approach: Consider all sub-arrays one by one and check the sum of every sub-array. If the sum of the current subarray is equal to zero then update the maximum length accordingly.
Explanation: Subarrays {1, 1, 2}, {4, 4, 4, 5, 5, 4, 5} and {8, 9} are the only subarray having difference between any two distinct values equal to K( = 1). The longest subarray among them is {4, 4, 4, 5, 5, 4, 5}. Hence, the length is 7.
Note: attempt 1 is wrong as @hk6279 has given a counter example. Thanks for pointing it out.
Attempt 1: The answer is quite complex, so I will discuss a brief idea
Let process each unique number one by one.
Processing each occurrence of number x
from left to right, at index i
, let add an segment (i, i)
indicates the start and end of the current subarray. After that, we need to look to the left side of this segment, and try to merge the left neighbour of this segment into (i, i)
, (So, if the left is (st, ed)
, we try to make it become (st, i)
if it satisfy the condition) if possible, and continue to merge them until we are not able to merge, or there is no left neighbour.
We keep all those segments in a stack for faster look up/add/remove.
Finally, for each segment, we try to enlarge them as large as possible, and keep the biggest result.
Time complexity should be O(n)
as each element could only be merged once.
Attempt 2:
Let process each unique number one by one
For each unique number x
, we maintain an array of counter. From 0 to end of the array, if we encounter a value x
we increase the count, and if we don't we decrease, so for this array
[0,1,2,0,0,3,4,5,0,0] and number 0
, we have this array counter
[1,0,-1,0,1,0,-1,-2,-1,0]
So, in order to make a valid subarray which ends at a specific index i
, the value of counter[i] - counter[start - 1]
must be greater than 0 (This can be easily explained if you view the array as making from 1 and -1 entries; with 1 is when there is an occurrence of x, -1 otherwise
; and the problem can be converted into finding the subarray with sum is positive)
So, with the help of a binary search, the above algo still have an complexity of O(n ^ 2 log n) (in case we have n/2 unique numbers, we need to do the above process n/2 times, each time take O (n log n))
To improve it, we make an observation that, we actually don't need to store all values for all counter, but just the values of counter of x
, we saw that we can store for above array counter:
[1,#,#,0,1,#,#,#,-1,0]
This will leads to O (n log n) solution, which only go through each element once.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With