Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the longest subarray that contains a majority element

I am trying to solve this algorithmic problem:

https://dunjudge.me/analysis/problems/469/

For convenience, I have summarized the problem statement below.

Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the longest subarray that contains a majority element.

A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.

Time limit: 1.5s

For example:

If the given array is [1, 2, 1, 2, 3, 2],

The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).


My attempt:

The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.

I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.

Using the example above, these buckets would be:

1: 0, 2

2: 1, 3, 5

3: 4

Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket. We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.


Could someone please advise me on a better approach to solve this problem?

Edit:

I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).

like image 497
Donald Avatar asked Jan 16 '18 03:01

Donald


People also ask

How do you find the length of a longest Subarray?

Naive Approach: Follow the steps below to solve the problem using this approach: Consider all sub-arrays one by one and check the sum of every sub-array. If the sum of the current subarray is equal to zero then update the maximum length accordingly.

What is the length of longest Subarray that contains up to two distinct integers?

Explanation: Subarrays {1, 1, 2}, {4, 4, 4, 5, 5, 4, 5} and {8, 9} are the only subarray having difference between any two distinct values equal to K( = 1). The longest subarray among them is {4, 4, 4, 5, 5, 4, 5}. Hence, the length is 7.


1 Answers

Note: attempt 1 is wrong as @hk6279 has given a counter example. Thanks for pointing it out.

Attempt 1: The answer is quite complex, so I will discuss a brief idea

Let process each unique number one by one.

Processing each occurrence of number x from left to right, at index i, let add an segment (i, i) indicates the start and end of the current subarray. After that, we need to look to the left side of this segment, and try to merge the left neighbour of this segment into (i, i), (So, if the left is (st, ed), we try to make it become (st, i) if it satisfy the condition) if possible, and continue to merge them until we are not able to merge, or there is no left neighbour.

We keep all those segments in a stack for faster look up/add/remove.

Finally, for each segment, we try to enlarge them as large as possible, and keep the biggest result.

Time complexity should be O(n) as each element could only be merged once.

Attempt 2:

Let process each unique number one by one

For each unique number x, we maintain an array of counter. From 0 to end of the array, if we encounter a value x we increase the count, and if we don't we decrease, so for this array [0,1,2,0,0,3,4,5,0,0] and number 0, we have this array counter

[1,0,-1,0,1,0,-1,-2,-1,0]

So, in order to make a valid subarray which ends at a specific index i, the value of counter[i] - counter[start - 1] must be greater than 0 (This can be easily explained if you view the array as making from 1 and -1 entries; with 1 is when there is an occurrence of x, -1 otherwise; and the problem can be converted into finding the subarray with sum is positive)

So, with the help of a binary search, the above algo still have an complexity of O(n ^ 2 log n) (in case we have n/2 unique numbers, we need to do the above process n/2 times, each time take O (n log n))

To improve it, we make an observation that, we actually don't need to store all values for all counter, but just the values of counter of x, we saw that we can store for above array counter:

[1,#,#,0,1,#,#,#,-1,0]

This will leads to O (n log n) solution, which only go through each element once.

like image 118
Pham Trung Avatar answered Sep 17 '22 23:09

Pham Trung