Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the maximum subsequence binary sets that have an equal number of 1s and 0s

Tags:

I found the following problem on the internet, and would like to know how I would go about solving it:

You are given an array ' containing 0s and 1s. Find O(n) time and O(1) space algorithm to find the maximum sub sequence which has equal number of 1s and 0s.

Examples:

  1. 10101010 - The longest sub sequence that satisfies the problem is the input itself
  2. 1101000 - The longest sub sequence that satisfies the problem is 110100
like image 411
AGeek Avatar asked Jun 29 '10 11:06

AGeek


1 Answers

Update.

I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)

Lets sum up the easy case again, to get it out of the way:

Find the longest prefix of the bit-string containing an equal number of 1s and 0s of the array.

This is trivial: A simple counter is needed, counting how many more 1s we have than 0s, and iterating the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for. )

Now lets switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.

After some back and forth thought, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, there is no candidate subsequence start position that dominates (i.e. always gives better solutions than) any other position, so we can't throw away any of them (O(N) space) and at any step, we must be able to select the best start (which has an equal number of 1s and 0s to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily doable too, since we can select the candidate based on the running sum of 1s (+1) and 0s (-1), this has at most size N, and we can store the first position we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).

Failing to spot this trick, I had replaced a fast but wrong with a slow but sure algorithm, (since correct algorithms are preferable to wrong ones!):

  • Build an array A with the accumulated number of 1s from the start to that position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1]), i.e. it is valid if its length is double the amount of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * A[6] - A[2] = 4.
  • Plain old double loop:

    maxSubsLength = 0 for i = 1 to N - 1 for j = i + 1 to N if isValid(i, j) ... #maintain maxSubsLength end end

This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!

like image 100
Dimitris Andreou Avatar answered Oct 21 '22 00:10

Dimitris Andreou