Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest substring in binary string with not less ones than zeros

How to find, in a binary string, the longest substring where the balance, i.e. the difference between the number of ones and zeros, is >= 0?

Example:

01110000010 -> 6: 011100

1110000011110000111 -> 19: entire string

While this problem looks very similar to the Maximum Value Contiguous Subsequence (Maximum Contiguous Sum) problem, a dynamic programming solution doesn't seem to be obvious. In a divide-and-conquer approach, how to do the merging? Is an "efficient" algorithm possible after all? (A trivial O(n^2) algorithm will just iterate over all substrings for all possible starting points.)

This is a modified variant of Finding a substring, with some additional conditions. The difference is that in the linked question, only such substrings are allowed where balance never falls below zero (looking at the string in either forward or backward direction). In the given problem, balance is allowed to fall below zero, provided it recovers at some later stage.

like image 485
krlmlr Avatar asked Nov 19 '25 06:11

krlmlr


2 Answers

I have a solution that requires O(n) additional memory and O(n) time.

Let's denote the 'height' of an index h(i) as

h(i) = <number of 1s in the substring 1..i> - <number of 0s in the same substring>

The problem can now be reformulated as: find i and j such as h(i) <= h(j) and j-i -> max.

Obviously, h(0) = 0, and if h(n) = 0, then the solution is the entire string.

Now let's compute the array B so that B[x] = min{i: h(i) = -x}. In other words, let B[x] be the leftmost index i at which h(i)= -x.

The array B[x] has a length of at most n, and is computed in one linear pass.

Now we can iterate over the original string and for each index i compute the length of the longest sequence with non-negative balance that ends on i as follows:

Lmax(i) = i - B[MIN{0, h(i)}]

The largest Lmax(i) across all i will give you the desired length.

I leave the proof as an exercise :) Contact me if you can't figure it out.

Also, my algorithm needs 2 passes of the original string, but you can collapse them into one.

like image 166
Zruty Avatar answered Nov 21 '25 08:11

Zruty


This can be answered quite easily in O(n) using "height array", representing the number of 1's relative to the number of 0's. Like my answer in the linked question.

Now, instead of focusing on the original array, we now focus on two arrays indexed by the heights, and one will contain the smallest index such height is found, and the other will contain the largest index such height is found. Since we don't want a negative index, we can shift everything up, such that the minimum height is 0.

So for the sample cases (I added two more 1's at the end to show my point):

1110000011010000011111
Array height visualization
  /\
 /  \
/    \
      \  /\/\        /
       \/    \      /
              \    /
               \  /
                \/
(lowest height = -5)
Shifted height array:
[5, 6, 7, 8, 7, 6, 5, 4, 3, 4, 5, 4, 5, 4, 3, 2, 1, 0, 1, 2, 3]
     Height:   0  1  2  3  4  5  6  7  8
first_view = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view  = [17,18,19,20,21,22, 5, 4, 3]

note that we have 22 numbers and 23 distinct indices, 0-22, representing the 23 spaces between and padding the numbers

We can build the first_view and last_view array in O(n).

Now, for each height in the first_view, we only need to check every larger heights in last_view, and take the index with maximum difference from the first_view index. For example, from height 0, the maximum value of index in larger heights is 22. So the longest substring starting at index 17+1 will end at index 22.

To find the maximum index on the last_view array, you can convert it to a maximum to the right in O(n):

last_view_max = [22,22,22,22,22,22, 5, 4, 3]

And so finding answer is simply subtracting first_view from last_view_max,

first_view    = [17,16,15, 8, 7, 0, 1, 2, 3]
last_view_max = [22,22,22,22,22,22, 5, 4, 3]
result        = [ 5, 6, 7,14,15,22, 4, 2, 0]

and taking the maximum (again in O(n)), which is 22, achieved from starting index 0 to ending index 22, i.e., the whole string. =D

Proof of correctness:

Suppose that the maximum substring starts at index i, ends at index j. If the height at index i is the same as the height at index k<i, then k..j would be a longer substring still satisfying the requirement. Therefore it suffices to consider the first index of each height. Analogously for the last index.

like image 33
justhalf Avatar answered Nov 21 '25 10:11

justhalf



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!