Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest positive substrings in binary string

Let's assume I have a string like 100110001010001. I'd like to find such substring that:

  • are as longest as possible
  • have total positive sum >0

So the longest substrings, that have more 1s than 0s.

For example for the string above 100110001010001 it would be: [10011]000[101]000[1]

Actually it's be satisfying to find the total length of those, in this case: 9.

Unfortunately I have no clue, how can it be done not in brute-force way. Any ideas, please?

like image 819
mtszkw Avatar asked Oct 30 '22 19:10

mtszkw


1 Answers

As posted now, your question seems a bit unclear. The total length of valid substrings that are "as long as possible" could mean different things: for example, among other options, it could be (1) a list of the longest valid extension to the left of each index (which would allow overlaps in the list), (2) the longest combination of non-overlapping such longest left-extensions, (3) the longest combination of non-overlapping, valid substrings (where each substring is not necessarily the longest possible).

I will outline a method for (3) since it easily transforms to (1) or (2). Finding the longest left-extension from each index with more ones than zeros can be done in O(n log n) time and O(n) additional space (for just the longest valid substring in O(n) time, see here: Finding the longest non-negative sub array). With that preprocessing, finding the longest combination of valid, non-overlapping substrings can be done with dynamic programming in somewhat optimized O(n^2) time and O(n) additional space.

We start by traversing the string, storing sums representing the partial sum up to and including s[i], counting zeros as -1. We insert each partial sum in a binary tree where each node also stores an array of indexes where the value occurs, and the leftmost index of a value less than the node's value. (A substring from s[a] to s[b] has more ones than zeros if the prefix sum up to b is greater than the prefix sum up to a.) If a value is already in the tree, we add the index to the node's index array.

Since we are traversing from left to right, only when a new lowest value is inserted into the tree is the leftmost-index-of-lower-value updated — and it's updated only for the node with the previous lowest value. This is because any nodes with a lower value would not need updating; and if any nodes with lower values were already in the tree, any nodes with higher values would already have stored the index of the earliest one inserted.

The longest valid substring to the left of each index extends to the leftmost index with a lower prefix sum, which can be easily looked up in the tree.

To get the longest combination, let f(i) represent the longest combination up to index i. Then f(i) equals the maximum of the length of each valid left extension possible to index j added to f(j-1).

like image 74
גלעד ברקן Avatar answered Nov 15 '22 08:11

גלעד ברקן