Reduce binary string to an empty string by removing subsequences with alternative characters

Question

This was a question asked in the coding round for NASDAQ internship.

Program description:

The program takes a binary string as input. We have to successively remove sub-sequences having all characters alternating, till the string is empty. The task was to find the minimum number of steps required to do so.

Example1:
let the string be : 0111001
Removed-0101, Remaining-110
Removed-10 , Remaining-1
Removed-1
No of steps = 3

Example2:
let the string be : 111000111
Removed-101, Remaining-110011
Removed-101, Remaining-101
Removed-101
No of steps = 3

Example3:
let the string be : 11011
Removed-101, Remaining-11
Removed-1 , Remaining-1
Removed-1
No of steps = 3

Example4:
let the string be : 10101
Removed-10101
No of steps = 1

The solution I tried, considered the first character of the binary string as first character for my sub-sequence. Then created a new string, where the next character would be appended if it wasn't part of the alternating sequence. The new string becomes our binary string. In this way, a loop continues till the new string is empty. (somewhat an O(n^2) algorithm). As expected, it gave me a timeout error. Adding a somewhat similar code in C++ to the one I had tried, which was in Java.

    #include<bits/stdc++.h>
    using namespace std;
    
    int main() {
        string str, newStr;
        int len;
        char c;
        int count = 0;
        getline(cin, str);
        len = str.length();
    
        //continue till string is empty
        while(len > 0) {
            len = 0;
            c = str[0];
            for(int i=1; str[i] != '\0';i++) {
                //if alternative characters are found, set as c and avoid that character
                if(c != str[i]) 
                    c = str[i];
                //if next character is not alternate, add the character to newStr
                else {
                    newStr.push_back(str[i]);
                    len++;
                }
            }
            str = newStr;
            newStr = "";
            count++;
        }
        cout<<count<<endl;
        return 0;
    }

I also tried methods like finding the length of the largest sub sequence of same consecutive characters which obviously didn't satisfy every case, like that of example3.

Hope somebody could help me with the most optimized solution for this question. Preferably a code in C, C++ or python. Even the algorithm would do.

Serial Lazer · Accepted Answer

I found a more optimal O(NlogN) solution by maintaining a Min-Heap and Look-up hashMap.

We start with the initial array as alternating counts of 0, 1.

That is, for string= 0111001; lets assume our input-array S=[1,3,2,1]

Basic idea:

Heapify the count-array
Extract minimum count node => add to num_steps
Now extract both its neighbours (maintained in the Node-class) from the Heap using the lookup-map
Merge both these neighbours and insert into the Heap
Repeat steps 2-4 until no entries remain in the Heap

Code implementation in Python

class Node:
    def __init__(self, node_type: int, count: int):
        self.prev = None
        self.next = None
        self.node_type = node_type
        self.node_count = count

    @staticmethod
    def compare(node1, node2) -> bool:
        return node1.node_count < node2.node_count


def get_num_steps(S: list): ## Example: S = [2, 1, 2, 3]
    heap = []
    node_heap_position_map = {} ## Map[Node] -> Heap-index
    prev = None
    type = 0
    for s in S:
        node: Node = Node(type, s)
        node.prev = prev
        if prev is not None:
            prev.next = node
        prev = node
        type = 1 - type

        # Add element to the map and also maintain the updated positions of the elements for easy lookup
        addElementToHeap(heap, node_heap_position_map, node)

    num_steps = 0
    last_val = 0
    while len(heap) > 0:
        # Extract top-element and also update the positions in the lookup-map
        top_heap_val: Node = extractMinFromHeap(heap, node_heap_position_map)
        num_steps += top_heap_val.node_count - last_val
        last_val = top_heap_val.node_count

        # If its the corner element, no merging is required
        if top_heap_val.prev is None or top_heap_val.next is None:
            continue

        # Merge the nodes adjacent to the extracted-min-node:
        prev_node = top_heap_val.prev
        next_node = top_heap_val.next

        removeNodeFromHeap(prev_node, node_heap_position_map)
        removeNodeFromHeap(next_node, node_heap_position_map)
        del node_heap_position_map[prev_node]
        del node_heap_position_map[next_node]
        
        # Created the merged-node for neighbours and add to the Heap; and update the lookup-map
        merged_node = Node(prev_node.node_type, prev_node.node_count + next_node.node_count)
        merged_node.prev = prev_node.prev
        merged_node.next = next_node.next
        addElementToHeap(heap, node_heap_position_map, merged_node)

    return num_steps

PS: I havent implemented the Min-heap operations above, but the function-method-names are quite eponymous.

גלעד ברקן · Answer

We can solve this in O(n) time and O(1) space.

This isn't about order at all. The actual task, when you think about it, is how to divide the string into the least number of subsequences that consist of alternating characters (where a single is allowed). Just maintain two queues or stacks; one for 1s, the other for 0s, where characters pop their immediate alternate predecessors. Keep a record of how long the queue is at any one time during the iteration (not including the replacement moves).

Examples:

(1)

0111001

   queues
1  1   -
0  -   0
0  -   00
1  1   0
1  11  -
1  111 -  <- max 3
0  11  0

For O(1) space, The queues can just be two numbers representimg the current counts.

(2)

111000111
   queues (count of 1s and count of 0s)
1  1  0
1  2  0
1  3  0  <- max 3
0  2  1
0  1  2
0  0  3  <- max 3
1  1  2
1  2  1
1  3  0  <- max 3

(3)

11011
   queues
1  1  0
1  2  0
0  1  1
1  2  0
1  3  0  <- max 3

(4)

10101

   queues
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1
0  0  1  <- max 1
1  1  0  <- max 1

moreON · Answer

I won't write the full code. But I have an idea of an approach that will probably be fast enough (certainly faster than building all of the intermediate strings).

Read the input and change it to a representation that consists of the lengths of sequences of the same character. So 11011 is represented with a structure that specifies it something like [{length: 2, value: 1}, {length: 1, value: 0}, {length: 2, value: 1}]. With some cleverness you can drop the values entirely and represent it as [2, 1, 2] - I'll leave that as an exercise for the reader.

With that representation you know that you can remove one value from each of the identified sequences of the same character in each "step". You can do this a number of times equal to the smallest length of any of those sequences.

So you identify the minimum sequence length, add that to a total number of operations that you're tracking, then subtract that from every sequence's length.

After doing that, you need to deal with sequences of 0 length. - Remove them, then if there are any adjacent sequences of the same value, merge those (add together the lengths, remove one). This merging step is the one that requires some care if you're going for the representation that forgets the values.

Keep repeating this until there's nothing left. It should run somewhat faster than dealing with string manipulations.

There's probably an even better approach that doesn't iterate through the steps at all after making this representation, just examining the lengths of sequences starting at the start in one pass through to the end. I haven't worked out what that approach is exactly, but I'm reasonably confident that it would exist. After trying what I've outlined above, working that out is a good idea. I have a feeling it's something like - start total at 0, keep track of minimum and maximum total reaches. Scan each value from the start of string, adding 1 to the total for each 1 encountered, subtracting 1 for each 0 encountered. The answer is the greater of the absolute values of the minimum and maximum reached by total. - I haven't verified that, it's just a hunch. Comments have lead to further speculation that doing this but adding together the maximum and absolute of minimum may be more realistic.

Reduce binary string to an empty string by removing subsequences with alternative characters

Tags:

algorithm

optimization

binary-string

Bichu Ben

3 Answers

Basic idea:

Code implementation in Python

Serial Lazer

גלעד ברקן

moreON

Recent Activity

Donate For Us

Reduce binary string to an empty string by removing subsequences with alternative characters

Tags:

algorithm

optimization

binary-string

Bichu Ben

3 Answers

Basic idea:

Code implementation in Python

Serial Lazer

גלעד ברקן

moreON

Related questions

Recent Activity

Donate For Us