Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Partitioning a String of Brackets

Tags:

algorithm

Could anyone help me with this problem? I'll type out the problem, then give some of my thoughts/alternative solutions.

So the problem pretty much is, given a single string of brackets like this:

[[]]

We want to assign each bracket a group number (group 1 or group 2). A valid assignment means that if you ONLY look at brackets in group one, it forms a valid, balanced bracket string (which is pretty much stuff like [ ] [ [ ] ] and stuff NOT like ]]]][ ]. The same has to be true of group two. Groups don't have to be contiguous. We want to count the ways to split these brackets into 2 groups.

On the sample string above [ [ ] ], the answer would be six, here are the enumerations: (1 = group 1, 2 = group 2)

  [[]]
1.1111
2.2222
3.1221
4.2112
5.1212
6.2121

The arrangement doesn't have to include the all groups (like in arrangements 1. and 2.)

Thoughts

An obvious brute force solution that works with up to 32 brackets, rather quickly, is to have a 32 bit integer representing which brackets are part of a single group. Or we could use an array. Runtime is O(2^N) (I think), which is too slow?

From looking at the problem, I think that the original string of brackets you are given has to be pre-balanced, or else there is no way to pick a subset such that group 1 and 2 are balanced.

I also noticed that you can separate components - the string "[]" has 2 arrangements, so the string "[][]" has 4 arrangements. (You can find the the number of ways in each component and multiply them together).

I'm confused on how to get these ideas into an algorithm though. I wrote the brute force program and I checked the strings "[]", "[[]]", "[[[]]]", and "[[[[]]]]", and I don't really see a pattern.

From plugging these strings into my brute force program, I get:

"[]" = 2
"[[]]" = 6
"[[]]" = 20
"[[[[]]]]" = 70

Code:

char buf[1000];
int N;
bool isValid(int mask)
{
    int lv = 0;
    for (int i = 0; i < N; i++)
    {
        if (mask & (1 << i))
        {
            if (buf[i] == '(')
            {
                lv++;
            }
            else
            {
                lv--;
            }
            if (lv<0)
            {
                return false;
            }
        }
    }
    return lv==0;
}

int main() 
{
    scanf("%s", buf);
    N = strlen(buf);
    int ways = 0;
    for (int i = 0; i < (1 << N); i++)
    {
        if (isValid(i) && isValid(~i))
        {
            ways++;
        }
    }
    printf("Number of ways is %d\n", ways);
    return 0;
}
like image 487
dave Avatar asked Nov 18 '12 02:11

dave


1 Answers

I give an O(n^3)-time, O(n^2)-space dynamic programming solution in C++ below. But the justification for this algorithm requires some explanation first. In the following, I use "substring" to mean an ordered subset of elements that must be contiguous, and "subsequence" to mean an ordered subset that need not be.

Generating strings that we know are valid

Define the depth of a string to be number of [s it contains minus the number of ]s.

Let's establish some rules that all valid ("balanced") bracket-strings must obey:

  1. There must be an equal number of [s and ]s.
  2. No prefix of the string may have negative depth, i.e. more ]s than [s.

These are plainly necessary conditions -- if a string violates either rule, it cannot be valid. But in order to be able to conveniently generate strings that we know are valid, we need to show that these conditions are also sufficient: that any string that obeys these rules must be valid. To help with this, let's introduce a lemma:

Lemma: If a nonempty string obeys conditions (1) and (2), then it must contain [] as a substring.

Proof: It must begin with a [, since otherwise the length-1 prefix will contain more ]s than [s and violate (2). Therefore it must contain at least one ], since otherwise there will be i >= 1 [s and 0 ]s, and a valid string must contain an equal number of each by (1). Therefore there must be a first occurrence of a ] at some position j > 1, and the character to its left must be a [.

Suppose we have a nonempty string x that obeys conditions (1) and (2). By the lemma, it must contain a []. Deleting this pair cannot cause either of these conditions to be violated, so the resulting string, if nonempty, must still obey conditions (1) and (2) and so must still contain a [] somewhere. Thus we can continue deleting []s until we are left with the empty string.

Inserting a [] into a valid string at any position must produce a new valid string, because the new bracket pair always match each other and don't disturb any other matched pairs. Observe that it is possible to build up our original string x by repeatedly inserting []s into the empty string (which is trivially valid) in the reverse order that we deleted them in the previous paragraph: so we have now proved that x (i.e. any string that obeys conditions (1) and (2)) is valid.

The right recursion

An equivalent way to phrase the OP's question is: "How many ways can we select a valid subsequence of character positions so that the remaining subsequence is also valid?" It's possible to solve this problem using recursion if we first generalise it to:

Given that our selected subsequence so far has depth d, and our unselected subsequence so far has depth e, how many ways can we select a valid subsequence from the suffix beginning at position k so that the remaining subsequence is also valid?

Call this function count(d, e, k). The answer to the original question is now count(0, 0, 0).

In fact we can simplify the problem further by noticing that d+e must equal the total depth after k characters, so we can determine e from d and k, and count() need only have 2 parameters.

Also, when testing whether it's possible to select an empty subsequence, we only need to test that d == 0. We don't need to bother testing that e plus the remaining suffix gets down to 0 without going below it, since if d == 0 then we have subtracted out a net depth of 0 from the original string (which must end with a depth of 0, and not go below 0, assuming it is valid).

To solve this recursively, we need to break off a first decision point from the process of searching through all possible subsequences. A subsequence of a length-n string must fall into one of the following n+1 cases: either it is empty, or it has a leftmost element, which could be any of the n characters in the string. The subsequences produced by recursing after making this first decision will all be distinct.

With the recursion working properly, memoising it is straightforward: just record the correct answer for any given call in the 2D vector memo[][], which is initially filled with -1 values. Since the function count(d, k) has 2 parameters that can range from 0 up to n/2 and from 0 up to n respectively for a length-n string, O(n^2) space is needed, and the inside of the if (memo[d][k] == -1) { block will execute at most O(n^2) times. Each time that happens, an O(n) loop runs, taking the time complexity up to O(n^3).

The actual code

#include <iostream>
#include <vector>
#include <string>

using namespace std;

class PartitionCounter {
    // Return the number of subsequences of the suffix of v beginning at position k
    // that are (a) valid, given that the initial depth of the subsequence is d (on
    // account of it being the suffix of some larger subsequence), and (b)
    // leave behind a remainder subsequence that is also valid, given that
    // the remainder sequence has initial depth depths[k]-d.
    int count(int d, int k) {
        // If a prefix of either sequence (selected or remaining) has more ']'s
        // than '['s then there can't be any completing subsequences.
        if (d < 0 || depths[k] - d < 0) {
            return 0;
        }

        // Only compute the answer if we haven't already.
        if (memo[d][k] == -1) {
            // A subsequence must either contain no elements, or a leftmost element
            // at some position.  All subsequences produced by recursion after this
            // initial choice are distinct (when considering the sequence of
            // character indices included, though not necessarily when considering
            // the sequence of characters themselves).

            // Try including no elements.  This effectively terminates the larger
            // subsequence that the selected subsequence is part of, so it can be
            // legal only if its depth is 0.  It also effectively includes all
            // remaining characters in the remainder sequence, but if the selected
            // subsequence has depth 0 and the original string does too, then it's
            // implied that the remainder must also have total depth 0, so we don't
            // need to check it.
            int n = (d == 0);

            // Try including a leftmost element at each remaining position.
            // If this would cause a remainder subsequence that has negative
            // depth, stop: any later loop iterations would also create illegal
            // remainder subsequences.
            for (int i = k; i < v.size() && depths[i] - d >= 0; ++i) {
                n += count(d + v[i], i + 1);
            }

            memo[d][k] = n;
        }

        return memo[d][k];
    }

    vector<int> v;          // 1 for '[', -1 for ']'
    vector<int> depths;     // depths[i] is the sum of the 1st i elements
    vector<vector<int> > memo;  // DP matrix.  -1 => not computed yet

public:
    PartitionCounter(string s) : memo(s.size() / 2 + 1, vector<int>(s.size() + 1, -1)) {
        depths.push_back(0);
        int total = 0;
        for (int i = 0; i < s.size(); ++i) {
            v.push_back(1 - 2 * (s[i] == ']')); // Map '[' to 1 and ']' to -1
            depths.push_back(total += v[i]);
        }
    }

    int count() {
        if (depths.back() == 0) {
            return count(0, 0);
        } else {
            return 0;       // Need to handle invalid strings specially
        }
    }
};

int main(int argc, char **argv) {
    PartitionCounter c(argv[1]);
    cout << c.count() << '\n';
}

Results

C:\>partitioncounter []
2

C:\>partitioncounter [[]]
6

C:\>partitioncounter [[[]]]
20

C:\>partitioncounter [[[[]]]]
70

C:\>stopwatch partitioncounter [][[[[[][][][][[][][]]]]]][]
10001208
stopwatch: Terminated. Elapsed time: 15ms
stopwatch: Process completed with exit code 0.

C:\>stopwatch partitioncounter [][[[[[][[[]][][][[]]][[][]]]]]][]
562547776
stopwatch: Terminated. Elapsed time: 0ms
stopwatch: Process completed with exit code 0.

You can of course use long long or whatever instead of int if you need the extra bits.

EDIT: Fixed bug pointed out by ishi. As we skip over characters to exclude from the selected subsequence, the remainder subsequence accumulates them. What was happening was that we were effectively only excluding remainder subsequences that had more ]s than [s on the entire subsequence so far -- but in order to avoid violating condition (2), we need to check that this is true for all prefixes of the string. We now do this by stopping the loop early, so that these violating remainder subsequences are never generated in the first place. As a bonus, the algorithm gets faster! :)

like image 199
j_random_hacker Avatar answered Oct 07 '22 04:10

j_random_hacker