Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Solve number of substrings having two unique characters in O(n)

Tags:

algorithm

I'm working on a series of substring problem:

Given a string:

  1. Find the substring containing only two unique characters that has maximum length.
  2. Find the number of all substrings containing AT MOST two unique characters.
  3. Find the number of all substrings containing two unique characters.

Seems like problem 1 and 2 has O(n) solution. However I cannot think of a O(n) solution for problem 3.(Here is the solution for problem 2 and here is for problem 1.).

So I would like to know does a O(n) solution for problem 3 exist or not?

Adding sample input/output for problem 3:

Given: abbac

Return: 6

Because there are 6 substring containing two unique chars: ab,abb,abba,bba,ba,ac

like image 583
Jun Avatar asked Nov 13 '22 01:11

Jun


1 Answers

Find the number of all substrings containing two unique characters.

Edit : I misread the question. This solution finds unique substrings with at least 2 unique characters

  1. The number of substrings for a given word whose length is len is given by len * (len + 1) / 2

    sum = len * (len + 1) / 2

    1. We are looking for substrings whose length is greater than 1. The above formula includes substrings which are of length 1. We need to substract those substrings.

So the total number of 2 letter substrings now is len * (len + 1) / 2 - l.

sum = `len * (len + 1) / 2 - l`
  1. Find the longest consecutive run of characters which are alike. Apply step 1 and 2. Subtract this current sum from the sum as obtained from step 2.

Sample implementation follows.

public static int allUniq2Substrings(char s[]) {
    int sum = s.length * (s.length + 1) / 2 - s.length;
    int sameRun = 0;
    for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
        if (s[i] != prev) {
            sum -= sameRun * (sameRun + 1) / 2 - sameRun;
            sameRun = 1;
        } else {
            sameRun++;
        }
    }

    return sum - (sameRun * (sameRun + 1) / 2 - sameRun);

}

allUniq2Substrings("aaac".toCharArray());
3

allUniq2Substrings("aabc".toCharArray());
5

allUniq2Substrings("aaa".toCharArray());
0

allUniq2Substrings("abcd".toCharArray());
6

Edit Let me try this again. I use the above 3 invariants. This is a subproblem of finding all substrings which contain at least 2 unique characters. I have a method posted above which gives me unique substrings for any length. I will use it to generate substrings from a set which contains at 2 unique characters.

We only need to keep track of the longest consequent run of characters whose set length is 2. ie Any permutation of 2 unique characters. The sum of such runs gives us the total number of desired substrings.

public static int allUniq2Substrings(char s[]) {
    int sum = s.length * (s.length + 1) / 2 - s.length;
    int sameRun = 0;
    for (int i = 0, prev = -1; i < s.length; prev = s[i++]) {
        if (s[i] != prev) {
            sum -= sameRun * (sameRun + 1) / 2 - sameRun;
            sameRun = 1;
        } else {
            sameRun++;
        }
    }

    return sum - (sameRun * (sameRun + 1) / 2 - sameRun);

}

public static int uniq2substring(char s[]) {
    int last = 0, secondLast = 0;
    int sum = 0;
    for (int i = 1; i < s.length; i++) {
        if (s[i] != s[i - 1]) {
            last = i;
            break;
        }
    }

    boolean OneTwo = false;
    int oneTwoIdx = -1; //alternating pattern

    for (int i = last + 1; i < s.length; ++i) {
        if (s[secondLast] != s[i] && s[last] != s[i]) { //detected more than 2 uniq chars
            sum += allUniq2Substrings(Arrays.copyOfRange(s, secondLast, i));
            secondLast = last;
            last = i;
            if (OneTwo) {
                secondLast = oneTwoIdx;
            }
            OneTwo = false;
        } else if (s[i] != last) { //alternating pattern detected a*b*a
            OneTwo = true;
            oneTwoIdx = i;
        }

    }

    return sum + allUniq2Substrings(Arrays.copyOfRange(s, secondLast, s.length));
}

uniq2substring("abaac".toCharArray())
6


uniq2substring("aab".toCharArray())
2

uniq2substring("aabb".toCharArray())
4

uniq2substring("ab".toCharArray())
1
like image 199
bsd Avatar answered Nov 15 '22 07:11

bsd