
Fastest way to calculate all the substrings of a string and check it for a given condition

Tags: java, string

What is the fastest way to enumerate all the substrings of a given string and check each of them against the following condition?

The condition is: if the first and the last characters of a substring are the same, a counter is incremented by one. We need to count all such substrings of a given very large string.

I have tried the naive brute-force approach, but it did not work for strings of length 10^7. Please help :(

// Brute force: builds every substring of length 3 or more and compares its two ends.
for (int c = 0; c < length; c++) {
    for (int i = 3; i <= length - c; i++) {
        String sub = str.substring(c, c + i);
        System.out.println(sub);
        if (sub.charAt(0) == sub.charAt(sub.length() - 1)) {
            count++;
        }
    }
}
bane19 asked Jun 20 '15 22:06


1 Answer

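Your current solution is at least quadratic in the length of the input string, O(n^2), because it enumerates every substring. Building each one with substring() and printing it adds further cost on top of that.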

You can solve this more efficiently by counting the occurrences of each character in the string, and then computing, for each character, how many substrings start and end with it.

E.g. if a character occurs 4 times, every pair of its positions marks the start and end of exactly one substring, which leads to 3 + 2 + 1 = 6 substrings.

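In general, a character that occurs n times contributes ((n-1) * n) / 2 substrings. Note that n here is the occurrence count of that character, not the length of the string.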
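As a quick sanity check on the formula, the sketch below compares it with a direct count of equal-character position pairs on a small input. The class and method names are hypothetical and chosen only for illustration; the direct count is quadratic and is meant for testing small strings, not for the 10^7 case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; names are illustrative, not from the original answer.
final class PairFormulaCheck {

    // Counts index pairs (i, j) with i < j and s[i] == s[j] directly, in O(n^2).
    static long countPairsDirectly(String s) {
        long count = 0;
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Applies n * (n - 1) / 2 per distinct character, where n is its occurrence count.
    static long countWithFormula(String s) {
        Map<Character, Integer> occurrences = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            occurrences.merge(s.charAt(i), 1, Integer::sum);
        }
        long total = 0;
        for (int n : occurrences.values()) {
            total += (long) n * (n - 1) / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "aaaa"; // one character occurring 4 times
        System.out.println(countPairsDirectly(s)); // prints 6
        System.out.println(countWithFormula(s));   // prints 6
    }
}
```

Both methods agree on the 4-occurrence example (3 + 2 + 1 = 6), which is what the formula predicts.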

This brings the complexity of the algorithm down to O(n) in the length of the string, because counting the characters only requires a single pass over the String.
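For very long inputs, one possible variant of the same idea keeps a plain int array of counts instead of a map and accumulates the result in the same pass. This is a sketch under assumptions, not the answer's own code: the class and method names are hypothetical, and the array is sized for every possible char value, which trades about 256 KB of memory for avoiding boxing.

```java
// Hypothetical single-pass variant; names are illustrative only.
final class SinglePassCount {

    // For each position, adds the number of earlier positions holding the same
    // character. Summed over the string, this equals the sum of n * (n - 1) / 2
    // over all distinct characters.
    static long countSameEnds(CharSequence s) {
        int[] seen = new int[Character.MAX_VALUE + 1]; // one slot per char value
        long total = 0; // long, since the result can exceed the int range
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            total += seen[c]; // pairs ending at position i
            seen[c]++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countSameEnds("xyzxyzxyzxyz")); // prints 18
    }
}
```

The running total is kept in a long because a string of 10^7 identical characters would yield roughly 5 * 10^13 pairs, far beyond what an int can hold.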

I believe this code should work:

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        String str = "xyzxyzxyzxyz";
        // Count how many times each character occurs.
        Map<Character, Integer> map = new HashMap<>();
        for (char c : str.toCharArray()) {
            map.merge(c, 1, Integer::sum);
        }
        // Use long: for n around 10^7, (n - 1) * n overflows int.
        long sum = 0;
        for (int n : map.values()) {
            sum += ((long) (n - 1) * n) / 2;
        }
        System.out.println(sum); // prints 18
    }
}
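One detail worth checking: the loop in the question starts at length 3, while the formula also counts adjacent equal characters (substrings of length 2). If the length-3 minimum is intended, which is an assumption since the question does not state it explicitly, the count can be adjusted by subtracting the adjacent equal pairs. The sketch below does that in one pass and includes a slow reference loop for comparison on small inputs; all names are hypothetical.

```java
// Hypothetical variant for a minimum substring length of 3; names are illustrative.
final class MinLengthThreeCount {

    // Counts all equal-character pairs, then removes the pairs that are adjacent,
    // since those correspond to substrings of length 2.
    static long countSameEndsMinLength3(String s) {
        int[] seen = new int[Character.MAX_VALUE + 1];
        long pairs = 0;    // index pairs i < j with s[i] == s[j]
        long adjacent = 0; // pairs with j == i + 1
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            pairs += seen[c];
            seen[c]++;
            if (i > 0 && s.charAt(i - 1) == c) {
                adjacent++;
            }
        }
        return pairs - adjacent;
    }

    // Quadratic reference that mirrors the question's loop bounds without
    // building substrings; intended for small test inputs only.
    static long bruteForce(String s) {
        long count = 0;
        for (int start = 0; start < s.length(); start++) {
            for (int len = 3; len <= s.length() - start; len++) {
                if (s.charAt(start) == s.charAt(start + len - 1)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSameEndsMinLength3("aaaa")); // prints 3
        System.out.println(bruteForce("aaaa"));              // prints 3
    }
}
```

For "xyzxyzxyzxyz" there are no adjacent equal characters, so both counting rules give 18.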
wvdz answered Oct 05 '22 22:10
