Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to reduce time complexity in traversing a string?

I was solving a problem to find number of such indexes, a, b, c, d in a string s, of size n made only of lowercase letters such that:

1 <= a < b < c < d <= n

and

s[a] == s[c] and s[b] == s[d]

The code I wrote traverses the string character by character in a basic manner:

#include<stdio.h>

int main()
{
    int n, count = 0;
    char s[2002];
    scanf("%d%s", &n, s);
    for(int a = 0; a<n-3; a++)
    {
        for(int b = a + 1; b<n-2; b++)
        {
            for(int c = b + 1; c<n-1; c++)
            {
                for(int d = c + 1; d<n; d++)
                {
                    if(s[a] == s[c] && s[b] == s[d] && a>=0 && b>a && c>b && d>c && d<n)
                    {
                        count++;
                    }
                }
            }
        }
    }
    printf("%d", count);
    return 0;
}

a, b, c and d are the indices. The trouble is that if the input string is big in size, the time limit is exceeded due to the 4 nested loops. Is there any way I can improve the code to decrease the complexity?

The problem statement is available here: https://www.hackerearth.com/practice/algorithms/searching/linear-search/practice-problems/algorithm/holiday-season-ab957deb/

like image 793
rohan843 Avatar asked Mar 02 '23 16:03

rohan843


1 Answers

The problem can be solved if you maintain an array which stores the cumulative frequency (the total of a frequency and all frequencies so far in a frequency distribution) of each character in the input string. Since the string will only consist of lower case characters, hence the array size will be [26][N+1].

For example:

index  - 1 2 3 4 5
string - a b a b a

cumulativeFrequency array:

    0  1  2  3  4  5
a   0  1  1  2  2  3
b   0  0  1  1  2  2

I have made the array by taking the index of first character of the input string as 1. Doing so will help us in solving the problem later. For now, just ignore column 0 and assume that the string starts from index 1 and not 0.


Useful facts

Using cumulative frequency array we can easily check if a character is present at any index i:

if cumulativeFrequency[i]-cumulativeFrequency[i-1] > 0

number of times a character is present from range i to j (excluding both i and j):

frequency between i and j =  cumulativeFrequency[j-1] - cumulativeFrequency[i]

Algorithm

1: for each character from a-z:
2:     Locate index a and c such that charAt[a] == charAt[c]
3:     for each pair (a, c):
4:         for character from a-z:
5:             b = frequency of character between a and c
6:             d = frequency of character after c
7:             count += b*d 

Time complexity

Line 1-2:

The outer most loop will run for 26 times. We need to locate all the pair(a, c), to do that we require a time complexity of O(n^2).

Line 3-4:

For each pair, we again run a loop 26 times to check how many times each character is present between a and c and after c.

Line 5-7:

Using cumulative frequency array, for each character we can easily calculate how many times it appears between a and c and after c in O(1).

Hence, overall complexity is O(26*n^2*26) = O(n^2).


Code

I code in Java. I do not have a code in C. I have used simple loops an array so it should be easy to understand.

//Input N and string 
//Do not pay attention to the next two lines since they are basically taking 
//input using Java input streams
int N = Integer.parseInt(bufferedReader.readLine().trim());
String str = bufferedReader.readLine().trim();

//Construct an array to store cumulative frequency of each character in the string
int[][] cumulativeFrequency = new int[26][N+1];

//Fill the cumulative frequency array
for (int i = 0;i < str.length();i++)
{
    //character an index i
    char ch = str.charAt(i);

    //Fill the cumulative frequency array for each character 
    for (int j = 0;j < 26;j++)
    {
        cumulativeFrequency[j][i+1] += cumulativeFrequency[j][i];
        if (ch-97 == j) cumulativeFrequency[j][i+1]++;
    }
}

int a, b, c, d;
long count = 0;

//Follow the steps of the algorithm here
for (int i = 0;i < 26;i++)
{
    for (int j = 1; j <= N - 2; j++)
    {
        //Check if character at i is present at index j
        a = cumulativeFrequency[i][j] - cumulativeFrequency[i][j - 1];

        if (a > 0)
        {
            //Check if character at i is present at index k
            for (int k = j + 2; k <= N; k++)
            {
                c = cumulativeFrequency[i][k] - cumulativeFrequency[i][k - 1];

                if (c > 0)
                {
                    //For each character, find b*d
                    for (int l = 0; l < 26; l++)
                    {
                        //For each character calculate b and d
                        b = cumulativeFrequency[l][k-1] - cumulativeFrequency[l][j];
                        d = cumulativeFrequency[l][N] - cumulativeFrequency[l][k];

                        count += b * d;
                        }
                    }
                }
            }
        }
    }

    System.out.println(count);

I hope I have helped you. The code I provided will not give time complexity error and it will work for all test cases. Do comment if you do not understand anything in my explanation.

like image 57
AKSingh Avatar answered Mar 04 '23 05:03

AKSingh