 

Find most common pair of characters in a string

I have written the following function:

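//O(n^2)
void MostCommonPair(char * cArr , char * ch1 , char * ch2 , int * amount)
{
    int count , max = 0;
    char cCurrent , cCurrent2;
    int i = 0 , j;
    *amount = 0;    // report 0 if the string contains no pair at all
    // stop when fewer than two characters remain (also safe for "")
    while(*(cArr + i) != '\0' && *(cArr + i + 1) != '\0')
    {
        cCurrent = *(cArr + i);
        cCurrent2 = *(cArr + i + 1);
        // count occurrences of the pair (cCurrent, cCurrent2) from position i onwards
        for(j = i , count = 0 ; *(cArr + j + 1) != '\0' ; j++)
        {
            if(cCurrent ==  *(cArr + j) && cCurrent2 ==  *(cArr + j + 1))
            {
                count++;
            }
        }
        // keep the first pair that reaches a new maximum
        if(count > max)
        {
            *ch1 = cCurrent;
            *ch2 = cCurrent2;
            max = *amount = count;
        }
        i++;
    }
}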

For the input

"xdshahaalohalobscxbsbsbs"

it returns ch1 = b, ch2 = s, amount = 4.

However, the function seems very inefficient to me. Is there a way to go through the string only once, or otherwise reduce the running time to O(n)?

Daniel Jakobsen Hallel asked Dec 20 '12 at 20:12




1 Answer

Since a char can hold only 256 distinct values (assuming 8-bit chars), you can set up a two-dimensional table of 256×256 counters, run through your string once, and increment the counter that corresponds to each pair of adjacent characters. Then go through the 256×256 table, pick the largest count, and recover the pair it belongs to from its position (row and column) in the 2D array. Because the size of the counter table is a constant independent of the length of the string, that second step is O(1), even though it uses two nested loops, so the algorithm as a whole is O(n).

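#include <stdio.h>

int main(void) {
    // static: zero-initialized, and keeps the 256 KiB table off the stack
    static int count[256][256];
    const char *str = "xdshahaalohalobscxbsbsbs";
    // one pass over the string: count every adjacent pair
    // (the *p check makes an empty string safe; the unsigned char cast
    // avoids negative indices on platforms where char is signed)
    for (const char *p = str ; *p && *(p+1) ; p++) {
        count[(unsigned char)*p][(unsigned char)*(p+1)]++;
    }
    // constant-size scan of all 65,536 counters for the largest one
    int bestA = 0, bestB = 0;
    for (int i = 0 ; i != 256 ; i++) {
        for (int j = 0 ; j != 256 ; j++) {
            if (count[i][j] > count[bestA][bestB]) {
                bestA = i;
                bestB = j;
            }
        }
    }
    printf("'%c%c' : %d times\n", bestA, bestB, count[bestA][bestB]);
    return 0;
}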

Here is a link to a demo on ideone.

Keep in mind that although this is the fastest possible solution asymptotically (it is O(N), and you cannot do better than O(N) because every character has to be read at least once), its performance is not going to be good for shorter strings: clearing and scanning the 65,536-entry table is a fixed cost paid on every call. In fact, your solution will beat it hands-down on inputs shorter than approximately 256 characters, probably even longer ones. There are a number of optimizations you can apply to this code, but I decided against adding them to keep the main idea clearly visible in its purest and simplest form.

Sergey Kalinichenko answered Oct 05 '22 at 14:10
