Given a word and a text, return the count of the occurrences of anagrams of the word in the text. For eg. word is “for” and the text is “forxxorfxdofr”, anagrams of “for” will be “ofr”, “orf”,”fro”, etc. So the answer would be 3 for this particular example. I have got the brute force approach which is getting all the permutations of the word, then compare if the text contains it, and increase the number of occurrences, but that is O(N^2) approach. I'm looking for a better complexity.

You can simply look for the character count. Say for example that you're looking for anagramms of <code>look</code>. So, you're looking for: <ul> <li>a 4 charachter length word,</li> <li>with 1 l, 2 o and 1 k.</li> </ul> Simply process the first 4 letters, store the counts. Check whether you have a match. Add the next character (increment), remove the old character (decrement). Check again. And so on...

Given a word and a text, we need to return the occurrences of anagrams

Tags:

c++

algorithm

Given a word and a text, return the count of the occurrences of anagrams of the word in the text. For eg. word is “for” and the text is “forxxorfxdofr”, anagrams of “for” will be “ofr”, “orf”,”fro”, etc. So the answer would be 3 for this particular example.

I have got the brute force approach which is getting all the permutations of the word, then compare if the text contains it, and increase the number of occurrences, but that is O(N^2) approach. I'm looking for a better complexity.

288

asked Sep 15 '13 10:09

Ahmed Saleh

2 Answers

You can simply look for the character count.

Say for example that you're looking for anagramms of look. So, you're looking for:

a 4 charachter length word,
with 1 l, 2 o and 1 k.

Simply process the first 4 letters, store the counts. Check whether you have a match. Add the next character (increment), remove the old character (decrement). Check again. And so on...

153

answered Oct 23 '22 08:10

Karoly Horvath

TooTone's O(n) solution suffers from having to compare two 256-element vectors for each character of the input text. This can be avoided by tracking the number of positions at which the two vectors differ, and registering a match when this number goes to zero. In fact, we don't even need to store two different vectors at all, since we can just store one vector containing their difference.

Here's my version implementing these optimizations. It's written in plain old C, but should work under C++ with appropriate adjustments:

#include <stdio.h>
#include <limits.h> /* for UCHAR_MAX (usually 255) */

int find_anagrams (char *word, char *text) {
    int len = 0;           /* length of search word */
    int bin[UCHAR_MAX+1];  /* excess count of each char in last len chars of text */
    int mismatch = 0;      /* count of nonzero values in bins[] */
    int found = 0;         /* number of anagrams found */
    int i;                 /* generic loop counter */

    /* initialize bins */
    for (i = 0; i <= UCHAR_MAX; i++) bin[i] = 0;
    for (i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char) word[i];
        if (bin[c] == 0) mismatch++;
        bin[c]--;
        len++;  /* who needs strlen()? */
    }

    /* iterate through text */
    for (i = 0; text[i] != '\0'; i++) {
        /* add next char in text to bins, keep track of mismatch count */
        unsigned char c = (unsigned char) text[i];
        if (bin[c] == 0) mismatch++;
        if (bin[c] == -1) mismatch--;
        bin[c]++;

        /* remove len-th previous char from bins, keep track of mismatch count */
        if (i >= len) {
            unsigned char d = (unsigned char) text[i - len];
            if (bin[d] == 0) mismatch++;
            if (bin[d] == 1) mismatch--;
            bin[d]--;
        }

        /* if mismatch count is zero, we've found an anagram */
        if (mismatch == 0) {
            found++;
#ifdef DEBUG
            /* optional: print each anagram found */
            printf("Anagram found at position %d: \"", i-len+1);
            fwrite(text+i-len+1, 1, len, stdout);
            printf("\"\n");
#endif
        }
    }
    return found;
}


int main (int argc, char *argv[]) {
    if (argc == 3) {
        int n = find_anagrams(argv[1], argv[2]);
        printf("Found %d anagrams of \"%s\" in \"%s\".\n", n, argv[1], argv[2]);
        return 0;
    } else {
        fprintf(stderr, "Usage: %s <word> <text>\n", (argc ? argv[0] : "countanagrams"));
        return 1;
    }
}

answered Oct 23 '22 08:10

Ilmari Karonen

Related questions
                            
                                How does the compiler know to move local variables?
                            
                                Fast gradient-descent implementation in a C++ library?
                            
                                c++: can vector<Base> contain objects of type Derived?
                            
                                How to print 2D Arrays in C++?
                            
                                C++ object name same as class name
                            
                                Do I have to use weak_ptr.lock() just to test if it points to a valid object?
                            
                                Parallel Merge-Sort in OpenMP
                            
                                Inheritance from empty base class in C++
                            
                                mktime returns -1 when given a valid struct tm
                            
                                Difference on std::vector push_back(Object()) and push_back(new Object())?
                            
                                Do I have to typedef on every Header file?
                            
                                List<string> in C++
                            
                                Qt creator Adding external library (still: Cannot open include file: 'GL/glew.h')
                            
                                Waiting for an atomic_bool
                            
                                Implementing Undo - Redo Functionality in Qt?
                            
                                More Precise Floating point Data Types than double?
                            
                                Why is deque using so much more RAM than vector in C++?
                            
                                How to deep copy QMap and other Qt containers
                            
                                C++ classes use of double colon [duplicate]
                            
                                Is use of std::forward with auto&& the right thing to do

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With