Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given a word and a text, we need to return the occurrences of anagrams

Tags:

c++

algorithm

Given a word and a text, return the count of the occurrences of anagrams of the word in the text. For eg. word is “for” and the text is “forxxorfxdofr”, anagrams of “for” will be “ofr”, “orf”,”fro”, etc. So the answer would be 3 for this particular example.

I have got the brute force approach which is getting all the permutations of the word, then compare if the text contains it, and increase the number of occurrences, but that is O(N^2) approach. I'm looking for a better complexity.

like image 288
Ahmed Saleh Avatar asked Sep 15 '13 10:09

Ahmed Saleh


People also ask

What are anagrams example?

An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the word anagram itself can be rearranged into nag a ram, also the word binary into brainy and the word adobe into abode.

How do you count the number of anagrams in a string?

A simple approach is to traverse from start of the string considering substrings of length equal to the length of the given word and then check if this substring has all the characters of word.

What makes a word an anagram?

An anagram is a literary device where the letters that make up a word, phrase, or name are rearranged to create new ones. The original word or phrase is the subject of the anagram, the anagram is what is created by repurposing those letters.


2 Answers

You can simply look for the character count.

Say for example that you're looking for anagramms of look. So, you're looking for:

  • a 4 charachter length word,
  • with 1 l, 2 o and 1 k.

Simply process the first 4 letters, store the counts. Check whether you have a match. Add the next character (increment), remove the old character (decrement). Check again. And so on...

like image 153
Karoly Horvath Avatar answered Oct 23 '22 08:10

Karoly Horvath


TooTone's O(n) solution suffers from having to compare two 256-element vectors for each character of the input text. This can be avoided by tracking the number of positions at which the two vectors differ, and registering a match when this number goes to zero. In fact, we don't even need to store two different vectors at all, since we can just store one vector containing their difference.

Here's my version implementing these optimizations. It's written in plain old C, but should work under C++ with appropriate adjustments:

#include <stdio.h>
#include <limits.h> /* for UCHAR_MAX (usually 255) */

int find_anagrams (char *word, char *text) {
    int len = 0;           /* length of search word */
    int bin[UCHAR_MAX+1];  /* excess count of each char in last len chars of text */
    int mismatch = 0;      /* count of nonzero values in bins[] */
    int found = 0;         /* number of anagrams found */
    int i;                 /* generic loop counter */

    /* initialize bins */
    for (i = 0; i <= UCHAR_MAX; i++) bin[i] = 0;
    for (i = 0; word[i] != '\0'; i++) {
        unsigned char c = (unsigned char) word[i];
        if (bin[c] == 0) mismatch++;
        bin[c]--;
        len++;  /* who needs strlen()? */
    }

    /* iterate through text */
    for (i = 0; text[i] != '\0'; i++) {
        /* add next char in text to bins, keep track of mismatch count */
        unsigned char c = (unsigned char) text[i];
        if (bin[c] == 0) mismatch++;
        if (bin[c] == -1) mismatch--;
        bin[c]++;

        /* remove len-th previous char from bins, keep track of mismatch count */
        if (i >= len) {
            unsigned char d = (unsigned char) text[i - len];
            if (bin[d] == 0) mismatch++;
            if (bin[d] == 1) mismatch--;
            bin[d]--;
        }

        /* if mismatch count is zero, we've found an anagram */
        if (mismatch == 0) {
            found++;
#ifdef DEBUG
            /* optional: print each anagram found */
            printf("Anagram found at position %d: \"", i-len+1);
            fwrite(text+i-len+1, 1, len, stdout);
            printf("\"\n");
#endif
        }
    }
    return found;
}


int main (int argc, char *argv[]) {
    if (argc == 3) {
        int n = find_anagrams(argv[1], argv[2]);
        printf("Found %d anagrams of \"%s\" in \"%s\".\n", n, argv[1], argv[2]);
        return 0;
    } else {
        fprintf(stderr, "Usage: %s <word> <text>\n", (argc ? argv[0] : "countanagrams"));
        return 1;
    }
}
like image 31
Ilmari Karonen Avatar answered Oct 23 '22 08:10

Ilmari Karonen