Given a word and a text, return the count of the occurrences of anagrams of the word in the text. For eg. word is “for” and the text is “forxxorfxdofr”, anagrams of “for” will be “ofr”, “orf”,”fro”, etc. So the answer would be 3 for this particular example.
I have got the brute force approach which is getting all the permutations of the word, then compare if the text contains it, and increase the number of occurrences, but that is O(N^2) approach. I'm looking for a better complexity.
An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the word anagram itself can be rearranged into nag a ram, also the word binary into brainy and the word adobe into abode.
A simple approach is to traverse from start of the string considering substrings of length equal to the length of the given word and then check if this substring has all the characters of word.
An anagram is a literary device where the letters that make up a word, phrase, or name are rearranged to create new ones. The original word or phrase is the subject of the anagram, the anagram is what is created by repurposing those letters.
You can simply look for the character count.
Say for example that you're looking for anagramms of look
. So, you're looking for:
Simply process the first 4 letters, store the counts. Check whether you have a match. Add the next character (increment), remove the old character (decrement). Check again. And so on...
TooTone's O(n) solution suffers from having to compare two 256-element vectors for each character of the input text. This can be avoided by tracking the number of positions at which the two vectors differ, and registering a match when this number goes to zero. In fact, we don't even need to store two different vectors at all, since we can just store one vector containing their difference.
Here's my version implementing these optimizations. It's written in plain old C, but should work under C++ with appropriate adjustments:
#include <stdio.h>
#include <limits.h> /* for UCHAR_MAX (usually 255) */
int find_anagrams (char *word, char *text) {
int len = 0; /* length of search word */
int bin[UCHAR_MAX+1]; /* excess count of each char in last len chars of text */
int mismatch = 0; /* count of nonzero values in bins[] */
int found = 0; /* number of anagrams found */
int i; /* generic loop counter */
/* initialize bins */
for (i = 0; i <= UCHAR_MAX; i++) bin[i] = 0;
for (i = 0; word[i] != '\0'; i++) {
unsigned char c = (unsigned char) word[i];
if (bin[c] == 0) mismatch++;
bin[c]--;
len++; /* who needs strlen()? */
}
/* iterate through text */
for (i = 0; text[i] != '\0'; i++) {
/* add next char in text to bins, keep track of mismatch count */
unsigned char c = (unsigned char) text[i];
if (bin[c] == 0) mismatch++;
if (bin[c] == -1) mismatch--;
bin[c]++;
/* remove len-th previous char from bins, keep track of mismatch count */
if (i >= len) {
unsigned char d = (unsigned char) text[i - len];
if (bin[d] == 0) mismatch++;
if (bin[d] == 1) mismatch--;
bin[d]--;
}
/* if mismatch count is zero, we've found an anagram */
if (mismatch == 0) {
found++;
#ifdef DEBUG
/* optional: print each anagram found */
printf("Anagram found at position %d: \"", i-len+1);
fwrite(text+i-len+1, 1, len, stdout);
printf("\"\n");
#endif
}
}
return found;
}
int main (int argc, char *argv[]) {
if (argc == 3) {
int n = find_anagrams(argv[1], argv[2]);
printf("Found %d anagrams of \"%s\" in \"%s\".\n", n, argv[1], argv[2]);
return 0;
} else {
fprintf(stderr, "Usage: %s <word> <text>\n", (argc ? argv[0] : "countanagrams"));
return 1;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With