Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing very often used anagram function

I have written a function that determines whether two words are anagrams. Word A is an anagram of word B if you can build word B out of A just by rearranging the letters, e.g.:

lead is anagram of deal

This is my function:

bool is_anagram(std::string const & s1, std::string const & s2)
{
    auto check = [](std::string const & x)
    {
        std::map<char, unsigned> counter;
        for(auto const & c : x)
        {
            auto it = counter.find(c);
            if(it == counter.end())
                counter[c] = 1;
            else
                ++counter[c];
        }
        return counter;
    };

    return check(s1) == check(s2);
}

This works fine, but as the number of words increases (and this function is used several million times within my application), it very soon became a major bottleneck of my application.

Does anyone have an idea of how to speed this function up?

like image 867
mel- Avatar asked Aug 08 '13 10:08

mel-


People also ask

How do you check if a word is an anagram?

Two words are anagrams of each other if they contain the same number of characters and the same characters. You should only need to sort the characters in lexicographic order, and determine if all the characters in one string are equal to and in the same order as all of the characters in the other string.


2 Answers

The map creation and your call to std::map::find within the iteration, is quite expensive.

In this case, you can use the fact that std::string behaves in many regards like a std::vector<char>, meaning that you can sort it using std::sort:

bool is_anagram(std::string s1, std::string s2) {     std::sort(s1.begin(), s1.end());     std::sort(s2.begin(), s2.end());     return s1 == s2; } 

Instead of the two maps that you are creating, I am taking a copy of the strings (passing them by value instead of const reference) and sorting them, so

sort("lead") => "adel" sort("deal") => "adel" 

This change should already speed your algorithm up by quite a bit. One more thing that may greatly affect the performance if you tend to compare arbitrary words:

bool is_anagram(std::string s1, std::string s2) {     if(s1.length() != s2.length())         return false;     /* as above */ } 

If the length of the two strings differs, it obviously cannot be an anagram. std::string::length() is a very fast operation (it needs to store the string size anyway), so we save us the hustle of O(N*log(N)) from sorting the two strings.

like image 68
nijansen Avatar answered Sep 28 '22 07:09

nijansen


You've got 26 characters, make one array of the size of 26, then iterate through both words and as you encouter characters in words, increment a corresponding array element for characters in the first word and decrement corresponding array element for charaacters in the second word. If the words are anagrams, all array elements will be 0 in the end. The complexity will be just O(N) where N is the length of words.

like image 40
Karadur Avatar answered Sep 28 '22 07:09

Karadur