Check my anagram code from a job interview in the past

Tags:

c++

anagram

I had the following as an interview question a while ago and choked so badly on basic syntax that I failed to advance (once the adrenalin kicks in, coding goes out the window).

Given a list of strings, return a list of sets of strings that are anagrams of each other. For example, "dog", "god", "foo" should return {"dog", "god"}. Afterward, I wrote the code on my own as a sanity check, and it's been around for a bit now. I'd welcome input on it to see if I missed anything or if I could have done it much more efficiently. I'm taking it as a chance to improve myself and learn other techniques:


void Anagram::doWork(list<string> input, list<set<string>> &output)
{
  typedef list < pair < string, string>> SortType;
  SortType sortedInput;

  // sort each string and pair it with the original
  for(list< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.push_back(make_pair(*i, tempString));
  }

  // Now step through the new sorted list
  for(SortType::iterator i = sortedInput.begin(); i != sortedInput.end();)
  {
    set< string > newSet;

    // Assume (hope) we have a match and pre-add the first.
    newSet.insert(i->first);

    // Set the secondary iterator one past the outside to prevent
    // matching the original
    SortType::iterator j = i;
    ++j;

    while(j != sortedInput.end())
    {
      if(i->second == j->second)
      {
        // If the string matches, add it to the set and remove it
        // so that future searches need not worry about it
        newSet.insert(j->first);
        j = sortedInput.erase(j);
      }
      else
      {
        // else, next element
        ++j;
      }
    }

    // If size is bigger than our original push, we have a match 
    // - save it to the output
    if(newSet.size() > 1)
    {
       output.push_back(newSet);
    }

    // erase this element and update the iterator
    i = sortedInput.erase(i);
  }
}

Here's a second pass at this after reviewing comments and learning a bit more:


void doBetterWork(list<string> input, list<set<string>> &output)
{
  typedef std::multimap< string, string > SortedInputType;
  SortedInputType sortedInput;
  vector< string > sortedNames;

  for(list< string >::iterator i = input.begin(); i != input.end(); ++i)
  {
    string tempString(*i);
    std::sort(tempString.begin(), tempString.end());
    sortedInput.insert(make_pair(tempString, *i));
    sortedNames.push_back(tempString);
  }

  for(vector< string >::iterator i = sortedNames.begin(); i != sortedNames.end(); ++i)
  {
    pair< SortedInputType::iterator,SortedInputType::iterator > bounds;
    bounds = sortedInput.equal_range(*i);

    set< string > newSet;
    for(SortedInputType::iterator j = bounds.first; j != bounds.second; ++j)
    {
      newSet.insert(j->second);
    }

    if(newSet.size() > 1)
    {
      output.push_back(newSet);
    }

    sortedInput.erase(bounds.first, bounds.second);
  }
}

asked Apr 25 '10 03:04

Michael Dorgan

1 Answer

The best way to group anagrams is to map the strings to some sort of histogram representation.

 FUNCTION histogram
 [input] -> [output]

 "dog"   -> (1xd, 1xg, 1xo)
 "god"   -> (1xd, 1xg, 1xo)
 "foo"   -> (1xf, 2xo)

Basically, with a linear scan of a string, you can produce a histogram of how many of each letter it contains. A small, finite alphabet makes this even easier (e.g. with A-Z, you just have an array of 26 numbers, one for each letter).

Now, anagrams are simply words that have the same histogram.

Then you can have a multimap data structure that maps a histogram to a list of words that have that histogram.

MULTIMAP
[key]           => [set of values]

(1xd, 1xg, 1xo) => { "dog", "god" }
(1xf, 2xo)      => { "foo" }
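
As a rough sketch of the histogram idea in C++: the names `Histogram`, `histogramOf` and `groupByHistogram` are made up for illustration, and the code assumes lowercase a-z input (anything else is simply ignored). It uses a `std::map` from histogram to a set of words rather than a literal `std::multimap`, which is just one way to get the same grouping.

```cpp
#include <array>
#include <map>
#include <set>
#include <string>
#include <vector>

// Letter counts for 'a'..'z'. Lowercase ASCII input is an assumption of
// this sketch, not something the question guarantees.
using Histogram = std::array<int, 26>;

// Hypothetical helper: builds the histogram with one linear scan.
Histogram histogramOf(const std::string& word)
{
    Histogram counts{};  // value-initialized, so every count starts at 0
    for (char c : word) {
        if (c >= 'a' && c <= 'z') {
            ++counts[c - 'a'];
        }
        // characters outside a-z are ignored in this sketch
    }
    return counts;
}

// Groups words by histogram and keeps only groups with more than one word.
std::vector<std::set<std::string>> groupByHistogram(const std::vector<std::string>& words)
{
    // std::array provides operator<, so it works as an ordered map key.
    std::map<Histogram, std::set<std::string>> groups;
    for (const std::string& word : words) {
        groups[histogramOf(word)].insert(word);
    }

    std::vector<std::set<std::string>> result;
    for (const auto& entry : groups) {
        if (entry.second.size() > 1) {
            result.push_back(entry.second);
        }
    }
    return result;
}
```

Calling `groupByHistogram({"dog", "god", "foo"})` should give back a single group containing "dog" and "god", since "foo" has no partner.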

The canonical form trick

Instead of working on the histograms, you can also work on the "canonical form" of the strings. Basically, you define a canonical form for each string, and two words are anagrams exactly when they have the same canonical form.

One convenient canonical form is to have the letters in the string in sorted order.

FUNCTION canonize
[input]  -> [output]

"dog"    -> "dgo"
"god"    -> "dgo"
"abracadabra" -> "aaaaabbcdrr"

Note that this is just one step after the histogram approach: you're essentially doing counting sort to sort the letters.
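
To make the counting sort connection concrete, here is a small sketch that builds the sorted-letter form straight from the letter counts. `canonizeByCounting` is a made-up name, and the code assumes lowercase a-z only (other characters are dropped).

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: produces the sorted-letter form via counting sort.
// Assumes lowercase a-z; any other character is dropped in this sketch.
std::string canonizeByCounting(const std::string& word)
{
    // Step 1: the histogram, exactly as in the histogram approach.
    std::array<std::size_t, 26> counts{};
    for (char c : word) {
        if (c >= 'a' && c <= 'z') {
            ++counts[static_cast<std::size_t>(c - 'a')];
        }
    }

    // Step 2: write each letter out as many times as it was counted.
    std::string canonical;
    canonical.reserve(word.size());
    for (std::size_t letter = 0; letter < counts.size(); ++letter) {
        canonical.append(counts[letter], static_cast<char>('a' + letter));
    }
    return canonical;
}
```

Step 1 is the histogram, and step 2 is the "one step after" that turns it into a string key.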

In an actual programming language, this is the most practical solution to your problem.
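
In C++ that could look something like the sketch below. `canonize` and `groupAnagrams` are illustrative names, not anything from the question, and the return type is my own choice (the question uses an output parameter instead). It uses `std::sort` for the canonical form and a hash map keyed on it.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: the canonical form is the word with its letters sorted.
std::string canonize(std::string word)
{
    std::sort(word.begin(), word.end());
    return word;
}

// Groups words by canonical form; groups with a single word are dropped,
// matching the "dog", "god", "foo" example in the question.
std::vector<std::set<std::string>> groupAnagrams(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, std::set<std::string>> groups;
    for (const std::string& word : words) {
        groups[canonize(word)].insert(word);
    }

    std::vector<std::set<std::string>> result;
    for (const auto& entry : groups) {
        if (entry.second.size() > 1) {
            result.push_back(entry.second);
        }
    }
    return result;  // group order is unspecified because of the hash map
}
```

Each word is canonized once and inserted once, so there is no nested loop over pairs of words.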

Complexity

Producing the histogram/canonical form of a word is practically O(1) (finite alphabet size, finite maximum word length).

With a good hash implementation, get and put on the multimap are O(1) on average.

You can even have multiple multimaps, one for each word length.

If there are N words, putting all the words into the multimaps is therefore O(N); then outputting each anagram group is simply dumping the values in the multimaps. This too can be done in O(N).

This is certainly better than checking whether each pair of words are anagrams (an O(N^2) algorithm).

answered Sep 30 '22 13:09

polygenelubricants
