Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to generate anagrams

What would be the best strategy to generate anagrams.

An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new  word or phrase, using all the original letters exactly once;  ex. 
  • Eleven plus two is anagram of Twelve plus one
  • A decimal point is anagram of I'm a dot in place
  • Astronomers is anagram of Moon starers

At first it looks straightforwardly simple, just to jumble the letters and generate all possible combinations. But what would be the efficient approach to generate only the words in dictionary.

I came across this page, Solving anagrams in Ruby.

But what are your ideas?

like image 710
prakash Avatar asked Sep 10 '08 20:09

prakash


People also ask

Is there an anagram generator?

Anagram Solver is a tool used to help players rearrange letters to generate all the possible words from them. You input the letters, and Anagram Maker gives you the edge to win Scrabble, Words With Friends, or any other word game.


2 Answers

Most of these answers are horribly inefficient and/or will only give one-word solutions (no spaces). My solution will handle any number of words and is very efficient.

What you want is a trie data structure. Here's a complete Python implementation. You just need a word list saved in a file named words.txt You can try the Scrabble dictionary word list here:

http://www.isc.ro/lists/twl06.zip

MIN_WORD_SIZE = 4 # min size of a word in the output  class Node(object):     def __init__(self, letter='', final=False, depth=0):         self.letter = letter         self.final = final         self.depth = depth         self.children = {}     def add(self, letters):         node = self         for index, letter in enumerate(letters):             if letter not in node.children:                 node.children[letter] = Node(letter, index==len(letters)-1, index+1)             node = node.children[letter]     def anagram(self, letters):         tiles = {}         for letter in letters:             tiles[letter] = tiles.get(letter, 0) + 1         min_length = len(letters)         return self._anagram(tiles, [], self, min_length)     def _anagram(self, tiles, path, root, min_length):         if self.final and self.depth >= MIN_WORD_SIZE:             word = ''.join(path)             length = len(word.replace(' ', ''))             if length >= min_length:                 yield word             path.append(' ')             for word in root._anagram(tiles, path, root, min_length):                 yield word             path.pop()         for letter, node in self.children.iteritems():             count = tiles.get(letter, 0)             if count == 0:                 continue             tiles[letter] = count - 1             path.append(letter)             for word in node._anagram(tiles, path, root, min_length):                 yield word             path.pop()             tiles[letter] = count  def load_dictionary(path):     result = Node()     for line in open(path, 'r'):         word = line.strip().lower()         result.add(word)     return result  def main():     print 'Loading word list.'     words = load_dictionary('words.txt')     while True:         letters = raw_input('Enter letters: ')         letters = letters.lower()         letters = letters.replace(' ', '')         if not letters:             break         count = 0         for word in words.anagram(letters):             print word             count += 1         print '%d results.' % count  if __name__ == '__main__':     main() 

When you run the program, the words are loaded into a trie in memory. After that, just type in the letters you want to search with and it will print the results. It will only show results that use all of the input letters, nothing shorter.

It filters short words from the output, otherwise the number of results is huge. Feel free to tweak the MIN_WORD_SIZE setting. Keep in mind, just using "astronomers" as input gives 233,549 results if MIN_WORD_SIZE is 1. Perhaps you can find a shorter word list that only contains more common English words.

Also, the contraction "I'm" (from one of your examples) won't show up in the results unless you add "im" to the dictionary and set MIN_WORD_SIZE to 2.

The trick to getting multiple words is to jump back to the root node in the trie whenever you encounter a complete word in the search. Then you keep traversing the trie until all letters have been used.

like image 80
FogleBird Avatar answered Sep 30 '22 14:09

FogleBird


For each word in the dictionary, sort the letters alphabetically. So "foobar" becomes "abfoor."

Then when the input anagram comes in, sort its letters too, then look it up. It's as fast as a hashtable lookup!

For multiple words, you could do combinations of the sorted letters, sorting as you go. Still much faster than generating all combinations.

(see comments for more optimizations and details)

like image 43
Jason Cohen Avatar answered Sep 30 '22 13:09

Jason Cohen