Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

All anagrams in a File

Source : Microsoft Interview Question

We are given a File containing words.We need to determine all the Anagrams Present in it .

Can someone suggest most optimal algorithm to do this.

Only way i know is Sorting all the words,then checking .

like image 843
Spandan Avatar asked Jun 01 '13 12:06

Spandan


2 Answers

It would be good to know more about data before suggesting an algorithm, but lets just assume that the words are in English in the single case.

Lets assign each letter a prime number from 2 to 101. For each word we can count it's "anagram number" by multiplying its letter corresponding numbers.

Lets declare a dictionary of {number, list} pairs. And one list to collect resulting anagrams into.

Then we can collect anagrams in two steps: simply traverse through the file and put each word to a dictionary's list according to its "anagram number"; traverce the map and for every pairs list with length more then 1 store it's contents in a single big anagram list.

UPDATE:

import operator

words = ["thore", "ganamar", "notanagram", "anagram", "other"]

letter_code = {'a':2, 'b':3, 'c':5, 'd':7, 'e':11, 'f':13, 'g':17, 'h':19, 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43, 
            'o':47, 'p':53, 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w':83, 'x':89, 'y':97, 'z':101}

def evaluate(word):
    return reduce( operator.mul, [letter_code[letter] for letter in word] )

anagram_map = {}
anagram_list = []
for word in words:
    anagram_number = evaluate(word)
    if anagram_number in anagram_map:
        anagram_map[ anagram_number ] += [word]
    else:
        anagram_map[ anagram_number ] = [word]

    if len(anagram_map[ anagram_number ]) == 2:
        anagram_list += anagram_map[ anagram_number ] 
    elif len(anagram_map[ anagram_number ]) > 2:
        anagram_list += [ word ]

print anagram_list

Of course the implementation can be optimized further. For instance, you don't really need a map of anagrams, just a counters would do fine. But I guess the code illustrates the idea best as it is.

like image 61
akalenuk Avatar answered Sep 30 '22 01:09

akalenuk


You can use "Tries".A trie (derived from retrieval) is a multi way search tree. Tries use pattern matching algorithms. It's basic use is to create spell check programs, but I think it can help your case.. Have a look at this link http://ww0.java4.datastructures.net/handouts/Tries.pdf

like image 44
Shashank Raghunath Avatar answered Sep 30 '22 02:09

Shashank Raghunath