Using Python, find anagrams for a list of words

Question

Suppose I have a list of strings like ["car", "tree", "boy", "girl", "arc"] etc. I want to find groups of anagrams in that list - in this case, (car, arc).

I tried writing code to loop over the list and compare pairs of strings, but how do I account for the fact that the letters can be in a different order?

For the specific case of checking whether a single pair of strings are anagrams of each other, see Checking strings against each other (Anagrams).

Ofir Farchy · Accepted Answer

To check whether two strings are anagrams, you can do this:

def isAnagram(str1, str2):
    str1_list = list(str1)
    str1_list.sort()
    str2_list = list(str2)
    str2_list.sort()

    return (str1_list == str2_list)

As for the iteration over the list, it is pretty straightforward.
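The iteration itself is not spelled out in this answer. One way it might look, as a minimal sketch: compare every unordered pair of words once using the same sorted-letters check. The names `is_anagram` and `anagram_pairs` are hypothetical and not part of the original answer.

```python
from itertools import combinations

def is_anagram(str1, str2):
    # Same idea as isAnagram above: anagrams have identical sorted letters.
    return sorted(str1) == sorted(str2)

def anagram_pairs(words):
    # Check each unordered pair exactly once; this is O(n^2) comparisons
    # for n words, so it suits short lists only.
    return [(a, b) for a, b in combinations(words, 2) if is_anagram(a, b)]

print(anagram_pairs(["car", "tree", "boy", "girl", "arc"]))  # [('car', 'arc')]
```

For long word lists, grouping words by a sorted-letters key (as in the next answer) avoids the pairwise comparison.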

hughdbrown · Answer

Create a dictionary that maps each sorted word to the list of words that sort to it. All the words in the same list are anagrams of each other.

from collections import defaultdict

def load_words(filename='/usr/share/dict/american-english'):
    with open(filename) as f:
        for word in f:
            yield word.rstrip()

def get_anagrams(source):
    d = defaultdict(list)
    for word in source:
        key = "".join(sorted(word))
        d[key].append(word)
    return d

def print_anagrams(word_source):
    d = get_anagrams(word_source)
    for key, anagrams in d.items():
        if len(anagrams) > 1:
            print(key, anagrams)

word_source = load_words()
print_anagrams(word_source)

Or:

word_source = ["car", "tree", "boy", "girl", "arc"]
print_anagrams(word_source)

Felix Loether · Answer

One solution is to sort the word you're searching anagrams for (for example using sorted), sort each alternative, and compare the two.

So if you were searching for anagrams of 'rac' in the list ['car', 'girl', 'tofu', 'rca'], your code could look like this:

word = sorted('rac')
alternatives = ['car', 'girl', 'tofu', 'rca']

for alt in alternatives:
    if word == sorted(alt):
        print(alt)

Alexander Zhukov · Answer

There are multiple solutions to this problem:

Classic approach

First, let's consider what defines an anagram: two words are anagrams of each other if they consist of the same set of letters and each letter appears exactly the same number of times in both words. This is basically a histogram of the letter counts of each word, which is a perfect use case for the collections.Counter data structure (see docs). The algorithm is as follows:
- Build a dictionary whose keys are histograms and whose values are lists of words that have that histogram.
- For each word, build its histogram and add the word to the list that corresponds to this histogram.
- Output the list of dictionary values.
Here is the code:
```
from collections import Counter, defaultdict

def anagram(words):
    anagrams = defaultdict(list)
    for word in words:
        histogram = frozenset(Counter(word).items())  # hashable and independent of letter order
        anagrams[histogram].append(word)
    return list(anagrams.values())

keywords = ("hi", "hello", "bye", "helol", "abc", "cab", 
                "bac", "silenced", "licensed", "declines")

print(anagram(keywords))
```
Note that constructing a Counter is O(l), while sorting each word is O(l*log(l)), where l is the length of the word.

Solving anagrams using prime numbers

This is a more advanced solution that relies on the "multiplicative uniqueness" of prime numbers. You can refer to this SO post: Comparing anagrams using prime numbers, and here is a sample Python implementation.
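The linked implementation is not reproduced in this thread, so the following is only a rough sketch of the prime-number idea, under the assumption that words contain only the letters a-z. Each letter is mapped to a distinct prime, and the product of a word's primes serves as its key; by unique factorization, two words share a key only if they have the same letter counts. The names `PRIMES`, `prime_key` and `group_by_prime_key` are hypothetical.

```python
from collections import defaultdict

# First 26 primes, one per letter (assumption: input is limited to a-z).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def prime_key(word):
    # Multiply the primes of all letters; letter order does not affect the product.
    key = 1
    for char in word.lower():
        key *= PRIMES[ord(char) - ord("a")]
    return key

def group_by_prime_key(words):
    # Words with the same product are anagrams of each other.
    groups = defaultdict(list)
    for word in words:
        groups[prime_key(word)].append(word)
    return [group for group in groups.values() if len(group) > 1]

print(group_by_prime_key(["car", "tree", "boy", "girl", "arc"]))  # [['car', 'arc']]
```

Python integers do not overflow, but the products grow quickly for long words, so this is more of a curiosity than a faster alternative to sorting or counting.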

Michael James Kali Galarnyk · Answer

Since you can't import anything, here are two different approaches including the for loop you asked for.

Approach 1: For Loops and Inbuilt Sorted Function

word_list = ["percussion", "supersonic", "car", "tree", "boy", "girl", "arc"]

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (sorted(word_1)==sorted(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

Approach 2: Dictionaries

def freq(word):
    freq_dict = {}
    for char in word:
        freq_dict[char] = freq_dict.get(char, 0) + 1
    return freq_dict

# initialize a list
anagram_list = []
for word_1 in word_list: 
    for word_2 in word_list: 
        if word_1 != word_2 and (freq(word_1) == freq(word_2)):
            anagram_list.append(word_1)
print(anagram_list)

If you want these approaches explained in more detail, here is an article.

Christian Alis · Answer

Sort each element, then look for duplicates. There's a built-in function for sorting, so you do not need to import anything.
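This answer gives no code, so here is one possible reading of it as a sketch without imports: pair each word with its sorted letters, sort those pairs so that anagrams end up next to each other, then collect runs that share the same key. The function name `find_anagram_groups` is hypothetical.

```python
def find_anagram_groups(words):
    # Pair each word with its sorted-letter form, then sort the pairs so that
    # words with the same letters become adjacent.
    keyed = sorted((''.join(sorted(word)), word) for word in words)

    groups = []
    current = []
    previous_key = None
    for key, word in keyed:
        if key != previous_key:
            # A new key starts; keep the finished run only if it has duplicates.
            if len(current) > 1:
                groups.append(current)
            current = []
            previous_key = key
        current.append(word)
    if len(current) > 1:
        groups.append(current)
    return groups

print(find_anagram_groups(["car", "tree", "boy", "girl", "arc"]))  # [['arc', 'car']]
```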

kinjal patel · Answer

def findanagranfromlistofwords(li):
    # Maps each word to a later word in the list that is its anagram.
    anagrams = {}
    for index in range(len(li)):
        originalfirst = li[index]
        sortedfirst = ''.join(sorted(str(li[index])))
        # Compare against every later word in the list.
        for j in range(index + 1, len(li)):
            candidate = ''.join(sorted(str(li[j])))
            print(candidate)
            if sortedfirst == candidate:
                anagrams[originalfirst] = li[j]
                print("dict = ", anagrams)

    print(anagrams)

findanagranfromlistofwords(["car", "tree", "boy", "girl", "arc"])

grepit · Answer

Most of the previous answers are correct; here is another way to compare two strings. The main benefit of this strategy over sorting is time complexity: building the frequency dictionaries is O(n), whereas sorting is O(n log n).

1. Check the lengths of the strings.

2. Build a frequency dictionary for each string and compare them; if they match, the words are anagrams.

def char_frequency(word):
    frequency = {}
    for char in word:
        # if the character is already in frequency, increment its count
        if char in frequency:
            frequency[char] += 1
        # otherwise add the character with a count of 1
        else:
            frequency[char] = 1
    return frequency


a_word = 'google'
b_word = 'ooggle'
# check the lengths of the words first
if len(a_word) != len(b_word):
    print("not anagram")
else:
    # compare the frequency dictionaries to see if they match
    if char_frequency(a_word) == char_frequency(b_word):
        print("found anagram")
    else:
        print("no anagram")
