Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find group of strings that are anagrams

This question refers to this problem on lintcode. I have a working solution, but it takes too long for the huge testcase. I am wondering how can it be improved? Maybe I can decrease the number of comparisons I make in the outer loop.

class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        # write your code here
        ret=set()
        for i in range(0,len(strs)):
            for j in range(i+1,len(strs)):
                if i in ret and j in ret:
                    continue
                if Solution.isanagram(strs[i],strs[j]):
                    ret.add(i)
                    ret.add(j)

        return [strs[i] for i in list(ret)]


    @staticmethod
    def isanagram(s, t):
        if len(s)!=len(t):
            return False
        chars={}
        for i in s:
            if i in chars:
                chars[i]+=1
            else:
                chars[i]=1

        for i in t:
            if i not in chars:
                return False
            else:
                chars[i]-=1
                if chars[i]<0:
                    return False

        for i in chars:
            if chars[i]!=0:
                return False
        return True

Update: Just to add, not looking for built-in pythonic solutions such as using Counter which are already optimized. Have added Mike's suggestions, but still exceeding time-limit.

like image 617
Wajahat Avatar asked Mar 12 '23 04:03

Wajahat


2 Answers

Skip strings you already placed in the set. Don't test them again.

# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
    # write your code here
    ret=set()
    for i in range(0,len(strs)):
        for j in range(i+1,len(strs)):

            # If both anagrams exist in set, there is no need to compare them.
            if i in ret and j in ret:
                continue

            if Solution.isanagram(strs[i],strs[j]):
                ret.add(i)
                ret.add(j)

    return [strs[i] for i in list(ret)]

You can also do a length comparison in your anagram test before iterating through the letters. Whenever the strings aren't the same length, they can't be anagrams anyway. Also, when a counter in chars reaches -1 when comparing values in t, just return false. Don't iterate through chars again.

@staticmethod
def isanagram(s, t):
    # Test strings are the same length
    if len(s) != len(t):
        return False

    chars={}
    for i in s:
        if i in chars:
            chars[i]+=1
        else:
            chars[i]=1

    for i in t:
        if i not in chars:
            return False
        else:
            chars[i]-=1
            # If this is below 0, return false
            if chars[i] < 0:
                return False

    for i in chars:
        if chars[i]!=0:
            return False
    return True
like image 100
Mike Avatar answered Mar 30 '23 11:03

Mike


Instead of comparing all pairs of strings, you can just create a dictionary (or collections.defaultdict) mapping each of the letter-counts to the words having those counts. For getting the letter-counts, you can use collections.Counter. Afterwards, you just have to get the values from that dict. If you want all words that are anagrams of any other words, just merge the lists that have more than one entry.

strings = ["cat", "act", "rat", "hut", "tar", "tact"]
anagrams = defaultdict(list)

for s in strings:
    anagrams[frozenset(Counter(s).items())].append(s)

print([v for v in anagrams.values()])
# [['hut'], ['rat', 'tar'], ['cat', 'act'], ['tact']]
print([x for v in anagrams.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']

Of course, if you prefer not to use builtin functionality you can with just a few more lines just as well use a regular dict instead of defaultdict and write your own Counter, similar to what you have in your isanagram method, just without the comparison part.

like image 37
tobias_k Avatar answered Mar 30 '23 11:03

tobias_k