This question refers to this problem on lintcode. I have a working solution, but it takes too long for the huge testcase. I am wondering how can it be improved? Maybe I can decrease the number of comparisons I make in the outer loop.
class Solution:
# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
# write your code here
ret=set()
for i in range(0,len(strs)):
for j in range(i+1,len(strs)):
if i in ret and j in ret:
continue
if Solution.isanagram(strs[i],strs[j]):
ret.add(i)
ret.add(j)
return [strs[i] for i in list(ret)]
@staticmethod
def isanagram(s, t):
if len(s)!=len(t):
return False
chars={}
for i in s:
if i in chars:
chars[i]+=1
else:
chars[i]=1
for i in t:
if i not in chars:
return False
else:
chars[i]-=1
if chars[i]<0:
return False
for i in chars:
if chars[i]!=0:
return False
return True
Update: Just to add, not looking for built-in pythonic solutions such as using Counter
which are already optimized. Have added Mike's suggestions, but still exceeding time-limit.
Skip strings you already placed in the set. Don't test them again.
# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
# write your code here
ret=set()
for i in range(0,len(strs)):
for j in range(i+1,len(strs)):
# If both anagrams exist in set, there is no need to compare them.
if i in ret and j in ret:
continue
if Solution.isanagram(strs[i],strs[j]):
ret.add(i)
ret.add(j)
return [strs[i] for i in list(ret)]
You can also do a length comparison in your anagram test before iterating through the letters. Whenever the strings aren't the same length, they can't be anagrams anyway. Also, when a counter in chars
reaches -1 when comparing values in t, just return false. Don't iterate through chars
again.
@staticmethod
def isanagram(s, t):
# Test strings are the same length
if len(s) != len(t):
return False
chars={}
for i in s:
if i in chars:
chars[i]+=1
else:
chars[i]=1
for i in t:
if i not in chars:
return False
else:
chars[i]-=1
# If this is below 0, return false
if chars[i] < 0:
return False
for i in chars:
if chars[i]!=0:
return False
return True
Instead of comparing all pairs of strings, you can just create a dictionary (or collections.defaultdict
) mapping each of the letter-counts to the words having those counts. For getting the letter-counts, you can use collections.Counter
. Afterwards, you just have to get the values from that dict. If you want all words that are anagrams of any other words, just merge the lists that have more than one entry.
strings = ["cat", "act", "rat", "hut", "tar", "tact"]
anagrams = defaultdict(list)
for s in strings:
anagrams[frozenset(Counter(s).items())].append(s)
print([v for v in anagrams.values()])
# [['hut'], ['rat', 'tar'], ['cat', 'act'], ['tact']]
print([x for v in anagrams.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']
Of course, if you prefer not to use builtin functionality you can with just a few more lines just as well use a regular dict
instead of defaultdict
and write your own Counter
, similar to what you have in your isanagram
method, just without the comparison part.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With