I am trying to use itertools.permutations() to return all the permutations of the string and return only the ones which are members of a set of words.
import itertools
def permutations_in_dict(string, words):
'''
Parameters
----------
string : {str}
words : {set}
Returns
-------
list : {list} of {str}
Example
-------
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
'''
My current solution works fine in terminal but somehow couldn't pass the test case...
return list(set([''.join(p) for p in itertools.permutations(string)]) & words)
Any help will be appreciated.
The problem you're solving is best described as testing for anagram matches.
The traditional solution is to sort the target string, sort the candidate string, and test for equality.
>>> def permutations_in_dict(string, words):
target = sorted(string)
return sorted(word for word in words if sorted(word) == target)
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
Another approach is to use collections.Counter() to make a multiset equality test. This is algorithmically superior to the sort solution (O(n)
versus O(n log n)
) but tends to lose unless the size of the strings is large (due to the cost of hashing all the characters).
>>> def permutations_in_dict(string, words):
target = Counter(string)
return sorted(word for word in words if Counter(word) == target)
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
A unique anagram signature or perfect hash can be constructed by multiplying prime numbers corresponding to each possible character in a string.
The commutative property of multiplication guarantees that the hash value will be invariant for any permutation of a single string. The uniqueness of the hash value is guaranteed by the fundamental theorem of arithmetic (also known as the unique prime factorization theorem).
>>> from operator import mul
>>> primes = [2, 3, 5, 7, 11]
>>> primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
>>> anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))
>>> def permutations_in_dict(string, words):
target = anagram_hash(string)
return sorted(word for word in words if anagram_hash(word) == target)
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
Searching by permutations on the target string using itertools.permutations() is reasonable when the string is small (generating permutations on a n length string generates n factorial candidates).
The good news is that when n is small and the number of words is large, this approach runs very fast (because set membership testing is O(1)):
>>> from itertools import permutations
>>> def permutations_in_dict(string, words):
perms = set(map(''.join, permutations(string)))
return sorted(word for word in words if word in perms)
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
As the OP surmised, the pure python search loop can be sped-up to c-speed by using set.intersection():
>>> def permutations_in_dict(string, words):
perms = set(map(''.join, permutations(string)))
return sorted(words & perms)
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['act', 'cat']
Which solution is best depends on the length of string and the length of words. Timings will show which is best for a particular problem.
Here are some comparative timings for the various approaches using two different string sizes:
Timings with string_size=5 and words_size=1000000
-------------------------------------------------
0.01406 match_sort
0.06827 match_multiset
0.02167 match_perfect_hash
0.00224 match_permutations
0.00013 match_permutations_set
Timings with string_size=20 and words_size=1000000
--------------------------------------------------
2.19771 match_sort
8.38644 match_multiset
4.22723 match_perfect_hash
<takes "forever"> match_permutations
<takes "forever"> match_permutations_set
The results indicate that for small strings, the fastest approach searches permutations on the target string using set-intersection.
For larger strings, the fastest approach is the traditional sort-and-compare solution.
Hope you found this little algorithmic study as interesting as I have. The take-aways are:
FWIW, here is a test set-up I used to run the comparative timings:
from collections import Counter
from itertools import permutations
from string import letters
from random import choice
from operator import mul
from time import time
def match_sort(string, words):
target = sorted(string)
return sorted(word for word in words if sorted(word) == target)
def match_multiset(string, words):
target = Counter(string)
return sorted(word for word in words if Counter(word) == target)
primes = [2, 3, 5, 7, 11]
primes += [p for p in range(13, 1620) if all(pow(b, p-1, p) == 1 for b in (5, 11))]
anagram_hash = lambda s: reduce(mul, (primes[ord(c)] for c in s))
def match_perfect_hash(string, words):
target = anagram_hash(string)
return sorted(word for word in words if anagram_hash(word) == target)
def match_permutations(string, words):
perms = set(map(''.join, permutations(string)))
return sorted(word for word in words if word in perms)
def match_permutations_set(string, words):
perms = set(map(''.join, permutations(string)))
return sorted(words & perms)
string_size = 5
words_size = 1000000
population = letters[: string_size+2]
words = set()
for i in range(words_size):
word = ''.join([choice(population) for i in range(string_size)])
words.add(word)
string = word # Arbitrarily search use the last word as the target
print 'Timings with string_size=%d and words_size=%d' % (string_size, words_size)
for func in (match_sort, match_multiset, match_perfect_hash, match_permutations, match_permutations_set):
start = time()
func(string, words)
end = time()
print '%-10.5f %s' % (end - start, func.__name__)
You can simply use collections.Counter()
to compare the words
to the string
without creating all permutations
(this explodes with length of string):
from collections import Counter
def permutations_in_dict(string, words):
c = Counter(string)
return [w for w in words if c == Counter(w)]
>>> permutations_in_dict('act', {'cat', 'rat', 'dog', 'act'})
['cat', 'act']
Note: set
s are unordered so if you need a specific order you may need to sort the result, e.g. return sorted(...)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With