I'm checking if two strings a and b are permutations of each other, and I'm wondering what the ideal way to do this is in Python. From the Zen of Python, "There should be one -- and preferably only one -- obvious way to do it," but I see there are at least two ways:
sorted(a) == sorted(b)
and
all(a.count(char) == b.count(char) for char in a)
but the first one is slower when (for example) the first char of a is nowhere in b, and the second is slower when they are actually permutations.
Is there any better (either in the sense of more Pythonic, or in the sense of faster on average) way to do it? Or should I just choose from these two depending on which situation I expect to be most common?
For example, the string ABC has 6 permutations: ABC, ACB, BAC, BCA, CAB, CBA. In Python, we can use the built-in itertools module to generate the permutations of a sequence's elements with its permutations() function. However, we can also write our own utility function to generate all permutations of a string.
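As a quick illustration (a minimal sketch; the join() is needed because permutations() yields tuples of characters, not strings):

from itertools import permutations

# Each permutation comes back as a tuple of characters,
# so join each tuple back into a string.
perms = [''.join(p) for p in permutations('ABC')]
print(perms)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']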
We can also find the count without generating all the permutations. The idea is to find the frequency of each character, then divide the factorial of the string's length by the product of the factorials of those frequencies.
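A minimal sketch of that formula using collections.Counter and math.factorial (the function name count_distinct_permutations is just for illustration):

from collections import Counter
from math import factorial

def count_distinct_permutations(s):
    # n! divided by the product of f! for each character frequency f.
    # Integer division is exact here, since the result is always whole.
    total = factorial(len(s))
    for freq in Counter(s).values():
        total //= factorial(freq)
    return total

print(count_distinct_permutations('ABC'))  # 6
print(count_distinct_permutations('AAB'))  # 3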
Here is a way which is O(n), asymptotically better than the two ways you suggest.
import collections

def same_permutation(a, b):
    d = collections.defaultdict(int)
    for x in a:
        d[x] += 1
    for x in b:
        d[x] -= 1
    # Every count cancels out iff a and b are permutations of each other.
    return not any(d.values())

## same_permutation([1,2,3],[2,3,1])
#. True
## same_permutation([1,2,3],[2,3,1,1])
#. False
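For what it's worth, the standard library already packages this tallying idea; a one-line sketch of the same O(n) check:

from collections import Counter

def same_permutation(a, b):
    # Counter builds the character tally for each side and
    # compares the two multisets directly.
    return Counter(a) == Counter(b)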
"but the first one is slower when (for example) the first char of a is nowhere in b".
This kind of degenerate-case performance analysis is not a good idea. It's a rat-hole of lost time thinking up all kinds of obscure special cases.
Only do the O-style "overall" analysis.
Overall, the sorts are O(n log n).
The a.count(char) for char in a solution is O(n^2). Each count pass is a full examination of the string.
If some obscure special case happens to be faster -- or slower -- that's possibly interesting. But it only matters when you know the frequency of your obscure special cases. When analyzing sort algorithms, it's important to note that a fair number of sorts involve data that's already in the proper order (either by luck or by clever design), so sort performance on pre-sorted data matters.
Is your obscure special case ("the first char of a is nowhere in b") frequent enough to matter? If it's just a special case you thought of, set it aside. If it's a fact about your data, then consider it.
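If you do want to see how the two approaches behave on your actual data, a minimal timeit sketch (the sample strings here are placeholders; substitute inputs that reflect your real workload):

import timeit

# a and b are permutations of each other: the worst case for the
# count-based approach, since every count must be compared.
setup = "a = 'abcdefgh' * 100; b = 'hgfedcba' * 100"

print(timeit.timeit('sorted(a) == sorted(b)', setup=setup, number=100))
print(timeit.timeit(
    'all(a.count(c) == b.count(c) for c in a)',
    setup=setup, number=100))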