How can I uniquify the following list in Python:
all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
(1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]
Desired output is:
[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]
i.e. I need to get rid of tuples that have the same set of numbers but in different order.
I tried
set(all_the_ways)
but it only transpose elements.
And when I do
list(map(set, all_the_ways))
the things getting only worse:
[{5}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1}]
In other words I need to convert inner tuple to a collection that allows multiple equal items (set
is not suitable) and for which permutations of elements don't change the collection itself (kinda like C++'s multiset
)
How about this:
list(set(tuple(sorted(s)) for s in all_the_ways))
Output:
[(1, 2, 2), (5,), (1, 1, 1, 1, 1), (1, 1, 1, 2)]
It will mangle the order of each tuple though. I'm assuming that doesn't matter, as tuples containing the same set of numbers are considered the same for your case. What this implies is that in the end, the output list might contain tuples that are not among the original input, for example (credit to @thefourtheye):
all_the_ways = [(2, 1, 2), (2, 2, 1)]
# Output: [(1, 2, 2)]
This may or may not be a problem, and if it is, you can use the more robust solutions which are already mentioned in the other excellent answers.
Use collections.Counter() to identify the unique multisets:
>>> from collections import Counter
>>> all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
(1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]
>>> result = []
>>> seen = set()
>>> for tup in all_the_ways:
key = tuple(sorted(Counter(tup).items())) # unique signature
if key not in seen:
result.append(tup)
seen.add(key)
>>> result
[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]
If the Order doesn't matter you can use this
from collections import Counter
>>> {frozenset(Counter(tup).items()):tup for tup in data}.values()
# [(1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1), (5,)]
If you want to maintain the Order,
from collections import Counter, OrderedDict
OrderedDict([frozenset(Counter(tup).items()),tup] for tup in data).values()
# [(5,), (1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1)]
In both the solutions we rely on frozenset
, because the set
objects are not hashable as they are mutable. In the first case, we construct a dictionary with the frequency of the numbers (determined with Counter
) as the key and the current tuple as the value corresponding to that. Once the dictionary construction is completed, we take all the values, which correspond to the tuples.
In the second case, we simply use OrderedDict
to maintain the order.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With