Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get list of unique multi-sets

How can I uniquify the following list in Python:

all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
                (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]

Desired output is:

[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]

i.e. I need to get rid of tuples that have the same set of numbers but in different order.

I tried

set(all_the_ways)

but it only transpose elements.

And when I do

list(map(set, all_the_ways))

the things getting only worse:

[{5}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 2}, {1}]

In other words I need to convert inner tuple to a collection that allows multiple equal items (set is not suitable) and for which permutations of elements don't change the collection itself (kinda like C++'s multiset)

like image 541
tsionyx Avatar asked Apr 29 '14 05:04

tsionyx


3 Answers

How about this:

list(set(tuple(sorted(s)) for s in all_the_ways))

Output:

[(1, 2, 2), (5,), (1, 1, 1, 1, 1), (1, 1, 1, 2)]

It will mangle the order of each tuple though. I'm assuming that doesn't matter, as tuples containing the same set of numbers are considered the same for your case. What this implies is that in the end, the output list might contain tuples that are not among the original input, for example (credit to @thefourtheye):

all_the_ways = [(2, 1, 2), (2, 2, 1)]
# Output: [(1, 2, 2)]

This may or may not be a problem, and if it is, you can use the more robust solutions which are already mentioned in the other excellent answers.

like image 194
YS-L Avatar answered Oct 21 '22 19:10

YS-L


Use collections.Counter() to identify the unique multisets:

>>> from collections import Counter

>>> all_the_ways = [(5,), (2, 2, 1), (2, 1, 2), (2, 1, 1, 1), (1, 2, 2),\
                (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2), (1, 1, 1, 1, 1)]
>>> result = []
>>> seen = set()
>>> for tup in all_the_ways:
        key = tuple(sorted(Counter(tup).items())) # unique signature
        if key not in seen:
            result.append(tup)
        seen.add(key)

>>> result
[(5,), (2, 2, 1), (2, 1, 1, 1), (1, 1, 1, 1, 1)]
like image 3
Raymond Hettinger Avatar answered Oct 21 '22 20:10

Raymond Hettinger


If the Order doesn't matter you can use this

from collections import Counter
>>> {frozenset(Counter(tup).items()):tup for tup in data}.values()
# [(1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1), (5,)]

If you want to maintain the Order,

from collections import Counter, OrderedDict
OrderedDict([frozenset(Counter(tup).items()),tup] for tup in data).values()
# [(5,), (1, 2, 2), (1, 1, 1, 2), (1, 1, 1, 1, 1)]

In both the solutions we rely on frozenset, because the set objects are not hashable as they are mutable. In the first case, we construct a dictionary with the frequency of the numbers (determined with Counter) as the key and the current tuple as the value corresponding to that. Once the dictionary construction is completed, we take all the values, which correspond to the tuples.

In the second case, we simply use OrderedDict to maintain the order.

like image 2
thefourtheye Avatar answered Oct 21 '22 19:10

thefourtheye