Say I have a list of numbers with duplicates.
import random
lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
random.shuffle(lst)
I want to split the list into a minimum number of sub"set"s with all unique numbers, without discarding any numbers. I managed to write the following code, but it feels hard-coded, so there should be faster and more general solutions.
from collections import Counter
counter = Counter(lst)
maxcount = counter.most_common(1)[0][1]
res = []
while maxcount > 0:
    res.append(set(x for x in lst if counter[x] >= maxcount))
    maxcount -= 1
assert len([x for st in res for x in st]) == len(lst)
print(res)
Output:
[{4}, {8, 2, 4}, {0, 2, 3, 4, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}]
Obviously, this is only one of the solutions. Another solution could be
[{4, 9}, {8, 2, 4}, {0, 2, 3, 4, 7, 8}, {0, 1, 2, 3, 4, 5, 6, 7, 8}]
I want to find all possible solutions with the minimum number of sub"set"s (4 in this case). Note that equal numbers are indistinguishable: for a list of [1, 2, 1], [{1}, {1, 2}] is the same solution as [{1, 2}, {1}].
Any suggestions?
This way takes time linear in the number of list elements, and its output is the same (same sets, in the same order) regardless of the input list's order. It's basically a more "eager" variation of your code:
def split(xs):
    from collections import defaultdict
    x2count = defaultdict(int)
    result = []
    for x in xs:
        x2count[x] += 1
        count = x2count[x]
        # The count-th occurrence of x goes into the count-th set;
        # grow the result only when a new maximum count is seen.
        if count > len(result):
            result.append(set())
        result[count - 1].add(x)
    return result
Then, e.g.,
xs = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
import random
random.shuffle(xs)
print(split(xs))
displays
[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 2, 3, 4, 7, 8}, {8, 2, 4}, {4}]
Finding all answers is bound to be annoying ;-) But straightforward enough. Once you know there are 4 sets in the result, you have a hairy kind of Cartesian product to compute. You know that, e.g., 7 appears twice, so there are comb(4, 2) == 6 ways to pick the two result sets 7 goes in. For each of those ways, you know, e.g., that 8 appears 3 times, so there are comb(4, 3) == 4 ways to pick the three result sets 8 goes in. Now we're up to 6 * 4 = 24 partial results. Repeat similarly for all other original integers. itertools.combinations() can be used to do the choosing.
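For concreteness, here's a sketch of that counting argument (the variable names are illustrative, not from the code below): taking the "sequence of sets" view, the size of the full Cartesian product is just the product of comb(n, count) over all distinct elements.

from collections import Counter
from math import comb

lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
c = Counter(lst)
n = max(c.values())  # 4 result sets are needed
total = 1
for count in c.values():
    # comb(n, count) ways to choose which result sets this element joins
    total *= comb(n, count)
print(total)  # 884736 for this list, before any deduplication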
Unclear: consider input [1, 1, 2]. The output here is [{1, 2}, {1}]. Do you, or do you not, consider that to be the same as [{1}, {1, 2}]? That is, do you consider an output to be a sequence of sets (in which case they're different), or a set of sets (in which case they're the same)? A straightforward Cartesian product takes the "it's a sequence" view.
Here's a way. As sketched, it computes a Cartesian product of all ways of distributing each element across the number of required output sets. But rather than use itertools.product() for this, it does it recursively, one element at a time. This allows it to check partial results so far for isomorphism, and decline to extend any partial solution that's isomorphic to a partial solution it already extended.
Toward that end, it views a partial solution as a set of sets. For technical reasons, Python requires using a frozenset for a set that's going to be used in turn as a set element.
Caution: this generator yields the same result object every time. That's for efficiency. If you don't like that, copy on the way out; note that a shallow copy like

yield result[:]

still shares the set objects, which are mutated in place during backtracking, so

yield [s.copy() for s in result]

is the safer replacement if you intend to save results across iterations.
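For example, a safe way to collect every solution into a list (using the allsplit generator defined below) is to copy the sets while iterating:

# Copy each set at yield time, since the generator mutates them in place.
solutions = [[s.copy() for s in result] for result in allsplit([1, 1, 2])]
print(solutions)  # [[{1, 2}, {1}]]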
EDIT: note that I replaced the line

sig = frozenset(map(frozenset, result))

with

sig = frozenset(Counter(map(frozenset, result)).items())

That's because you really aren't viewing the result as a set of sets, but as a multiset of sets (a given set can appear more than once in a result, and the number of times it appears is significant). In fancier test cases than were given here, that can make a real difference.
A Counter is the closest thing Python has to a builtin multiset type, but there is no "frozen" Counter workalike akin to frozenset. So instead we turn the Counter into a sequence of 2-tuples, and put those tuples into a frozenset. Using (set, count) pairs accounts for the fact that the number of times a set appears in a result is significant.
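A small standalone example (not part of the original code) shows the difference:

from collections import Counter

result = [{1}, {1}, {2}]
# A plain frozenset of frozensets collapses the duplicate {1},
# e.g. frozenset({frozenset({1}), frozenset({2})}):
print(frozenset(map(frozenset, result)))
# The (set, count) signature keeps the multiplicity,
# e.g. frozenset({(frozenset({1}), 2), (frozenset({2}), 1)}):
print(frozenset(Counter(map(frozenset, result)).items()))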
def allsplit(xs):
    from collections import Counter
    from itertools import combinations
    c = Counter(xs)
    n = max(c.values())  # minimum number of output sets needed
    result = [set() for i in range(n)]
    pairs = list(c.items())
    pin = len(pairs)

    def inner(pi):
        if pi >= pin:
            yield result
            return
        elt, count = pairs[pi]
        seen = set()
        for ixs in combinations(range(n), count):
            for i in ixs:
                result[i].add(elt)
            # Signature of the partial solution viewed as a multiset of
            # sets; skip extensions isomorphic to one already explored.
            sig = frozenset(Counter(map(frozenset, result)).items())
            if sig not in seen:
                yield from inner(pi + 1)
                seen.add(sig)
            for i in ixs:
                result[i].remove(elt)

    return inner(0)
Example:
>>> for x in allsplit([1, 1, 2, 3, 8, 4, 4]):
... print(x)
[{1, 2, 3, 4, 8}, {1, 4}]
[{1, 2, 3, 4}, {8, 1, 4}]
[{1, 2, 4, 8}, {1, 3, 4}]
[{1, 2, 4}, {1, 3, 4, 8}]
For your original example, it finds 36992 unique ways to partition the input.
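If you want to double-check that count yourself, a one-liner over the generator does it:

lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
print(sum(1 for _ in allsplit(lst)))  # 36992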
I'd suggest using a prefilled list, then storing each value in separate buckets:
import random
from collections import Counter
lst = [0, 0, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 8, 8, 9]
random.shuffle(lst)
c = Counter(lst)
maxcount = c.most_common(1)[0][1]
result = [set() for _ in range(maxcount)]
for k, v in c.items():
    for i in range(v):
        result[i].add(k)
print(result)
This can also be achieved with a defaultdict:
from collections import defaultdict

c = Counter(lst)
result = defaultdict(set)
for k, v in c.items():
    for i in range(v):
        result[i].add(k)
result = list(result.values())
print(result)
Note on performance
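The three functions timed below are assumed to be wrappers around the approaches shown above (the two bucket variants and the OP's while loop), along these lines:

from collections import Counter, defaultdict

def prefilled_list(lst):
    c = Counter(lst)
    result = [set() for _ in range(c.most_common(1)[0][1])]
    for k, v in c.items():
        for i in range(v):
            result[i].add(k)
    return result

def default_dict_set(lst):
    c = Counter(lst)
    result = defaultdict(set)
    for k, v in c.items():
        for i in range(v):
            result[i].add(k)
    return list(result.values())

def op_while_loop(lst):
    counter = Counter(lst)
    maxcount = counter.most_common(1)[0][1]
    res = []
    while maxcount > 0:
        res.append(set(x for x in lst if counter[x] >= maxcount))
        maxcount -= 1
    return res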
from timeit import timeit
import numpy as np
lst = list(np.random.randint(0, 100, 10000))
nb = 1000
print(timeit(lambda: prefilled_list(lst), number=nb)) # 2.144
print(timeit(lambda: default_dict_set(lst), number=nb)) # 1.903
print(timeit(lambda: op_while_loop(lst), number=nb)) # 318.2