I want to generate combinations that associate indices in a list with "slots". For instance,(0, 0, 1)
means that 0 and 1 belong to the same slot while 2 belongs to an other. (0, 1, 1, 1)
means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots but do not carry information for my usage.
Consequently, (0, 0, 0)
is absolutely identical to (1, 1, 1)
for my purposes, and (0, 0, 1)
is equivalent to (1, 1, 0)
.
The classical cartesian product generates a lot of these repetitions I'd like to get rid of.
This is what I obtain with itertools.product
:
>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
>>>
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]
And this is what I'd like to get:
>>> [(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1)]
It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?
If it's unclear, please tell me so that I can clarify my question. Thank you!
Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:
def test():
for p in product(range(2), repeat=3):
j=-1
good = True
for k in p:
if k> j and (k-j) > 1:
good = False
elif k >j:
j = k
if good:
yield p
I would start by making the following observations:
These observations suggest the following algorithm:
def assignments(n, m, used=0):
"""Generate assignments of `n` items to `m` indistinguishable
buckets, where `used` buckets have been used so far.
>>> list(assignments(3, 1))
[(0, 0, 0)]
>>> list(assignments(3, 2))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
>>> list(assignments(3, 3))
[(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]
"""
if n == 0:
yield ()
return
aa = list(assignments(n - 1, m, used))
for first in range(used):
for a in aa:
yield (first,) + a
if used < m:
for a in assignments(n - 1, m, used + 1):
yield (used,) + a
This handles your use case (12 items, 5 buckets) in a few seconds:
>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475
This is substantially faster than the function you give at the end of your answer (the one that calls product
and then drops the invalid assignments) would be if it were modified to handle the (12, 5) use case:
>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363
Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through the lists and assign set-affiliation numbers for differing elements starting at 0, counting upwards. That is, you create a temporary dictionary per line that you are processing.
An exemplary output would be
(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)
but
(1,0,1) -> (0,1,0)
Removing the duplicates can then easily be performed as the problem is reduced to the problem of the solved question at Python : How to remove duplicate lists in a list of list?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With