Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating a list of repetitions regardless of the order

I want to generate combinations that associate indices in a list with "slots". For instance,(0, 0, 1) means that 0 and 1 belong to the same slot while 2 belongs to an other. (0, 1, 1, 1) means that 1, 2, 3 belong to the same slot while 0 is by itself. In this example, 0 and 1 are just ways of identifying these slots but do not carry information for my usage.

Consequently, (0, 0, 0) is absolutely identical to (1, 1, 1) for my purposes, and (0, 0, 1) is equivalent to (1, 1, 0).

The classical cartesian product generates a lot of these repetitions I'd like to get rid of.

This is what I obtain with itertools.product :

>>> LEN, SIZE = (3,1)
>>> list(itertools.product(range(SIZE+1), repeat=LEN))
>>>
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]

And this is what I'd like to get:

>>> [(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1)]

It is easy with small lists but I don't quite see how to do this with bigger sets. Do you have a suggestion?

If it's unclear, please tell me so that I can clarify my question. Thank you!

Edit: based on Sneftel's answer, this function seems to work, but I don't know if it actually yields all the results:

def test():
    for p in product(range(2), repeat=3):
        j=-1
        good = True
        for k in p:
            if k> j and (k-j) > 1:
                good = False
            elif k >j:
                j = k
        if good:
            yield p
like image 512
user3057865 Avatar asked Oct 02 '22 06:10

user3057865


2 Answers

I would start by making the following observations:

  1. The first element of each combination must be 0.
  2. The second element must be 0 or 1.
  3. The third element must be 0, 1 or 2, but it can only be 2 if the second element was 1.

These observations suggest the following algorithm:

def assignments(n, m, used=0):
    """Generate assignments of `n` items to `m` indistinguishable
    buckets, where `used` buckets have been used so far.

        >>> list(assignments(3, 1))
        [(0, 0, 0)]
        >>> list(assignments(3, 2))
        [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
        >>> list(assignments(3, 3))
        [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2)]

    """
    if n == 0:
        yield ()
        return
    aa = list(assignments(n - 1, m, used))
    for first in range(used):
        for a in aa:
            yield (first,) + a
    if used < m:
        for a in assignments(n - 1, m, used + 1):
            yield (used,) + a

This handles your use case (12 items, 5 buckets) in a few seconds:

>>> from timeit import timeit
>>> timeit(lambda:list(assignments(12, 5)), number=1)
4.513746023178101
>>> sum(1 for _ in assignments(12, 5))
2079475

This is substantially faster than the function you give at the end of your answer (the one that calls product and then drops the invalid assignments) would be if it were modified to handle the (12, 5) use case:

>>> timeit(lambda:list(test(12, 5)), number=1)
540.693009853363
like image 176
Gareth Rees Avatar answered Oct 05 '22 15:10

Gareth Rees


Before checking for duplicates, you should harmonize the notation (assuming you don't want to set up some fancy AI): iterate through the lists and assign set-affiliation numbers for differing elements starting at 0, counting upwards. That is, you create a temporary dictionary per line that you are processing.

An exemplary output would be

(0,0,0) -> (0,0,0)
(0,1,0) -> (0,1,0)

but

(1,0,1) -> (0,1,0)

Removing the duplicates can then easily be performed as the problem is reduced to the problem of the solved question at Python : How to remove duplicate lists in a list of list?

like image 33
fuesika Avatar answered Oct 05 '22 15:10

fuesika