Given a set of N elements colored with C colors, how can I find every possible combination of length L that contains at most M colors?
I tried this algorithm, which uses itertools.combinations to generate all the possible combinations and then filters out those that violate the maximum-colors condition.
from itertools import combinations as cb

def allowed_combinations(elements, combination_size=4, max_colors=3):
    combinations = cb(elements, combination_size)
    for combination in combinations:
        colors = set(elements[element] for element in combination)
        if len(colors) > max_colors:
            continue
        yield combination
elements = dict()
elements['A'] = 'red'
elements['B'] = 'red'
elements['C'] = 'blue'
elements['D'] = 'blue'
elements['E'] = 'green'
elements['F'] = 'green'
elements['G'] = 'green'
elements['H'] = 'yellow'
elements['I'] = 'white'
elements['J'] = 'white'
elements['K'] = 'black'
combinations = allowed_combinations(elements)
for c in combinations:
    for element in c:
        print("%s-%s" % (element, elements[element]))
    print()
The output looks like:
A-red
C-blue
B-red
E-green
A-red
C-blue
B-red
D-blue
A-red
C-blue
B-red
G-green
A-red
C-blue
B-red
F-green
...
The problem is that generating all possible combinations can be computationally very expensive. In my case, for instance, L is often 6 and the number of elements N is around 50, which gives C(50, 6) = 15,890,700 possible combinations. If the maximum number of colors allowed in a combination is small, most combinations are "useless" and are discarded in the filter step. My intuition is that I should move the filtering step inside/before the combining step to avoid the explosion of combinations, but I don't see how.
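One way to realize that intuition is to choose the set of colors first and only then combine elements of those colors, so most of the useless combinations are never generated. This is a sketch of the idea, not from the answers below; `pruned_combinations` and its parameter names are mine:

```python
from itertools import combinations

def pruned_combinations(elements, length=4, max_colors=3):
    # Choose the color set first, then combine only elements of those
    # colors.  Requiring every chosen color to actually appear ensures
    # that each combination is produced exactly once.
    colors = sorted(set(elements.values()))
    for m in range(1, max_colors + 1):
        for color_set in combinations(colors, m):
            pool = [k for k, c in elements.items() if c in color_set]
            for combo in combinations(pool, length):
                if len({elements[k] for k in combo}) == m:
                    yield combo
```

This still filters, but only within much smaller pools, since each pool contains the elements of at most `max_colors` colors rather than all N elements.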
Here's an implementation that's a bit simpler than the other answers posted so far. The basic approach is to take each distinct value ("color") in turn and choose:

- i, the number of keys ("elements") associated with that value that will be included in the output;
- c, the combinations of those keys of length i;

and then recurse to handle the remaining values.

from collections import defaultdict, deque
from itertools import combinations

def constrained_combinations(elements, r, s):
    """Generate distinct combinations of 'r' keys from the dictionary
    'elements' using at most 's' different values. The values must be
    hashable.

    >>> from collections import OrderedDict
    >>> elements = OrderedDict(enumerate('aabbc'))
    >>> cc = constrained_combinations
    >>> list(cc(elements, 2, 1))
    [(0, 1), (2, 3)]
    >>> list(cc(elements, 3, 2))
    [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
    >>> list(cc(elements, 3, 3)) == list(combinations(range(5), 3))
    True
    >>> sum(1 for _ in cc(OrderedDict(enumerate('aabbcccdeef')), 4, 3))
    188
    """
    # 'value_keys' is a map from value to a list of keys associated
    # with that value; 'values' is a list of values in reverse order of
    # first appearance.
    value_keys = defaultdict(list)
    values = deque()
    for k, v in elements.items():
        if v not in value_keys:
            values.appendleft(v)
        value_keys[v].append(k)

    def helper(current, r, s):
        if r == 0:
            yield current
            return
        if s == 0 or not values:
            return
        value = values.pop()
        keys = value_keys[value]
        for i in range(min(r, len(keys)), -1, -1):
            for c in combinations(keys, i):
                for result in helper(current + c, r - i, s - min(i, 1)):
                    yield result
        values.append(value)

    return helper((), r, s)
In Python 3.3 or later, you could use the yield from
statement to simplify the recursive call:
yield from helper(current + c, r - i, s - min(i, 1))
If you're wondering why the doctests use collections.OrderedDict
, it's so that the combinations can be returned in a predictable order, which is necessary for the tests to work.
The code reverses the list values
, and iterates downwards over i
so that if the caller passes in an OrderedDict
, the combinations are returned in a sensible order (with values that appear early in the input also appearing early in the output).
Given the slight awkwardness in getting predictable output from this function, it would, I think, be worth considering changing the interface so that instead of taking a dictionary mapping keys to values, it would take an iterable of (key, value) pairs.
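That alternative interface could reuse the same preprocessing. Here's a sketch of just the grouping step under that assumption (`group_pairs` is a hypothetical name, not part of the answer):

```python
from collections import defaultdict, deque

def group_pairs(pairs):
    # Build the same 'value_keys' / 'values' structures as above, but
    # from an iterable of (key, value) pairs instead of a dictionary,
    # so the caller controls the iteration order directly.
    value_keys = defaultdict(list)
    values = deque()
    for k, v in pairs:
        if v not in value_keys:
            values.appendleft(v)
        value_keys[v].append(k)
    return value_keys, values
```

With this, the caller can pass an ordinary list of pairs and get predictable output without needing OrderedDict at all.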
This is broadly similar in speed to Tim Peters's combs2:
>>> from timeit import timeit
>>> elements = dict(enumerate('abcde' * 10))
>>> test = lambda f:timeit(lambda:sum(1 for _ in f(elements, 6, 3)), number=1)
>>> test(combs2)
11.403807007009163
>>> test(constrained_combinations)
11.38378801709041