Given a set of N elements colored with C colors, how can I find every possible combination of length L that contains at most M colors?
I tried this algorithm, which uses itertools.combinations to generate all the possible combinations and then filters out those that violate the maximum-colors condition.
from itertools import combinations as cb

def allowed_combinations(elements, combination_size=4, max_colors=3):
    combinations = cb(elements, combination_size)
    for combination in combinations:
        colors = set(elements[element] for element in combination)
        if len(colors) > max_colors:
            continue
        yield combination
elements = dict()
elements['A'] = 'red'
elements['B'] = 'red'
elements['C'] = 'blue'
elements['D'] = 'blue'
elements['E'] = 'green'
elements['F'] = 'green'
elements['G'] = 'green'
elements['H'] = 'yellow'
elements['I'] = 'white'
elements['J'] = 'white'
elements['K'] = 'black'
combinations = allowed_combinations(elements)
for c in combinations:
    for element in c:
        print("%s-%s" % (element, elements[element]))
    print()
The output looks like:
A-red
C-blue
B-red
E-green
A-red
C-blue
B-red
D-blue
A-red
C-blue
B-red
G-green
A-red
C-blue
B-red
F-green
...
The problem is that generating all possible combinations can be computationally very expensive. In my case, for instance, L is often 6 and the number of elements N is around 50, which gives C(50, 6) = 15,890,700 possible combinations. If the maximum number of colors allowed in a combination is small, most combinations are "useless" and are discarded in the filter step. My intuition is that I should move the filtering step inside/before the combining step to avoid the explosion of combinations, but I don't see how.
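One way to realize that intuition is to choose the set of colors first and only then combine elements of those colors, so most of the useless combinations are never generated. This is a sketch of the idea, not from the answers below; `pruned_combinations` and its parameter names are mine:

```python
from itertools import combinations

def pruned_combinations(elements, length=4, max_colors=3):
    # Choose the color set first, then combine only elements of those
    # colors.  Requiring every chosen color to actually appear ensures
    # that each combination is produced exactly once.
    colors = sorted(set(elements.values()))
    for m in range(1, max_colors + 1):
        for color_set in combinations(colors, m):
            pool = [k for k, c in elements.items() if c in color_set]
            for combo in combinations(pool, length):
                if len({elements[k] for k in combo}) == m:
                    yield combo
```

This still filters, but only within much smaller pools, since each pool contains the elements of at most `max_colors` colors rather than all N elements.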
Here's an implementation that's a bit simpler than the other answers posted so far. The basic approach is to take each distinct value ("color") in turn and choose:

- i, the number of keys ("elements") associated with that value that will be included in the output;
- c, the combinations of those keys of length i;

and then recurse to handle the remaining values.

from collections import defaultdict, deque
from itertools import combinations

def constrained_combinations(elements, r, s):
    """Generate distinct combinations of 'r' keys from the dictionary
    'elements' using at most 's' different values. The values must be
    hashable.

    >>> from collections import OrderedDict
    >>> elements = OrderedDict(enumerate('aabbc'))
    >>> cc = constrained_combinations
    >>> list(cc(elements, 2, 1))
    [(0, 1), (2, 3)]
    >>> list(cc(elements, 3, 2))
    [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (1, 2, 3), (2, 3, 4)]
    >>> list(cc(elements, 3, 3)) == list(combinations(range(5), 3))
    True
    >>> sum(1 for _ in cc(OrderedDict(enumerate('aabbcccdeef')), 4, 3))
    188
    """
    # 'value_keys' is a map from value to a list of keys associated
    # with that value; 'values' is a list of values in reverse order of
    # first appearance.
    value_keys = defaultdict(list)
    values = deque()
    for k, v in elements.items():
        if v not in value_keys:
            values.appendleft(v)
        value_keys[v].append(k)

    def helper(current, r, s):
        if r == 0:
            yield current
            return
        if s == 0 or not values:
            return
        value = values.pop()
        keys = value_keys[value]
        for i in range(min(r, len(keys)), -1, -1):
            for c in combinations(keys, i):
                for result in helper(current + c, r - i, s - min(i, 1)):
                    yield result
        values.append(value)

    return helper((), r, s)
In Python 3.3 or later, you could use the yield from
statement to simplify the recursive call:
yield from helper(current + c, r - i, s - min(i, 1))
If you're wondering why the doctests use collections.OrderedDict
, it's so that the combinations can be returned in a predictable order, which is necessary for the tests to work.
The code reverses the list values
, and iterates downwards over i
so that if the caller passes in an OrderedDict
, the combinations are returned in a sensible order (with values that appear early in the input also appearing early in the output).
Given the slight awkwardness in getting predictable output from this function, it would, I think, be worth considering changing the interface so that instead of taking a dictionary mapping keys to values, it would take an iterable of (key, value) pairs.
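That alternative interface could reuse the same preprocessing. Here's a sketch of just the grouping step under that assumption (`group_pairs` is a hypothetical name, not part of the answer):

```python
from collections import defaultdict, deque

def group_pairs(pairs):
    # Build the same 'value_keys' / 'values' structures as above, but
    # from an iterable of (key, value) pairs instead of a dictionary,
    # so the caller controls the iteration order directly.
    value_keys = defaultdict(list)
    values = deque()
    for k, v in pairs:
        if v not in value_keys:
            values.appendleft(v)
        value_keys[v].append(k)
    return value_keys, values
```

With this, the caller can pass an ordinary list of pairs and get predictable output without needing OrderedDict at all.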
This is broadly similar in speed to Tim Peters's combs2:
>>> from timeit import timeit
>>> elements = dict(enumerate('abcde' * 10))
>>> test = lambda f:timeit(lambda:sum(1 for _ in f(elements, 6, 3)), number=1)
>>> test(combs2)
11.403807007009163
>>> test(constrained_combinations)
11.38378801709041