I'm trying to find a performant solution in Python that works like so:
>>> func([1,2,3], [1,2])
[(1,1), (1,2), (1,3), (2,2), (2,3)]
This is similar to itertools.combinations_with_replacement, except that it can take multiple iterables. It's also similar to itertools.product, except that it omits order-independent duplicate results.
All of the inputs will be prefixes of the same series (i.e. they all start with the same element and follow the same pattern, but might have different lengths).
The function must be able to take any number of iterables as input.
Given a set of lists A, B, C, ..., here is a sketch of an algorithm that generates those results.
assert len(A) <= len(B) <= len(C) <= ...
for i in 0..len(A)
for j in i..len(B)
for k in j..len(C)
.
.
.
yield A[i], B[j], C[k], ...
itertools.product and filter the results. This has to be performant.itertools.product and filtering for a reasonable number of iterables.I suspect there's a way to do this with itertools, but I have no idea what it is.
EDIT: I'm looking for the solution that takes the least time.
EDIT 2: There seems to be some confusion about what I'm trying to optimize. I'll illustrate with an example.
>>> len(list(itertools.product( *[range(8)] * 5 )))
32768
>>> len(list(itertools.combinations_with_replacement(range(8), 5)))
792
The first line gives the number of order-dependent possibilities for rolling 5 8-sided dice. The second gives the number of order-independent possibilities. Regardless of how performant itertools.product is, it'll take 2 orders of magnitude more iterations to get a result than itertools.combinations_with_replacement. I'm trying to find a way to do something similar to itertools.combinations_with_replacement, but with multiple iterables that minimizes the number of iterations, or time performance. (product runs in whereas
combinations_with_replacement runs in , where M is the number of sides on the die and N is the number of dice)
This solution hasn't recursion or filtering. It's trying to produce only ascending sequences of indices so it's usable only for prefixes of same collection. Also it's uses only indices for element identification so it's not enforces elements of series to be comparable or even hashable.
def prefixCombinations(coll,prefixes):
"produces combinations of elements of the same collection prefixes"
prefixes = sorted(prefixes) # does not impact result through it's unordered combinations
n = len(prefixes)
indices = [0]*n
while True:
yield tuple(coll[indices[i]] for i in range(n))
#searching backwards for non-maximum index
for i in range(n-1,-1,-1):
if indices[i] < prefixes[i] - 1 : break
# if all indices hits maximum - leave
else: break
level = indices[i] + 1
for i in range(i,n): indices[i] = level
examples are
>>> list(prefixCombinations([1,2,3,4,5], (3,2)))
[[1, 1], [1, 2], [1, 3], [2, 2], [2, 3]]
>>> list(prefixCombinations([1,2,3,4,5], (3,2,5)))
[[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4], [1, 1, 5], [1, 2, 2], [1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 3, 3], [1, 3, 4], [1, 3, 5], [2, 2, 2], [2, 2, 3], [2, 2, 4], [2, 2, 5], [2, 3, 3], [2, 3, 4], [2, 3, 5]]
>>> from itertools import combinations_with_replacement
>>> tuple(prefixCombinations(range(10),[10]*4)) == tuple(combinations_with_replacement(range(10),4))
True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With