I have a huge list of integers in Python (1000000+ elements), but I will illustrate what I need with an example for the sake of simplicity. Let's suppose I have this list:
A = [1,2,3,4,100]
Now I'd like to get all the combinations (size 3) of that list, so I use itertools.
combinations = itertools.combinations(A,3)
But my problem is that this will return the combinations in lexicographical order:
(1,2,3)
(1,2,4)
(1,2,100)
(1,3,4)
and so on.
I'd like to get the combinations sorted by the sum of its elements. That would be:
(1,2,3) which sums 6, (1,2,4) which sums 7, (1,3,4) which sums 8,
and so on so forth.
How can I achieve this?
The number of combinations for 1,000,000 things taken 3 at a time is 166,666,166,667,000,000. That is too big to fit in memory, too big to sort, and too huge to even loop over in a reasonable amount of time.
For a way to generate these combinations lazily, see "GENERATING ALL COMBINATIONS" in Donald Knuth's fascicle on Combinatorial Algorithms.
As long as the number of combinations is reasonable, the most straight forward approach is to directly sort the combinations by their sum:
>>> import itertools
>>> import pprint
>>> A = [1, 2, 3, 4, 100]
>>> combinations = sorted(itertools.combinations(A, 3), key=sum)
>>> pprint.pprint(combinations)
[(1, 2, 3),
(1, 2, 4),
(1, 3, 4),
(2, 3, 4),
(1, 2, 100),
(1, 3, 100),
(1, 4, 100),
(2, 3, 100),
(2, 4, 100),
(3, 4, 100)]
The technique uses sum() as a key-function for sorted().
When nCr is bigger than can be practically enumerated, it would make sense to reduce the problem by removing the larger elements from list A until the sum gets big enough to include those values.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With