Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort combinations by sum of its elements in Python

I have a huge list of integers in Python (1000000+ elements), but I will illustrate what I need with an example for the sake of simplicity. Let's suppose I have this list:

A = [1,2,3,4,100]

Now I'd like to get all the combinations (size 3) of that list, so I use itertools.

combinations = itertools.combinations(A,3)

But my problem is that this will return the combinations in lexicographical order:

(1,2,3)
(1,2,4)
(1,2,100)
(1,3,4)

and so on.

I'd like to get the combinations sorted by the sum of its elements. That would be:

(1,2,3) which sums 6, (1,2,4) which sums 7, (1,3,4) which sums 8,

and so on so forth.

How can I achieve this?

like image 275
Dalvtor Avatar asked Apr 27 '17 15:04

Dalvtor


1 Answers

Ordered combinations too big to fit in memory

The number of combinations for 1,000,000 things taken 3 at a time is 166,666,166,667,000,000. That is too big to fit in memory, too big to sort, and too huge to even loop over in a reasonable amount of time.

For a way to generate these combinations lazily, see "GENERATING ALL COMBINATIONS" in Donald Knuth's fascicle on Combinatorial Algorithms.

Ordered combinations that can fit in memory

As long as the number of combinations is reasonable, the most straight forward approach is to directly sort the combinations by their sum:

>>> import itertools
>>> import pprint

>>> A = [1, 2, 3, 4, 100]
>>> combinations = sorted(itertools.combinations(A, 3), key=sum)
>>> pprint.pprint(combinations)
[(1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4),
 (1, 2, 100),
 (1, 3, 100),
 (1, 4, 100),
 (2, 3, 100),
 (2, 4, 100),
 (3, 4, 100)]

The technique uses sum() as a key-function for sorted().

Bridging the two worlds

When nCr is bigger than can be practically enumerated, it would make sense to reduce the problem by removing the larger elements from list A until the sum gets big enough to include those values.

like image 97
Raymond Hettinger Avatar answered Oct 17 '22 12:10

Raymond Hettinger