Choosing k out of N numbers from 2 arrays with highest total value

Question

Let A, B and C be three arrays, each containing N numbers:

 A = a[0], a[1], a[2], ..., a[N-1] 
 B = b[0], b[1], b[2], ..., b[N-1]
 C = c[0], c[1], c[3], ..., c[N-1]

I want to select the best k < N elements from A and the best k < N elements from B so that the total sum of their values is maximized. The interesting twist is: If element i is chosen from both A and B (where i in {0, ..., N-1} is the index), then instead of these elements contributing a[i] + b[i], they will contribute c[i] where c[i] >= a[i] + b[i].

At first glance this looked deceptively straightforward to me, but the more I think about the more involved it gets.

I am ultimately looking for an implementation in Python, but at this stage I am just trying to get a sense of what would be an efficient algorithm here.

Example

To clarify, inputs to the algorithm are the 3 N x 1 arrays A, B and C and an integer value for k. The expected output are two k x 1 lists of indices, defining the value-maximizing combination of elements from A and B (and C).

For example, suppose k = 2, N = 4 and let

 A = a[0], a[1], a[2], a[3] = 3, 1, 1, 0   
 B = b[0], b[1], b[2], b[3] = 1, 3, 0, 1  
 C = c[0], c[1], c[2], c[3] = 4, 4, 3, 2

Even in this simple example, there are many possible combinations. For instance, if elements i = 0, 2 are chosen from A and elements j = 1, 3 are chosen from B, then the total value would be a[0] + a[2] + b[1] + b[3] = 8.

If on the other hand elements i = 0, 1 and j = 0, 1 would be chosen from both A and B, then the special twist applies: Instead of yielding a[0] + a[1] + b[0] + b[1], the total value is given by c[0] + c[1] = 8.

In this example, the combination of elements that maximizes the total value is given by i = 0, 2 from A and elements j = 1, 2 from B. This yields a total value of a[0] + b[1] + c[2] = 9, which can be verified is more than any other combination.

Comparison of answers

Here's a quick comparison of the 3 submitted solutions. First, I checked all of them, and they all give the intended results. As a side comment, none of them requires the elements of C to be weakly larger than the sum of the corresponding elements in A and B, so I dropped this assumption in my performance review.

Here's what I run:

import numpy as np
from utils import tic, toc  # simple wrapper to time.perf_counter()

k, N = 10, 1000

A = list(np.random.random_sample([N]))
B = list(np.random.random_sample([N]))
C = list(np.random.random_sample([N]))

tic()
print(optimal_choices(k, A, B, C))  # solution by btilly
toc()

tic()
print(maxPicks(A.copy(), B.copy(), C.copy(), k))  # solution by Eric T-M
toc()

tic()
print(maxSum(A, B, C, k))  # solution by Alain T.
toc()

I tested for various combinations of k and N. It seems that @btilly's algorithm scales well in N as long as k is small. @Alain-T.'s algorithm does the opposite, doing well when k is large relative to N. Across the board, @Eric-T-M's algorithm does best, scaling well in both k and N.

Small problem: k = 10 and N = 500

btilly's algorithm: 0.49s
Eric T-M's algorithm: 0.00s
Alain T.'s algorithm: 0.52s

Small-k, large-N: k = 10 and N = 1000

btilly's algorithm: 0.89s
Eric T-M's algorithm: 0.00s
Alain T.'s algorithm: 1.99s

Large-k, small-N: k = 80 and N = 100

btilly's algorithm: 1.54s
Eric T-M's algorithm: 0.00s
Alain T.'s algorithm: 0.09s

Medium problem: k = 50 and N = 1000

btilly's algorithm: 13.01ss
Eric T-M's algorithm: 0.00s
Alain T.'s algorithm: 8.55s

Large problem 1: k = 10 and N = 1_000_000

Eric T-M's algorithm: 1.03s

Large problem 2: k = 1_000 and N = 100_000

Eric T-M's algorithm: 10.22s

(For the benchmarks, I removed the sorting in Alain T.'s code, to make it comparable.)

Eric T-M · Accepted Answer

Try this. It takes O(N^2) time and it is fairly simple.

def maxPicks(A,B,C,k):
    # returns the tuple (list of entries picked in A, list of entries picked in B, total value)

    # BASE CASE
    if k == 0:
        return ([], [], 0)
    aMax = max(A)
    bMax = max(B)
    cMax = max(C)

    if (aMax + bMax) > cMax:
        aIdx = A.index(aMax)
        bIdx = B.index(bMax)
        B[aIdx] = C[aIdx] - A[aIdx]
        A[aIdx] = -2
        C[aIdx] = -1
        A[bIdx] = C[bIdx] - B[bIdx]
        B[bIdx] = -2
        C[bIdx] = -1
        nextPicks = maxPicks(A,B,C,k-1)
        return (nextPicks[0] + [aIdx], nextPicks[1] + [bIdx], nextPicks[2] + aMax + bMax)
    else:
        cIdx = C.index(cMax)
        A[cIdx] = -1
        B[cIdx] = -1
        C[cIdx] = -1
        nextPicks = maxPicks(A,B,C,k-1)
        return (nextPicks[0] + [cIdx], nextPicks[1] + [cIdx], nextPicks[2] + cMax)

Here's how it works:

The base case should be self explanatory. Otherwise we will compare the sum of the maximum of all entries in A and the maximum of all entries in B to the maximum of all entries in C. If this sum is larger than it is safe to pick these entries from A and B, but before making more picks we will need to set the entries we picked as well as their corresponding entries in C to a negative value. As a side note I do assume that all values in A, B and C are originally nonnegative so by setting them negative we forbid our algorithm from picking them again. If this assumption is wrong you might want to set these values to something extremely negative to prohibit double picks. We also see that if we picked A[i] the value of B[i] is now whatever C[i]-A[i] was, because picking B[i] will lose us the value in A[i] and give us the value in C[i] same for the entry A[j] if we pick B[j].

If on the other hand, the greatest entry in C was greater than or equal to aMax+bMax we want to pick it (by picking the corresponding entries in both A and B, because no other picks of entries in A and B or just C alone would be more valuable. At this point we know we don't want to re-pick A[i],B[i], or C[i] again, so we set them all negative.

Choosing k out of N numbers from 2 arrays with highest total value

Tags:

python

algorithm

optimization

combinations

Example

Comparison of answers

Bob

1 Answers

Eric T-M

Recent Activity

Donate For Us

Choosing k out of N numbers from 2 arrays with highest total value

Tags:

python

algorithm

optimization

combinations

Example

Comparison of answers

Bob

1 Answers

Eric T-M

Related questions

Recent Activity

Donate For Us