Let A, B and C be three arrays, each containing N numbers:
A = a[0], a[1], a[2], ..., a[N-1]
B = b[0], b[1], b[2], ..., b[N-1]
C = c[0], c[1], c[3], ..., c[N-1]
I want to select the best k < N elements from A and the best k < N elements from B so that the total sum of their values is maximized. The interesting twist is: If element i is chosen from both A and B (where i in {0, ..., N-1} is the index), then instead of these elements contributing a[i] + b[i], they will contribute c[i] where c[i] >= a[i] + b[i].
At first glance this looked deceptively straightforward to me, but the more I think about the more involved it gets.
I am ultimately looking for an implementation in Python, but at this stage I am just trying to get a sense of what would be an efficient algorithm here.
To clarify, inputs to the algorithm are the 3 N x 1 arrays A, B and C and an integer value for k. The expected output are two k x 1 lists of indices, defining the value-maximizing combination of elements from A and B (and C).
For example, suppose k = 2, N = 4 and let
A = a[0], a[1], a[2], a[3] = 3, 1, 1, 0
B = b[0], b[1], b[2], b[3] = 1, 3, 0, 1
C = c[0], c[1], c[2], c[3] = 4, 4, 3, 2
Even in this simple example, there are many possible combinations. For instance, if elements i = 0, 2 are chosen from A and elements j = 1, 3 are chosen from B, then the total value would be a[0] + a[2] + b[1] + b[3] = 8.
If on the other hand elements i = 0, 1 and j = 0, 1 would be chosen from both A and B, then the special twist applies: Instead of yielding a[0] + a[1] + b[0] + b[1], the total value is given by c[0] + c[1] = 8.
In this example, the combination of elements that maximizes the total value is given by i = 0, 2 from A and elements j = 1, 2 from B. This yields a total value of a[0] + b[1] + c[2] = 9, which can be verified is more than any other combination.
Here's a quick comparison of the 3 submitted solutions. First, I checked all of them, and they all give the intended results. As a side comment, none of them requires the elements of C to be weakly larger than the sum of the corresponding elements in A and B, so I dropped this assumption in my performance review.
Here's what I run:
import numpy as np
from utils import tic, toc # simple wrapper to time.perf_counter()
k, N = 10, 1000
A = list(np.random.random_sample([N]))
B = list(np.random.random_sample([N]))
C = list(np.random.random_sample([N]))
tic()
print(optimal_choices(k, A, B, C)) # solution by btilly
toc()
tic()
print(maxPicks(A.copy(), B.copy(), C.copy(), k)) # solution by Eric T-M
toc()
tic()
print(maxSum(A, B, C, k)) # solution by Alain T.
toc()
I tested for various combinations of k and N. It seems that @btilly's algorithm scales well in N as long as k is small. @Alain-T.'s algorithm does the opposite, doing well when k is large relative to N. Across the board, @Eric-T-M's algorithm does best, scaling well in both k and N.
Small problem: k = 10 and N = 500
Small-k, large-N: k = 10 and N = 1000
Large-k, small-N: k = 80 and N = 100
Medium problem: k = 50 and N = 1000
Large problem 1: k = 10 and N = 1_000_000
Large problem 2: k = 1_000 and N = 100_000
(For the benchmarks, I removed the sorting in Alain T.'s code, to make it comparable.)
Try this. It takes O(N^2) time and it is fairly simple.
def maxPicks(A,B,C,k):
# returns the tuple (list of entries picked in A, list of entries picked in B, total value)
# BASE CASE
if k == 0:
return ([], [], 0)
aMax = max(A)
bMax = max(B)
cMax = max(C)
if (aMax + bMax) > cMax:
aIdx = A.index(aMax)
bIdx = B.index(bMax)
B[aIdx] = C[aIdx] - A[aIdx]
A[aIdx] = -2
C[aIdx] = -1
A[bIdx] = C[bIdx] - B[bIdx]
B[bIdx] = -2
C[bIdx] = -1
nextPicks = maxPicks(A,B,C,k-1)
return (nextPicks[0] + [aIdx], nextPicks[1] + [bIdx], nextPicks[2] + aMax + bMax)
else:
cIdx = C.index(cMax)
A[cIdx] = -1
B[cIdx] = -1
C[cIdx] = -1
nextPicks = maxPicks(A,B,C,k-1)
return (nextPicks[0] + [cIdx], nextPicks[1] + [cIdx], nextPicks[2] + cMax)
Here's how it works:
The base case should be self explanatory. Otherwise we will compare the sum of the maximum of all entries in A and the maximum of all entries in B to the maximum of all entries in C. If this sum is larger than it is safe to pick these entries from A and B, but before making more picks we will need to set the entries we picked as well as their corresponding entries in C to a negative value. As a side note I do assume that all values in A, B and C are originally nonnegative so by setting them negative we forbid our algorithm from picking them again. If this assumption is wrong you might want to set these values to something extremely negative to prohibit double picks. We also see that if we picked A[i] the value of B[i] is now whatever C[i]-A[i] was, because picking B[i] will lose us the value in A[i] and give us the value in C[i] same for the entry A[j] if we pick B[j].
If on the other hand, the greatest entry in C was greater than or equal to aMax+bMax we want to pick it (by picking the corresponding entries in both A and B, because no other picks of entries in A and B or just C alone would be more valuable. At this point we know we don't want to re-pick A[i],B[i], or C[i] again, so we set them all negative.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With