Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the (100) highest sums out of elements from multiple lists

I have the following problem (in my version there are some constraints which probably make the problem easier to solve, but a general solution would also be nice):

I have 4 lists, with 10 entries. The first list contains integer entries between 0 and 6, the other three contain entries between 0 and 3. I now have to find the 100 best combinations of elements of these four lists. One combination consists of the sum of 4 values, one from each list. Note that I don't only need to know the values of the resulting elements, but also how they were composed, as there is more information associated with the values.

Example 1:

list A: 6, 3, 2, 2, 1, 0, 0, 0, 0
list B: 3, 2, 0, 0, 0, 0, 0, 0, 0
list C: 2, 2, 0, 0, 0, 0, 0, 0, 0
list D: 3, 1, 1, 1, 1, 1, 0, 0, 0

The five best combinations would in this case be:

  • 14 (A[0] + B[0] + C[0] + D[0])
  • 14 (A[0] + B[0] + C[1] + D[0])
  • 13 (A[0] + B[1] + C[0] + D[0])
  • 13 (A[0] + B[1] + C[1] + D[0])
  • 12 (A[0] + B[0] + C[0] + D[1])

Note that I have sorted the entries of the lists, as this would probably the first step of most algorithms which solve this problem.

Trivial solution

The easiest solution consists of forming all 10.000 possible combinations and then choose the hundred bests out of these. One does not even have to sort the 10.000 possible combinations. One can first scan through the array and analyze which value appears how often. Then one can pick the hundred best values (and their further attributes) in the next scan through the values.

A solution which does not work

Another idea which came to my mind is the following: First one has to sort the lists. In each list I want to find an index, which separates those entries which can contribute to a solution from them which can't. When one had to find the four best combinations in example 1, one could for example select the first two elements of lists B and C, and the first one of lists A and D This would yield:

A: 6
B: 3, 2
C: 3, 2
D: 3

All combinations of these sublists would yield the optimal solution. However this is not always possible, as can be seen in the following example:

Example 2:

(this time with only two lists)

list A: 6, 5, 5, 0, 0, ...
list B: 3, 1, 1, 0, 0, ...

Here, the best four solutions are

  • 6 + 3
  • 5 + 3
  • 5 + 3
  • 6 + 1

However this solution can not be found with indexes, which separate four combinations from all other combinations.

like image 655
Misch Avatar asked Nov 12 '12 12:11

Misch


2 Answers

Let me try rephrase the existing answers (by Evgeny Kluev, solendil) in a more concise way.

The solution is a Best-first search starting from configuration (0,0,0,0). The search tree has a branching factor of 4 (the number of lists).

At (0,0,0,0) you know the next best configuration is either one of the (1,0,0,0), (0,1,0,0), (0,0,1,0) or (0,0,0,1). So you put these 'leaves' of the search tree into a priority queue sorted by how good each configuration is. The queue of leaves is the queue of next-best-configuration candidate.

Then you take the best configuration out of the queue to add to the answer list and update the queue. For example, if (0,0,1,0) is the next best one, take it out of the queue then add its children - (1,0,1,0), (0,1,1,0), (0,0,2,0), (0,0,1,1) - to the queue.

like image 197
Apiwat Chantawibul Avatar answered Oct 28 '22 11:10

Apiwat Chantawibul


This solution uses two data structures:

  1. Priority queue, where sum of list elements is the key and a tuple of list indexes is the value.
  2. Hash set containing tuples of list indexes.

Algorithm:

  1. Sort every list.
  2. Put a tuple containing first element of every list into the queue.
  3. While hash set contains less than 100 tuples, do steps 4 and 5.
  4. Pop largest item from the queue, check if hash set contains correspondent index tuple (T). If such tuple is found, repeat step 4, otherwise continue to step 5.
  5. Insert tuple T into hash set. Create N tuples (where N is the number of lists), each having N-1 indexes equal to correspondent indexes in T, and one index greater by one, and insert these tuples into the queue.

Probably the best way to implement the Priority queue (for this problem) is to use an ordered container (binary search tree or skip list), sorted first by sum of list elements, then by the list indexes. This makes separate hash set not necessary, since this container could detect duplicates before they are added into the queue.

like image 41
Evgeny Kluev Avatar answered Oct 28 '22 10:10

Evgeny Kluev