
Performance of coin splitting algorithm

My question is about a CodeFu practice problem (2012 round 2 problem 3). It basically comes down to splitting an array of integers in two (almost) equal halves and returning the smallest possible difference between the two. I have included the problem description below. As noted in the comments this can be described as a balanced partition problem, which is a problem in the realm of dynamic programming.

Similar problems have been discussed a lot, but I was unable to find an efficient solution for this particular one. The problem is of course that the number of possible combinations to traverse soon grows too large for a brute force search (at least when using recursion). I have a recursive solution that works fine for all but the largest problem sets. I tried to add some optimizations that stop the recursion early, but it is still too slow to solve some arrays of the maximum length (30) within the 5-second limit imposed by CodeFu. Any suggestions for how to improve or rewrite the code would be very welcome. I would also love to know whether an iterative version might help.

Update: on this fine site there is a theoretical discussion of the balanced partition problem, which gives a good idea of how to go about solving this with dynamic programming. That is really what I am after, but I do not know exactly how to put the theory into practice. The video mentions that the elements in the two subcollections can be found "using the old trick of back pointers", but I don't see how.

Problem

You and your friend have a number of coins with various amounts. You need to split the coins in two groups so that the difference between those groups is minimal.

E.g. coins of sizes 1,1,1,3,5,10,18 can be split as:

1,1,1,3,5 and 10,18
1,1,1,3,5,10 and 18
or 1,1,3,5,10 and 1,18

The third combination is favorable, as in that case the difference between the groups is only 1.

Constraints:
coins will have between 2 and 30 elements inclusive
each element of coins will be between 1 and 100000 inclusive

Return value: Minimal difference possible when coins are split into two groups

NOTE: the CodeFu rules state that the execution time on CodeFu's server may be no more than 5 seconds.

Main Code

Arrays.sort(coins);

lower = Arrays.copyOfRange(coins, 0,coins.length-1);
//(after sorting) put the largest element in upper
upper = Arrays.copyOfRange(coins, coins.length-1,coins.length);            

smallestDifference = Math.abs(arraySum(upper) - arraySum(lower));
return findSmallestDifference(lower, upper, arraySum(lower), arraySum(upper), smallestDifference);

Recursion Code

private int findSmallestDifference (int[] lower, int[] upper, int lowerSum, int upperSum, int smallestDifference) {
    int[] newUpper = null, newLower = null;
    int currentDifference = Math.abs(upperSum-lowerSum);
    if (currentDifference < smallestDifference) {
        smallestDifference = currentDifference;
    } 
    if (lowerSum < upperSum || lower.length < upper.length || lower[0] > currentDifference 
            || lower[lower.length-1] > currentDifference 
            || lower[lower.length-1] < upper[0]/lower.length) {
        return smallestDifference;
    }
    for (int i = lower.length-1; i >= 0 && smallestDifference > 0; i--) {           
       newUpper = addElement(upper, lower[i]);
       newLower = removeElementAt(lower, i);
       smallestDifference = findSmallestDifference(newLower, newUpper, 
               lowerSum - lower[i], upperSum + lower[i], smallestDifference);
    }
    return smallestDifference;
}

Data Set

Here is an example of a set that takes too long to solve.

{100000,60000,60000,60000,60000,60000,60000,60000,60000, 60000,60000,60000,60000,60000,60000,60000,60000,60000, 60000,60000,60000,60000,60000,60000,60000,60000,60000, 60000,60000,60000}

If you would like the entire source code, I have put it on Ideone.

titusn asked Oct 31 '12 11:10


2 Answers

EDIT just to be clear: I wrote this answer before the additional limitation of running in under five seconds was specified in the question. I also wrote it just to show that sometimes brute force is possible even when it seems that it's not. So this answer is not meant to be the "best" answer to this problem: it's precisely meant to be a brute force solution. As a side benefit, this little solution may help someone writing another solution to verify in an acceptable time that their answers for "large" arrays are correct.

"The problem is of course that the number of possible combinations to traverse soon grows too large for a brute force search."

Given the problem as initially stated (before the max running time of 5 seconds was specified), I totally dispute that statement ;)

You specifically wrote that the maximum length was 30.

Note that I'm not talking about other solutions (like, say, a dynamic programming solution that may or may not work given your constraints).

What I'm saying is that 2^30 is not big. It's a bit more than one billion and that's it.

A modern CPU can execute, on one core, billions of cycles per second.

You don't need to recurse to solve this. There's an easy way to enumerate all the possible left/right combinations: simply count from 0 to 2^30 - 1 and check every bit (decide that, say, a bit that is off means you put the value to the left and a bit that is on means you put the value to the right).

So, given the problem statement and if I'm not mistaken, the following approach, without any optimization, should work:

public static void bruteForce(final int[] vals) {
    final int n = vals.length;
    // Number of left/right assignments: 2^n (the int shift is exact for n <= 30)
    final int pow = 1 << n;
    int min = Integer.MAX_VALUE;
    int val = 0;   // bit mask of the best assignment found so far
    for (int i = pow - 1; i >= 0; i--) {
        // Bit j of i off: vals[j] goes left (+); bit j on: vals[j] goes right (-)
        int diff = 0;
        for (int j = 0; j < n; j++) {
            diff += (i & (1 << j)) == 0 ? vals[j] : -vals[j];
        }
        if (Math.abs(diff) < min) {
            min = Math.abs(diff);
            val = i;
        }
    }

    // Some eye-candy now: print both groups of the best assignment
    for (int i = 0; i < 2; i++) {
        System.out.print(i == 0 ? "Left:" : "Right:");
        for (int j = 0; j < n; j++) {
            System.out.print(((val & (1 << j)) == (i == 0 ? 0 : (1 << j)) ? " " + vals[j] : ""));
        }
        System.out.println();
    }
}

For example:

bruteForce( new int[] {2,14,19,25,79,86,88,100});
Left: 2 14 25 79 86
Right: 19 88 100


bruteForce( new int[] {20,19,10,9,8,5,4,3});
Left: 20 19
Right: 10 9 8 5 4 3

On an array of 30 elements, on my cheap CPU it runs in 125 s. That's for a "first draft", totally unoptimized solution running on a single core (the problem as stated is trivial to parallelize).
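To illustrate the remark about parallelizing: every bit mask can be evaluated independently of the others, so the range of masks can simply be split across cores. The following is only a sketch using a parallel IntStream; the class name ParallelBruteForce and the method name minDifference are made up for this example, and it assumes at most 30 coins so that 1 << n fits in an int.

```java
import java.util.stream.IntStream;

class ParallelBruteForce {
    // Returns the smallest |left - right| over all 2^n assignments.
    // Assumption: vals.length <= 30, so 1 << n does not overflow an int.
    static int minDifference(int[] vals) {
        final int n = vals.length;
        return IntStream.range(0, 1 << n)
                .parallel()                       // masks are independent of each other
                .map(mask -> {
                    int diff = 0;
                    for (int j = 0; j < n; j++) {
                        // bit off: coin goes left (+), bit on: coin goes right (-)
                        diff += (mask & (1 << j)) == 0 ? vals[j] : -vals[j];
                    }
                    return Math.abs(diff);
                })
                .min()
                .getAsInt();                      // the range is never empty (mask 0 exists)
    }
}
```

This only returns the difference, not the groups; the speed-up is bounded by the number of cores, so it does not change the 2^n growth.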

You can of course get fancier and reuse lots and lots and lots of intermediate results, hence solving an array of 30 elements in less than 125 s.
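One possible way to reuse intermediate results (not necessarily the one meant here) is to visit the assignments in Gray-code order. Consecutive Gray codes differ in exactly one bit, so the running difference can be updated with a single addition instead of being recomputed from all n coins. A sketch under that assumption, with a hypothetical class GrayCodeSplit and again assuming at most 30 coins:

```java
class GrayCodeSplit {
    // Returns the smallest |left - right|, visiting all 2^n assignments in
    // Gray-code order. Assumption: vals.length <= 30 so 1 << n fits in an int.
    static int minDifference(int[] vals) {
        final int n = vals.length;
        int diff = 0;
        for (int v : vals) {
            diff += v;                       // start: every coin on the left
        }
        int min = Math.abs(diff);
        boolean[] onRight = new boolean[n];  // current side of each coin
        for (int k = 1; k < (1 << n) && min > 0; k++) {
            // Gray codes k-1 and k differ exactly in the bit whose index is
            // the number of trailing zeros of k.
            int j = Integer.numberOfTrailingZeros(k);
            // Moving coin j to the other side changes the difference by 2 * vals[j].
            diff += onRight[j] ? 2 * vals[j] : -2 * vals[j];
            onRight[j] = !onRight[j];
            min = Math.min(min, Math.abs(diff));
        }
        return min;
    }
}
```

The work per assignment drops from n operations to a constant, but the number of assignments is still 2^n.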

TacticalCoder answered Nov 11 '22 03:11


Say N is the sum of all coins. We need to find a subset of coins whose sum is closest to N/2. Let's calculate all possible sums and choose the best one. In the worst case we might expect 2^30 possible sums, but that cannot happen, because the largest possible sum is 100K * 30, that is 3M, much less than 2^30, which is about 1G. So an array of 3M ints or 3M bits should be sufficient to hold all possible sums.

So we have array a and a[m] == 1 if and only if m is a possible sum.

We start from zeroed array and have a[0]=1, because the sum 0 is possible (one has no coins).

for (every coin)
  for (int j=0; j<=3000000; j++)
    if (a[j] != 0)
      // j is a possible sum so far
      new_possible_sum = j + this_coin
      a[new_possible_sum] = 1

After at most 30 * 3M steps you will know all possible sums. Find the possible sum m that is closest to N/2; your result is abs(N - m - m). I hope this fits within the time and memory bounds.
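As a rough sketch of the "3M bits" variant, the set of possible sums can be kept as the bits of a java.math.BigInteger: bit m is set if and only if m is a possible sum. Adding a coin c then amounts to OR-ing the set with a copy of itself shifted left by c, which uses each coin at most once regardless of loop direction. The class name ReachableSums and the method name minDifference are invented for this example.

```java
import java.math.BigInteger;

class ReachableSums {
    // Returns the minimal difference between the two groups.
    static int minDifference(int[] coins) {
        int total = 0;                            // N, the sum of all coins
        for (int c : coins) {
            total += c;
        }
        BigInteger reachable = BigInteger.ONE;    // only bit 0 set: the sum 0 is possible
        for (int c : coins) {
            // every old sum m stays possible, and m + c becomes possible
            reachable = reachable.or(reachable.shiftLeft(c));
        }
        // The sums are symmetric (m possible <=> N - m possible), so the largest
        // possible m that does not exceed N/2 is the one closest to N/2.
        for (int m = total / 2; m >= 0; m--) {
            if (reachable.testBit(m)) {
                return total - m - m;
            }
        }
        throw new AssertionError("sum 0 is always possible");
    }
}
```

For the data set from the question (one coin of 100000 and 29 coins of 60000) this yields 40000: N is 1840000 and the closest possible sum to N/2 is 900000.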

Edit: one correction and two optimizations are needed:

  1. Walk the array in descending order. Otherwise a coin of value 1 would mark the whole array in one go, because every newly marked sum is picked up again later in the same pass and the coin gets used more than once.
  2. Limit the size of the array to N+1 (including 0), to solve smaller coin sets faster.
  3. Since we almost always get 2 identical results, m and N-m, reduce the array size to N/2. Add a bounds check for new_possible_sum and throw away the greater possible sums.
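Putting the three points together, here is a sketch in Java. It also stores, for every possible sum, the index of the coin that first produced it. That is one reading of the "back pointers" idea mentioned in the question, and it lets one group be listed explicitly. The names BalancedPartition and oneGroup are hypothetical, as are the sentinel values -1 and -2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BalancedPartition {
    // Returns the coins of the group whose sum m is closest to N/2 from below;
    // the other group is the remaining coins and the difference is N - m - m.
    static List<Integer> oneGroup(int[] coins) {
        int total = 0;
        for (int c : coins) {
            total += c;
        }
        int half = total / 2;                 // point 3: sums above N/2 are not needed
        // from[s] == -1: s not possible (yet); otherwise the index of the coin
        // that first made s possible. -2 marks the empty sum 0.
        int[] from = new int[half + 1];
        Arrays.fill(from, -1);
        from[0] = -2;
        for (int i = 0; i < coins.length; i++) {
            // point 1: descending order, so from[s - coins[i]] still describes
            // the state before coin i and the coin is used at most once
            for (int s = half; s >= coins[i]; s--) {
                if (from[s] == -1 && from[s - coins[i]] != -1) {
                    from[s] = i;
                }
            }
        }
        int best = half;
        while (from[best] == -1) {
            best--;                           // stops at 0 at the latest
        }
        // Follow the back pointers from the best sum down to 0.
        List<Integer> group = new ArrayList<>();
        int s = best;
        while (s > 0) {
            int coin = coins[from[s]];
            group.add(coin);
            s -= coin;
        }
        return group;
    }
}
```

Because an entry of from is never overwritten once set, each back pointer leads to a sum that was produced by an earlier coin, so no coin appears twice in the returned group.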

Jarekczek answered Nov 11 '22 05:11