Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to split an array into two subsets and keep sum of sub-values of array as equal as possible

I really need an master of algorithm here! So the thing is I got for example an array like this:

[
    [870, 23]
    [970, 78]
    [110, 50]
]

and I want to split it up, so that it looks like this:

// first array
[
    [970, 78]
]
// second array
[
    [870, 23]
    [110, 50]
]

so now, why do I want it too look like this?

Because I want to keep the sum of sub values as equal as possible. So 970 is about 870 + 110 and 78 is about 23 + 50. So in this case it's very easy because if you would just split them and only look at the first sub-value it will already be correct but I want to check both and keep them as equal as possible, so that it'll also work with an array which got 100 sub-arrays! So if anyone can tell me the algorithm with which I can program this it would be really great!

Scales:

  • ~1000 elements (sublists) in the array
  • Elements are integers up to 10^9

I am looking for a "close enough solution" - it does not have to be the exact optimal solution.

like image 653
noob Avatar asked Oct 28 '12 18:10

noob


People also ask

Can an array be divided into two subsequences with equal sums?

If the sum is odd, the array cannot be partitioned into two subarrays having equal sums. If the sum is even, divide the array into subsets, such that both have sums equal to sum/2.


3 Answers

First, as already established - the problem is NP-Hard, with a reduction form Partition Problem.

Reduction:
Given an instance of partition problem, create lists of size 1 each. The result will be this problem exactly.

Conclusion from the above:
This problem is NP-Hard, and there is no known polynomial solution.

Second, Any exponential and pseudo polynomial solutions will take just too long to work, due to the scale of the problem.

Third, It leaves us with heuristics and approximation algorithms.

I suggest the following approach:

  1. Normalize the scales of the sublists, so all the elements will be in the same scale (say, all will be normalzied to range [-1,1] or all will be normalized to standard normal distribution).
  2. Create a new list, in which, each element will be the sum of the matching sublist in the normalized list.
  3. Use some approximation or heuristical solution that was developed for the subset-sum / partition problem.

The result will not be optimal, but optimal is really unattanable here.

like image 189
amit Avatar answered Oct 13 '22 04:10

amit


From what I gather from the discussion under the original post, you're not searching for a single splitting point, but rather you want to distribute all pairs among two sets, such that the sums in each of the two sets are approximately equal.

Since a close enough solution is acceptable, maybe you could try an approach based on simulated annealing? (see http://en.wikipedia.org/wiki/Simulated_annealing)

In short, the idea is that you start out by randomly assigning each pair to either the Left or the Right set. Next, you generate a new state by either

  • a) moving a randomly selected pair from the Left to the Right set,
  • b) moving a randomly selected pair from the Right to the Left set, or
  • c) doing both.

Next, determine if this new state is better or worse than the current state. If it is better, use it. If it is worse, take it only if it is accepted by the acceptance probability function, which is a function that initially allows worse states to be used, but favours them less and less as time moves on (or the "temperature decreases", in SA terms). After a large number of iterations (say 100.000), you should have a pretty good result.

Optionally, rerun this algorithm multiple times because it may get stuck in local optima (although the acceptance probability function attempts to counter this).

Advantages of this approach are that it's simple to implement, and you can decide for yourself how long you want it to continue searching for a better solution.

like image 33
Leon Bouquiet Avatar answered Oct 13 '22 05:10

Leon Bouquiet


I'm assuming that we're just looking for a place in the middle of the array to split it into its first and second part.

It seems like a linear algorithm could do this. Something like this in JavaScript.

arrayLength = 2;
tolerance = 10;

// Initialize the two sums.
firstSum = [];
secondSum = [];
for (j = 0; j < arrayLength; j++)
{
   firstSum[j] = 0;
   secondSum[j] = 0;
   for (i = 0; i < arrays.length; i++)
   {
      secondSum += arrays[i][j];
   }
}

// Try splitting at every place in "arrays".
// Try to get the sums as close as possible.
for (i = 0; i < arrays.length; i++)
{
   goodEnough = true;
   for (j = 0; j < arrayLength; j++)
   {
      if (Math.abs(firstSum[j] - secondSum[j]) > tolerance)
         goodEnough = false;
   }

   if (goodEnough)
   {
      alert("split before index " + i);
      break;
   }

   // Update the sums for the new position.
   for (j = 0; j < arrayLength; j++)
   {
      firstSum[j] += arrays[i][j];
      secondSum[j] -= arrays[i][j];
   }
}
like image 30
Hew Wolff Avatar answered Oct 13 '22 06:10

Hew Wolff