I'm looking for an algorithm which can take two sets of integers (both positive and negative) and find subsets within each that have the same sum.
The problem is similar to the subset sum problem except that I'm looking for subsets on both sides.
Here's an example:
List A {4, 5, 9, 10, 1}
List B {21, 7, -4, 180}
So the only match here is: {10, 1, 4, 9} <=> {21, 7, -4}
Does anyone know if there are existing algorithms for this kinda problems?
So far, the only solution I have is a brute force approach which tries every combination but it performs in Exponential time and I've had to put a hard limit on the number of elements to consider to avoid it from taking too long.
The only other solution I can think of is to run a factorial on both lists and look for equalities there but that is still not very efficient and takes exponentially longer as the lists get bigger.
What others have said is true:
This problem is NP-complete. An easy reduction is to usual subset-sum. You can show this by noting that a subset of A sums to a subset of B (not both empty) only if a non-empty subset of A union (-B) sums to zero.
This problem is only weakly NP-complete, in that it's polynomial in the size of the numbers involved, but is conjectured to be exponential in their logarithms. This means that the problem is easier than the moniker "NP-complete" might suggest.
You should use dynamic programming.
So what am I contributing to this discussion? Well, code (in Perl):
@a = qw(4 5 9 10 1);
@b = qw(21 7 -4 180);
%a = sums( @a );
%b = sums( @b );
for $m ( keys %a ) {
next unless exists $b{$m};
next if $m == 0 and (@{$a{0}} == 0 or @{$b{0}} == 0);
print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n";
}
sub sums {
my( @a ) = @_;
my( $a, %a, %b );
%a = ( 0 => [] );
while( @a ) {
%b = %a;
$a = shift @a;
for my $m ( keys %a ) {
$b{$m+$a} = [@{$a{$m}},$a];
}
%a = %b;
}
return %a;
}
It prints
sum(4 5 9 10) = sum(21 7)
sum(4 9 10 1) = sum(21 7 -4)
so, notably, there is more than one solution that works in your original problem!
Edit: User itzy correctly pointed out that this solution was wrong, and worse, in multiple ways!! I'm very sorry about that and I've hopefully addressed these concerns in the new code above. Nonetheless, there is still one problem, namely that for any particular subset-sum, it only prints one of the possible solutions. Unlike the previous problems, which were straight-up errors, I would classify this as an intentional limitation. Best of luck and beware of bugs!
Like the subset sum problem, this problem is weakly NP-complete, so it has a solution that runs in time polynomial(M), where M is the sum of all numbers appearing in the problem instance. You can achieve that with dynamic programming. For each set you can generate all possible sums by filling a 2-dimensional binary table, where "true" at (k,m) means that a subset sum m can be achieved by picking some elements from the first k elements of the set.
You fill it iteratively - you set (k,m) to "true" if (k-1,m) is set to "true" (obviously, if you can get m from k-1 elements, you can get it from k elements by not picking the k-th) or if (k-1,m-d) is set to "true" where d is the value of k-th element in the set (the case where you pick the k-th element).
Filling the table gets you all the possible sums in the last column (the one representing the whole set). Do this for both sets and find common sums. You can backtrack the actual subsets representing the solutions by reversing the process which you used to fill the tables.
Thanks a lot for all the quick responses!
The dynamic programming solution is not really different to the exhaustive approch we have right now and I guess if we need the optimal solution we do need to consider every possible combination but the time it takes to generate this exhaustive list of sums are too long.. Did a quick test and the time it takes to generate all possible sums for x number of elements very quickly go over 1 min:
11 elements took - 0.015625 seconds
12 elements took - 0.015625 seconds
13 elements took - 0.046875 seconds
14 elements took - 0.109375 seconds
15 elements took - 0.171875 seconds
16 elements took - 0.359375 seconds
17 elements took - 0.765625 seconds
18 elements took - 1.609375 seconds
19 elements took - 3.40625 seconds
20 elements took - 7.15625 seconds
21 elements took - 14.96875 seconds
22 elements took - 31.40625 seconds
23 elements took - 65.875 seconds
24 elements took - 135.953125 seconds
25 elements took - 282.015625 seconds
26 elements took - 586.140625 seconds
27 elements took - 1250.421875 seconds
28 elements took - 2552.53125 seconds
29 elements took - 5264.34375 seconds
which for the business problem we're trying to solve is not really acceptable.. I'm gonna go back to the drawing board and see whether we do indeed need to know all the solutions or can we just do with one (the smallest/largest subset, e.g.) instead and hopefully that can help simply the problem and make my algorithm perform to expectaion.
Thanks all the same!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With