Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Given two sets of numbers, find the smallest set from each where the sum is equal

Tags:

c#

algorithm

set

I am working on an application that needs to match two sets of data based on various criteria, including the sum of any number of items from each set. I've distilled the problem down to this statement:

Given a set of items and transactions, find the smallest set of items where the sum is equal to the sum of the smallest set of transactions. (There’s some complexity I’m ignoring for this post, but for now I’m only concerned about the total amounts matching, not dates, descriptions, clearing differences, etc.)

Or, mathematically:Given two sets of numbers, find the smallest set from each where the sums are equal.

The other similar SO questions I've run across assume you know the sum ahead of time, or know the quantity from each set that you are going for.

And here is a test that (I think) illustrates what I'm going for.

    [TestMethod]
    public void StackOverflowTest()
    {
        var seta = new[]{10, 20, 30, 40, 50};
        var setb = new[]{ 45, 45, 100, 200 };

        var result = Magic(seta, setb);


        Assert.AreEqual(new[]{40,50},result.SetA);
        Assert.AreEqual(new[] { 45, 45 }, result.SetB);
    }
    class MagicResult
    {
        public int[] SetA { get; set; }
        public int[] SetB { get; set; }

    }
    private MagicResult Magic(int[] seta, int[] setb)
    {
        throw new NotImplementedException();
    }

I'm looking for an elegant solution that will make this pass, but will take any pseudocode or suggestion that gets me there ;)

like image 512
Daniel Avatar asked Oct 26 '12 23:10

Daniel


3 Answers

Brute force:

 var result = (from a in seta.Subsets()
               from b in setb.Subsets()
               where a.Count() > 0 && b.Count() > 0
               where a.Sum() == b.Sum()
               orderby a.Count() + b.Count()
               select new MagicResult { SetA = a.ToArray(), SetB = b.ToArray() }
              ).First();

using the Subsets method from the EvenMoreLINQ project.

like image 175
dtb Avatar answered Sep 18 '22 13:09

dtb


This can be solved using dynamic programming in O(nW) time where W is the size of the largest sum. Solve the knapsack problem for both sets to generate an array for each that contains all the possible sums and keep track of the number of items used. Then, compare equal sums in each array to find the minimum for each

Not tested, but this is the idea.

arr1dp = [None]*W;  arr1dp[0] = 0;
arr2dp = [None]*W;  arr2dp[0] = 0;


# knapsack arr1
for i in range(len(arr1)):
    for cur_item in arr1:
        if (arr1dp[cur_item] is not none):
             arr1dp[cur_item+i] = min(arr1dp[cur_item]+1,arr1dp[cur_item])

# do the same for arr2
# omitted for brevity

# find the smallest match
for i in range(W):
    if arr1dp[i] is not none and arr2dp[i] is not none:
         min_val = min(min_val,arr1dp[i]+arr2dp[i])
like image 23
dfb Avatar answered Sep 21 '22 13:09

dfb


If the two sets contain a number in common, there is a solution of size 1.

If not, try all sums of two numbers (there are N-choose-two, or N*(N-1)/2 in each set). Compare them against the collection of single-number and two-number sums.

If no joy, try all sums of three numbers, comparing them against 1, 2 or 3-number sums; and so on until all sums (2**N for a set of size N) have been tried.

Here's working code that stops searching as soon as it finds a solution. (There might be smaller sums with the same number of summands). It's in python, but that's practically pseudo-code :-)

from itertools import combinations

# To allow lists of different sizes: ensure list1 is never the short one
if len(list1) < len(list2):
    list1, list2 = list2, list1

def found(val, dict1, dict2):
    print "Sum:", val
    print "Sum 1", dict1[val]
    print "Sum 2", dict2[val]

def findsum(list1, list2):
    # Each dict has sums as keys and lists of summands as values.
    # We start with length 1:
    dict1 = dict()
    dict2 = dict()

    for n in range(1, max(len(list1), len(list2))+1):
        # Check all size n sums from list1 against size < n sums in list2
        for nums in combinations(list1, n):
            s = sum(nums)
            if s in dict1:  # Is this sum new for our list?
                continue

            dict1[s] = nums
            if s in dict2:   
                found(s, dict1, dict2)
                return   # If you want to look for a smallest sum, keep going

        # If list2 is too short, nothing to do
        if len(list2) < n:
            continue

        # Check all size n sums from list2 against size <= n sums in list1
        for nums in combinations(list2, n):
            s = sum(nums)
            if s in dict2:  # Is this sum new for our list?
                continue

            dict2[s] = nums
            if s in dict1:
                found(s, dict1, dict2)
                return   # If you want to look for a smallest sum, keep going

findsum(list1, list2)

This is designed to find a solution in the smallest number of comparisons. If you also want the sum to be minimal, then at each size n generate all n-part sums at once, sort them and check them in increasing order.

like image 20
alexis Avatar answered Sep 19 '22 13:09

alexis