Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiple subset sum calculation

I have 2 sets, set A contains set of random numbers and set B's elements are sum of set A's subsets.

For example,

A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]

B = [183, 36, 231, 128, 137]

I want to find which number is sum of which subset with data like this.

S = [[45, 46, 92], [36], [8, 15, 39, 73, 96], [60, 68], [9, 15, 33, 80]]

I was able to write really dumb brute force code with python.

class SolvedException(BaseException):
    pass

def solve(sums, nums, answer):
    num = nums[-1]

    for i in range(0, len(sums)):
        sumi = sums[i]
        if sumi == 0:
            continue
        elif sumi - num < 0:
            continue
        answer[i].append(num)

        sums[i] = sumi - num

        if len(nums) != 1:
            solve(sums, nums[:-1], answer)
        elif sumi - num == 0:
            raise SolvedException(answer)

        sums[i] = sumi

        answer[i].pop()

try:
    solve(B, A, [list() for i in range(0, len(B))])
except SolvedException as e:
    print e.args[0]

This code works pretty well for small datas, but it will take billion years to calculate my data(which has 71 numbers and 10 sums).

I could use some better algoritms or optimization.

Sorry for my bad English and terrible unefficient code.


Edit : Sorry, I realized that I didn't described the problem accurately.

As every single element in A are used to make elemens of B, sum(A) == sum(B)

Also, set S must be partition of set A.

like image 206
Akintos Avatar asked Feb 23 '17 17:02

Akintos


1 Answers

This is known as the subset-sum problem and it is a well known NP-complete problem. So basically there is no efficient solution. See for example https://en.wikipedia.org/wiki/Subset_sum_problem

However If your number N is not too large, there is a pseudo polynomial algorithms, using dynamic programming: You read the list A from left to right and keep the list of the sum which are doable and smaller than N. If you know the number which are doable for a given A, you can easily get those which are doable for A + [a]. Hence the dynamic programming. It will typically be fast enough for a problem of the size you gave there.

Here is a Python quick solution:

def subsetsum(A, N):
    res = {0 : []}
    for i in A:
        newres = dict(res)
        for v, l in res.items():
            if v+i < N:
                newres[v+i] = l+[i]
            elif v+i == N:
                return l+[i]
        res = newres
    return None

Then

>>> A = [8, 9, 15, 15, 33, 36, 39, 45, 46, 60, 68, 73, 80, 92, 96]
>>> subsetsum(A, 183)
[15, 15, 33, 36, 39, 45]

After OP edit:

Now I correctly understand you problem, I'll still think that your problem can be solved efficiently, provided you have an efficient subset-sum solver: I'd use divide and conquer solution on B:

  • cut B into two approximately equal pieces B1 and B2
  • use your subset-sum solver to search among A for all subsets S whose sum are equal to sum(B1).
  • for each such S:
    • call recursively solve(S, B1) and solve(A - S, B2)
    • if both succeed you have a solution

However, your (71, 10) problem below is out of reach for the dynamic programming solution I suggested.


By the way, here is a quick solution of your problem not using divide and conquer, but which contains the correct adaptation of my dynamic solver to get all solutions:

class NotFound(BaseException):
    pass

from collections import defaultdict
def subset_all_sums(A, N):
    res = defaultdict(set, {0 : {()}})
    for nn, i in enumerate(A):
        # perform a deep copy of res
        newres = defaultdict(set)
        for v, l in res.items():
            newres[v] |= set(l)
            for v, l in res.items():
                if v+i <= N:
                    for s in l:
                        newres[v+i].add(s+(i,))
                        res = newres
                        return res[N]

def list_difference(l1, l2):
    ## Similar to merge.
    res = []
    i1 = 0; i2 = 0
    while i1 < len(l1) and i2 < len(l2):
        if l1[i1] == l2[i2]:
            i1 += 1
            i2 += 1
        elif l1[i1] < l2[i2]:
            res.append(l1[i1])
            i1 += 1
        else:
            raise NotFound
            while i1 < len(l1):
                res.append(l1[i1])
                i1 += 1
                return res

def solve(A, B):
    assert sum(A) == sum(B)
    if not B:
        return [[]]
        res = []
        ss = subset_all_sums(A, B[0])
        for s in ss:
            rem = list_difference(A, s)
            for sol in solve(rem, B[1:]):
                res.append([s]+sol)
                return res

Then:

>>> solve(A, B)
[[(15, 33, 39, 96), (36,), (8, 15, 60, 68, 80), (9, 46, 73), (45, 92)],
 [(15, 33, 39, 96), (36,), (8, 9, 15, 46, 73, 80), (60, 68), (45, 92)],
 [(8, 15, 15, 33, 39, 73), (36,), (9, 46, 80, 96), (60, 68), (45, 92)],
 [(15, 15, 73, 80), (36,), (8, 9, 33, 39, 46, 96), (60, 68), (45, 92)],
 [(15, 15, 73, 80), (36,), (9, 39, 45, 46, 92), (60, 68), (8, 33, 96)],
 [(8, 33, 46, 96), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (45, 92)],
 [(8, 33, 46, 96), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (45, 92)],
 [(9, 15, 33, 46, 80), (36,), (8, 15, 39, 73, 96), (60, 68), (45, 92)],
 [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (60, 68), (9, 15, 33, 80)],
 [(45, 46, 92), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (9, 60, 68)],
 [(45, 46, 92), (36,), (15, 15, 60, 68, 73), (9, 39, 80), (8, 33, 96)],
 [(45, 46, 92), (36,), (9, 15, 15, 39, 73, 80), (60, 68), (8, 33, 96)],
 [(9, 46, 60, 68), (36,), (8, 15, 39, 73, 96), (15, 33, 80), (45, 92)]]

>>> %timeit solve(A, B)
100 loops, best of 3: 10.5 ms per loop

So it is quite fast for this size of problem, though nothing is optimized here.

like image 181
hivert Avatar answered Sep 20 '22 01:09

hivert