Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find k-th minimum sum of every possible subset


Last time I found interesting problem, and I got stuck on it.

Given n numbers a[1], ..., a[n] in ascending order and number k (1 <= n,k <= 10^5).

Let say that we're sorting every possible subset of given sequence by sum.
We have to find sum of k-th such subset.

For example:
n = 4, k = 8
a = { 2, 7, 8, 15 }

1: { 2 }, sum = 2
2: { 7 }, sum = 7
3: { 8 }, sum = 8
4: { 2, 7 }, sum = 9
5: { 2, 8 }, sum = 10
6: { 7, 8 }, sum = 15
7: { 15 }, sum = 15
8: { 2, 15 }, sum = 17
...
k = 8, so answer = 17 (subset {2,15}).

Of course we can generate every possible subset and whole solution runs in O(2^n * n), but I'm searching for something more smart - linear, or at least O(nk).

like image 888
Jacob Scott Avatar asked Oct 19 '15 16:10

Jacob Scott


2 Answers

(Going to assume nonempty subsets for simplicity. Handling the empty subset is a line or two.)

Given a nonempty subset of indices S, define the children of S to be S \ {max(S)} U {max(S) + 1} and S U {max(S) + 1}. Starting with {1}, the child relation forms a tree that includes every nonempty subset of positive integers.

{1}
|  \
{2} {1,2}______
|  \     \     \
{3} {2,3} {1,3} {1,2,3}

Keyed by the sum of corresponding array elements, this tree is a min-heap. If you compute the keys carefully (by adding and subtracting rather than summing from scratch) and implement the min-heap deletion lazily, then you get an O(k log k)-time algorithm.

like image 153
David Eisenstat Avatar answered Oct 17 '22 01:10

David Eisenstat


In his answer, David essentially employed Dijkstra's algorithm on a carefully constructed directed graph of subsets.

A bit more naive approach that I initially came up with is to apply Dijkstra directly to the hypercube of subsets. In other words, start with an empty subset and repeatedly try adding each previously missing array element (transitions of the form S -> {S U {v} for v in [n] \ S}.

The complexity of this approach will be worse since the maximum degree is O(n), implying that we may explore as many as O(nk) vertices of the hypercube. It leads to an unpleasant overall complexity of O(nk log nk).

David's construction differs from the naive approach in having a maximum degree of 2, guaranteeing that we will only explore O(k) vertices (the smallest k and at most 2k their children). I figured that this answer could be helpful for those struggling to understand why we need this somewhat not straightforward construction.

like image 1
Nikita Skybytskyi Avatar answered Oct 16 '22 23:10

Nikita Skybytskyi