Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the subarray that has sum closest to zero or a certain value t in O(nlogn)

Actually it is the problem #10 of chapter 8 of Programming Pearls 2nd edition. It asked two questions: given an array A[] of integers(positive and nonpositive), how can you find a continuous subarray of A[] whose sum is closest to 0? Or closest to a certain value t?

I can think of a way to solve the problem closest to 0. Calculate the prefix sum array S[], where S[i] = A[0]+A[1]+...+A[i]. And then sort this S according to the element value, along with its original index information kept, to find subarray sum closest to 0, just iterate the S array and do the diff of the two neighboring values and update the minimum absolute diff.

Question is, what is the best way so solve second problem? Closest to a certain value t? Can anyone give a code or at least an algorithm? (If anyone has better solution to closest to zero problem, answers are welcome too)

like image 484
Peiti Li Avatar asked May 05 '13 20:05

Peiti Li


3 Answers

To solve this problem, you can build an interval-tree by your own, or balanced binary search tree, or even beneficial from STL map, in O(nlogn).

Following is use STL map, with lower_bound().

#include <map>
#include <iostream>
#include <algorithm>
using namespace std;

int A[] = {10,20,30,30,20,10,10,20};

// return (i, j) s.t. A[i] + ... + A[j] is nearest to value c
pair<int, int> nearest_to_c(int c, int n, int A[]) {
    map<int, int> bst;
    bst[0] = -1;
    // barriers
    bst[-int(1e9)] = -2;
    bst[int(1e9)] = n;

    int sum = 0, start, end, ret = c;
    for (int i=0; i<n; ++i) {
            sum += A[i];
            // it->first >= sum-c, and with the minimal value in bst
            map<int, int>::iterator it = bst.lower_bound(sum - c);
            int tmp = -(sum - c - it->first);
            if (tmp < ret) {
                    ret = tmp;
                    start = it->second + 1;
                    end = i;
            }

            --it;
            // it->first < sum-c, and with the maximal value in bst
            tmp = sum - c - it->first;
            if (tmp < ret) {
                    ret = tmp;
                    start = it->second + 1;
                    end = i;
            }

            bst[sum] = i;
    }
    return make_pair(start, end);
}

// demo
int main() {
    int c;
    cin >> c;
    pair<int, int> ans = nearest_to_c(c, 8, A);

    cout << ans.first << ' ' << ans.second << endl;
    return 0;
}
like image 63
frankyym Avatar answered Oct 20 '22 16:10

frankyym


You can adapt your method. Assuming you have an array S of prefix sums, as you wrote, and already sorted in increasing order of sum value. The key concept is to not only examine consecutive prefix sums, but instead use two pointers to indicate two positions in the array S. Written in a (slightly pythonic) pseudocode:

left = 0                 # Initialize window of length 0 ...
right = 0                # ... at the beginning of the array
best = ∞                 # Keep track of best solution so far
while right < length(S): # Iterate until window reaches the end of the array
  diff = S[right] - S[left]
  if diff < t:           # Window is getting too small
    if t - diff < best:  # We have a new best subarray
      best = t - diff
      # remember left and right as well
    right = right + 1    # Make window bigger
  else:                  # Window getting too big
    if diff - t < best   # We have a new best subarray
      best = diff - t
      # remember left and right as well
    left = left + 1      # Make window smaller

The complexity is bound by the sorting. The above search will take at most 2n=O(n) iterations of the loop, each with computation time bound by a constant. Note that the above code was conceived for positive t.

The code was conceived for positive elements in S, and positive t. If any negative integers crop up, you might end up with a situation where the original index of right is smaller than that of left. So you'd end up with a sub sequence sum of -t. You can check this condition in the if … < best checks, but if you only suppress such cases there, I believe that you might be missing some relevant cases. Bottom line is: take this idea, think it through, but you'll have to adapt it for negative numbers.

Note: I think that this is the same general idea which Boris Strandjev wanted to express in his solution. However, I found that solution somewhat hard to read and harder to understand, so I'm offering my own formulation of this.

like image 4
MvG Avatar answered Oct 20 '22 16:10

MvG


Your solution for the 0 case seems ok to me. Here is my solution to the second case:

  • You again calculate the prefix sums and sort.
  • You initialize to indices start to 0 (first index in the sorted prefix array) end to last (last index of the prefix array)
  • you start iterating over start 0...last and for each you find the corresponding end - the last index in which the prefix sum is such that prefix[start] + prefix[end] > t. When you find that end the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1] (the latter taken only if end > 0)
  • the most important thing is that you do not search for end for each start from scratch - prefix[start] increases in value when iterating over all possible values for start, which means that in each iteration you are interested only in values <= the previous value of end.
  • you can stop iterating when start > end
  • you take the best of all values obtained for all start positions.

It can easily be proved that this will give you complexity of O(n logn) for the entire algorithm.

like image 2
Boris Strandjev Avatar answered Oct 20 '22 17:10

Boris Strandjev