Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a minimal subarray of n integers of sum >= k in linear time

Recently I've been struggling with the following problem:

Given an array of integers, find a minimal (shortest length) subarray that sums to at least k.

Obviously this can easily be done in O(n^2). I was able to write an algorithm that solves it in linear time for natural numbers, but I can't figure it out for integers.

My latest attempt was this:

def find_minimal_length_subarr_z(arr, min_sum):
    found = False
    start = end = cur_end = cur_sum = 0
    for cur_start in range(len(arr)):
        if cur_end <= cur_start:
            cur_end, cur_sum = cur_start, arr[cur_start]
        else:
            cur_sum -= arr[cur_start-1]
        # Expand
        while cur_sum < min_sum and cur_end < len(arr)-1:
            cur_end += 1
            cur_sum += arr[cur_end]
        # Contract
        while cur_end > cur_start:
            new_sum = cur_sum - arr[cur_end]
            if new_sum >= min_sum or new_sum >= cur_sum:
                cur_end -= 1
                cur_sum = new_sum
            else:
                break
        if cur_sum >= min_sum and (not found or cur_end-cur_start < end-start):
            start, end, found = cur_start, cur_end, True
    if found:
        return start, end

For example:

[8, -7, 5, 5, 4], 12 => (2, 4)

However, it fails for:

[-12, 2, 2, -12, 2, 0], 4

where the correct result is (1, 2) but the algorithm doesn't find it.

Can this at all be done in linear time (with preferably constant space complexity)?

like image 669
Thijs van Dien Avatar asked Jun 30 '13 13:06

Thijs van Dien


1 Answers

Here's one that's linear time but also linear space. The extra space comes from a deque that could grow to linear size. (There's also a second array to maintain cumulative sums, but that could be removed pretty easily.)

from collections import deque
def find_minimal_length_subarr(arr, k):
   # assume k is positive
   sumBefore = [0]
   for x in arr: sumBefore.append(sumBefore[-1] + x)
   bestStart = -1
   bestEnd = len(arr)
   startPoints = deque()
   start = 0
   for end in range(len(arr)):
      totalToEnd = sumBefore[end+1]
      while startPoints and totalToEnd - sumBefore[startPoints[0]] >= k: # adjust start
         start = startPoints.popleft()
      if totalToEnd - sumBefore[start] >= k and end-start < bestEnd-bestStart:
         bestStart,bestEnd = start,end
      while startPoints and totalToEnd <= sumBefore[startPoints[-1]]: # remove bad candidates
         startPoints.pop()
      startPoints.append(end+1) # end+1 is a new candidate
   return (bestStart,bestEnd)

The deque holds a sequence of candidate start positions from left to right. The key invariant is that positions in the deque are also sorted by increasing value of "sumBefore".

To see why, consider two positions x and y with x > y, and suppose sumBefore[x] <= sumBefore[y]. Then x is a strictly better start position than y (for segments ending at x or later), so we need never consider y again.

FURTHER EXPLANATION:

Imagine a naive algorithm that looked like this:

for end in 0..N-1
   for start in 0..end
      check the segment from start to end

I'm try to improve the inner loop to only consider certain start points instead of all possible start points. So when can we eliminate a particular start point from further consideration? In two situations. Consider two start points S0 and S1 with S0 to the left of S1.

First, we can eliminate S0 if we ever find that S1 begins an eligible segment (that is, a segment summing to at least k). That's what the first while loop does, where start is S0 and startPoints[0] is S1. Even if we found some future eligible segment starting at S0, it would be longer than the segment we already found starting at S1.

Second, we can eliminate S0 if the sum of the elements from S0 to S1-1 is <= 0 (or, equivalently if the sum of the elements before S0 >= the sum of the elements before S1). This is what the second while loop does, where S0 is startPoints[-1] and S1 is end+1. Trimming off the elements from S0 to S1-1 always makes sense (for end points at S1 or later), because it makes the segment shorter without decreasing its sum.

Actually, there's a third situation where we could eliminate S0: when the distance from S0 to end is greater than the length of the shortest segment found so far. I didn't implement this case because it wasn't needed.

like image 145
Chris Okasaki Avatar answered Sep 25 '22 18:09

Chris Okasaki