Recently I've been struggling with the following problem:
Given an array of integers, find a minimal (shortest length) subarray that sums to at least k.
Obviously this can easily be done in O(n^2). I was able to write an algorithm that solves it in linear time for natural numbers, but I can't figure it out for integers.
My latest attempt was this:
def find_minimal_length_subarr_z(arr, min_sum):
found = False
start = end = cur_end = cur_sum = 0
for cur_start in range(len(arr)):
if cur_end <= cur_start:
cur_end, cur_sum = cur_start, arr[cur_start]
else:
cur_sum -= arr[cur_start-1]
# Expand
while cur_sum < min_sum and cur_end < len(arr)-1:
cur_end += 1
cur_sum += arr[cur_end]
# Contract
while cur_end > cur_start:
new_sum = cur_sum - arr[cur_end]
if new_sum >= min_sum or new_sum >= cur_sum:
cur_end -= 1
cur_sum = new_sum
else:
break
if cur_sum >= min_sum and (not found or cur_end-cur_start < end-start):
start, end, found = cur_start, cur_end, True
if found:
return start, end
For example:
[8, -7, 5, 5, 4], 12 => (2, 4)
However, it fails for:
[-12, 2, 2, -12, 2, 0], 4
where the correct result is (1, 2)
but the algorithm doesn't find it.
Can this at all be done in linear time (with preferably constant space complexity)?
Here's one that's linear time but also linear space. The extra space comes from a deque that could grow to linear size. (There's also a second array to maintain cumulative sums, but that could be removed pretty easily.)
from collections import deque
def find_minimal_length_subarr(arr, k):
# assume k is positive
sumBefore = [0]
for x in arr: sumBefore.append(sumBefore[-1] + x)
bestStart = -1
bestEnd = len(arr)
startPoints = deque()
start = 0
for end in range(len(arr)):
totalToEnd = sumBefore[end+1]
while startPoints and totalToEnd - sumBefore[startPoints[0]] >= k: # adjust start
start = startPoints.popleft()
if totalToEnd - sumBefore[start] >= k and end-start < bestEnd-bestStart:
bestStart,bestEnd = start,end
while startPoints and totalToEnd <= sumBefore[startPoints[-1]]: # remove bad candidates
startPoints.pop()
startPoints.append(end+1) # end+1 is a new candidate
return (bestStart,bestEnd)
The deque holds a sequence of candidate start positions from left to right. The key invariant is that positions in the deque are also sorted by increasing value of "sumBefore".
To see why, consider two positions x and y with x > y, and suppose sumBefore[x] <= sumBefore[y]. Then x is a strictly better start position than y (for segments ending at x or later), so we need never consider y again.
FURTHER EXPLANATION:
Imagine a naive algorithm that looked like this:
for end in 0..N-1
for start in 0..end
check the segment from start to end
I'm try to improve the inner loop to only consider certain start points instead of all possible start points. So when can we eliminate a particular start point from further consideration? In two situations. Consider two start points S0 and S1 with S0 to the left of S1.
First, we can eliminate S0 if we ever find that S1 begins an eligible segment (that is, a segment summing to at least k). That's what the first while loop does, where start is S0 and startPoints[0] is S1. Even if we found some future eligible segment starting at S0, it would be longer than the segment we already found starting at S1.
Second, we can eliminate S0 if the sum of the elements from S0 to S1-1 is <= 0 (or, equivalently if the sum of the elements before S0 >= the sum of the elements before S1). This is what the second while loop does, where S0 is startPoints[-1] and S1 is end+1. Trimming off the elements from S0 to S1-1 always makes sense (for end points at S1 or later), because it makes the segment shorter without decreasing its sum.
Actually, there's a third situation where we could eliminate S0: when the distance from S0 to end is greater than the length of the shortest segment found so far. I didn't implement this case because it wasn't needed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With