I have a problem, and an OK-ish solution. I'm hoping there's a better solution out there.
The problem
I have an array with around 200,000 integers. Given two indices, i1 and i2, I need to calculate the sum of all elements between i1 and i2. Each integer in the array is between 1 and 4 inclusive. For example:
a = [1, 3, 2, 4, 3, 2, 4, 1];
subsection_sum(a, 0, 3); // returns 6: (1 + 3 + 2)
This operation will be performed around 200,000 times, so needs to be pretty fast. A simple counter in a for loop is O(n), and far too slow. The array is never modified after construction, so it's OK to have a relatively expensive pre-processing stage.
My best solution so far
This algorithm works in O(log n) time:
First pad the original array with zeros until its length is a power of two. Next, split the array into two equal parts and store the sum of each. Then split the array into quarters and store the sum of each. Then eighths. Continue doing this until the array is split into sections 2 elements long. For the 8-element array above, this takes two steps:
halves = [(a[0] + a[1] + a[2] + a[3]), (a[4] + a[5] + a[6] + a[7])]
quarters = [(a[0] + a[1]), (a[2] + a[3]), (a[4] + a[5]), (a[6] + a[7])]
Then given two indices, it is now possible to work out the subsection_sum in O(log n) time. For example, subsection_sum(a, 2, 7) == quarters[1] + halves[1].
Introduce an auxiliary array that contains the cumulative sum. That is, element i
of the the auxiliary array has the sum of elements 0 through i
of the original array. The subarray sum is then just the difference of two elements from the auxiliary array. This will give a result in constant time, O(1)
.
This depends on an invariant in the subsection_sum
function given in the question,:
subsection_sum(a, 0, i2) = subsection_sum(a, 0, i1) + subsection_sum(a, i1, i2)
where I'm assuming i1 <= i2
. Rearranging, we have:
subsection_sum(a, i1, i2) = subsection_sum(a, 0, i2) - subsection_sum(a, 0, i1)
Note that the sums on the right-hand side both start from 0
. The auxiliary array can be viewed as caching the values for the sums from zero, subsection_sum(a, 0, i)
, for all i
.
If you can afford O(n)
additional storage, you can create a lookup table whose i
th element is the sum of the elements at indices 0
through i
(inclusive) in the input array. In pseudocode:
def computeLookupTable(arr):
let n = arr.length
let lookupTable = new Array()
lookupTable[0] = arr[0]
for i=1 to n:
lookupTable[i] = arr[i] + lookupTable[i-1]
return lookupTable
Then you can use this table to compute the sum of all elements in array
between i1
and i2
by taking the difference
lookupTable[i2] - lookupTable[i1]
which takes constant time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With