Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find sum of an array of subarrays

This is a question from the 2017 Google APAC. Problem D: Sum of Sum

Alice presented her friend Bob with an array of N positive integers, indexed from 1 to N. She challenged Bob with many queries of the form "what is the sum of the numbers between these two indexes?" But Bob was able to solve the problem too easily.Alice took her array and found all N*(N+1)/2 non-empty subarrays of it. She found the sum of each subarray, and then sorted those values (in nondecreasing order) to create a new array, indexed from 1 to N*(N+1)/2. For example, for an initial array [2, 3, 2], Alice would generate the subarrays [2], [3], [2], [2, 3], [3, 2], and [2, 3, 2] (note that [2, 2], for example, is NOT a subarray). Then she'd take the sums -- 2, 3, 2, 5, 5, 7 -- and sort them to get a new array of [2, 2, 3, 5, 5, 7].Alice has given the initial array to Bob, along with Q queries of the form "what is the sum of the numbers from index Li to Ri, inclusive, in the new array?" Now Bob's in trouble! Can you help him out?

The straight forward solution is too inefficient even in c++ for large data sets. Is there a more efficient way to solve this?

Currently i'm going through this for loop to construct the final array:

    multiset<int> sums;
    long long int temp = 0;
    for (long long int len = 1; len <= n; ++len)
    {
        for (int start = 0; start+len <= n; ++start)
        {
            temp = 0;
            for (int i = 0; i < len; ++i)
            {
                temp += arr[start + i]; //arr stores the original array of n digits
            }
            sums.insert(temp);
        }
    }

P.S: Is my current implementation O(n^5)? My mistake, I can see how its O(n^3) now. Thank you Edit: Answers so far have been helpful but with large datasets involving n = 200000 items, it appears that any solution that pre calculates the entire array of subarrays is too expensive. All the submitted solutions don't appear to calculate the entire array of subarrays

like image 527
AnkithD Avatar asked Jul 01 '16 19:07

AnkithD


People also ask

How do you find Subarrays in array?

We can use substr function to find the all possible sub array.

What are subarrays of an array?

A subarray is a contiguous part of array. An array that is inside another array. For example, consider the array [1, 2, 3, 4], There are 10 non-empty sub-arrays. The subarrays are (1), (2), (3), (4), (1,2), (2,3), (3,4), (1,2,3), (2,3,4) and (1,2,3,4).


2 Answers

As noted in the comments, your solution is O(N^3), computed as O(N^2) times a O(N) sum and an insertion in multiset (which you can ignore compared to O(N), see bottom of this answer).

But swap your two first loops, you're doing the exact same N*(N+1)/2 sums and insertions:

for (int start = 0; start < n; ++start)
{
    for (long long int len = 1; start + len <= n; ++len)
    {
        temp = 0;
        for (int i = 0; i < len; ++i)
        {
            temp += arr[start + i]; //arr stores the original array of n digits
        }
        sums.insert(temp);
    }
}

Now if you look at your temp sums it seems obvious you're doing redundant work. Sum from start + 1 to start + 1, then from start + 1 to start + 2, then from start + 1 to start + 3, etc. For each new len, the sum you're computing is the one for the previous value of len, plus one item. Hence you can remove this inner loop:

for (int start = 0; start < n; ++start)
{
    temp = 0;
    for (long long int len = 1; start + len <= n; ++len)
    {
        temp += arr[start + len]; //arr stores the original array of n digits
        sums.insert(temp);
    }
}

So in N*(N+1)/2 you generated the set of values. Of course, using a multiset you hide the data sorting, but insertion in general costs log(sums.size()).

Sorting separately, since sorting a set of size S should take S * log(S), would cost N*(N+1)/2 * log ( N*(N+1)/2 ) which is (just) less than N*(N+1) * log((N+1)/sqrt(2)).

Note that since you have positive integers, each set of len integers you generate with the inner loop is already sorted, so maybe you can use them to do something smart to speed up sorting. Which is also what multiset does according to cplusplus.com:

If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.

like image 179
Cimbali Avatar answered Oct 27 '22 01:10

Cimbali


Doing a little search I have found this, I hope it will be useful

https://www.quora.com/How-can-problem-D-Sums-of-sums-from-Round-E-of-the-Google-APAC-Test-2016-be-solved-for-the-large-dataset

like image 26
Daniel Alejandro Avatar answered Oct 27 '22 03:10

Daniel Alejandro