Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find three elements in a sorted array which sum to a fourth element

A friend of mine recently got this interview question, which seems to us to be solvable but not within the asymptotic time bounds that the interviewer thought should be possible. Here is the problem:

You have an array of N integers, xs, sorted but possibly non-distinct. Your goal is to find four array indices(1)(a,b,c,d) such that the following two properties hold:

xs[a] + xs[b] + xs[c] = xs[d]

a < b < c < d

The goal is to do this in O(N2) time.

First, an O(N3log(N)) solution is obvious: for each (a,b,c) ordered triple, use binary search to see if an appropriate d can be found. Now, how to do better?

One interesting suggestion from the interviewer is to rewrite the first condition as:

xs[a] + xs[b] = xs[d] - xs[c]

It's not clear what to do after this, but perhaps we could chose some pivot value P, and search for an (a,b) pair adding up to P, and a (d,c) pair subtracting to it. That search is easy enough to do in O(n) time for a given P, by searching inwards from both ends of the array. However, it seems to me that the problem with this is that there are N2 such values P, not just N of them, so we haven't actually reduced the problem size at all: we're doing O(N) work, O(N2) times.

We found some related problems being discussed online elsewhere: Find 3 numbers in an array adding to a given sum is solvable in N2 time, but requires that the sum be fixed ahead of time; adapting the same algorithm but iterating through each possible sum leaves us at N3 as always.

Another related problem seems to be Find all triplets in array with sum less than or equal to given sum, but I'm not sure how much of the stuff there is relevant here: an inequality rather than an equality mixes things up quite a bit, and of course the target is fixed rather than varying.

So, what are we missing? Is the problem impossible after all, given the performance requirements? Or is there a clever algorithm we're unable to spot?


(1) Actually the problem as posed is to find all such (a,b,c,d) tuples, and return a count of how many there are. But I think even finding a single one of them in the required time constraints is hard enough.

like image 851
amalloy Avatar asked Sep 27 '16 07:09

amalloy


2 Answers

If the algorithm would have to list the solutions (i.e. the sets of a, b, c, and d that satisfy the condition), the worst case time complexity is O(n4):

1. There can be O(n4) solutions

The trivial example is an array with only 0 values in it. Then a, b, c and d have all the freedom as long as they stay in order. This represents O(n4) solutions.

But more generally arrays which follow the following pattern have O(n4) solutions:

w, w, w, ... x, x, x, ..., y, y, y, ...  z, z, z, ....

With just as many occurrences of each, and:

w + x + y = z

However, to only produce the number of solutions, an algorithm can have a better time complexity.

2. Algorithm

This is a slight variation of the already posted algorithm, which does not involve the H factor. It also describes how to handle cases where different configurations lead to the same sums.

  • Retrieve all pairs and store them in an array X, where each element gets the following information:

    a: the smallest index of the two
    b: the other index
    sum: the value of xs[a] + xs[b]

  • At the same time also store for each such pair in another array Y, the following:

    c: the smallest index of the two
    d: the other index
    sum: the value of xs[d] - xs[c]

The above operation has a time complexity of O(n²)

  • Sort both arrays by their element's sum attribute. In case of equal sum values, the sort order will be determined as follows: for the X array by increasing b; for the Y array by decreasing c. Sorting can be done in O(n²) O(n²logn) time.

[Edit: I could not prove the earlier claim of O(n²) (unless some assumptions are made that allow for a radix/bucket sorting algorithm, which I will not assume). As noted in comments, in general an array with elements can be sorted in O(n²logn²), which is O(n²logn), but not O(n²)]

  • Go through both arrays in "tandem" to find pairs of sums that are equal. If that is the case, it needs to be checked that X[i].b < Y[j].c. If so it represents a solution. But there could be many of them, and counting those in an acceptable time needs special care.

    Let m = n(n-1)/2, i.e. the number of elements in array X (which is also the size of array Y):

    i = 0
    j = 0
    while i < m and j < m:
        if X[i].sum < Y[j].sum:
            i = i + 1
        elif X[i].sum > Y[j].sum:
            j = j + 1
        else:
            # We have a solution. Need to count all others that have same sums in X and Y.
            # Find last match in Y and set k as index to it:
            countY = 0
            while k < m and X[i].sum == Y[j].sum and X[i].b < Y[j].c:
                countY = countY + 1
                j = j + 1
            k = j - 1
            # add chunks to `count`:
            while i < m and countY >= 0 and X[i].sum == Y[k].sum:
                while countY >= 0 and X[i].b >= Y[k].c:
                    countY = countY - 1
                    k = k - 1
                count = count + countY
                i = i + 1

Note that although there are nested loops, the variable i only ever increments, and so does j. The variable k always decrements in the innermost loop. Although it also gets higher values to start from, it can never address the same Y element more than a constant number of times via the k index, because while decrementing this index, it stays within the "same sum" range of Y.

So this means that this last part of the algorithm runs in O(m), which is O(n²). As my latest edit confirmed that the sorting step is not O(n²), that step determines the overall time-complexity: O(n²logn).

like image 102
trincot Avatar answered Nov 07 '22 00:11

trincot


So one solution can be :

List all x[a] + x[b] value possible such that a < b and hash them in this fashion

key = (x[a]+x[b]) and value = (a,b).

Complexity of this step - O(n^2)

Now List all x[d] - x[c] values possible such that d > c. Also for each x[d] - x[c] search the entry in your hash map by querying. We have a solution if there exists an entry such that c > b for any hit. Complexity of this step - O(n^2) * H.

Where H is the search time in your hashmap.

Total complexity - O(n^2)* H. Now H may be O(1). This could done if the range of values in the array is small. Also the choice of hash function would depend on the properties of elements in the array.

like image 20
adisticated Avatar answered Nov 07 '22 02:11

adisticated