
Find pairs that sum to X in an array of integers of size N having elements in the range 0 to N-1

This is an interview question. We have an array of N integers, each in the range 0 to N-1. A number may occur several times, possibly more than twice. The goal is to find pairs that sum to a given number X.

I did it using an auxiliary array holding the count of each element of the primary array, then rebuilt the primary array from those counts so that it was sorted, and then searched for pairs.
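For reference, the auxiliary-array approach described above might look roughly like this. It is only a sketch: the name counting_sort is made up for illustration, and it assumes every value really is in 0..n-1.

```c
#include <assert.h>
#include <stdlib.h>

/* Counting sort for values known to lie in 0..n-1.
   Uses an auxiliary array of n counters: O(n) time, O(n) extra space.
   Returns 0 on success, -1 if the auxiliary array cannot be allocated. */
int counting_sort(int *a, int n) {
    int *count;
    int i, v, out = 0;

    assert(n > 0);
    count = calloc((size_t)n, sizeof *count);
    if (count == NULL) return -1;

    /* Pass 1: tally how often each value occurs. */
    for (i = 0; i < n; i++) {
        assert(a[i] >= 0 && a[i] < n);
        count[a[i]]++;
    }

    /* Pass 2: rewrite the primary array in ascending order. */
    for (v = 0; v < n; v++) {
        while (count[v]-- > 0) a[out++] = v;
    }

    free(count);
    return 0;
}
```

The calloc'd counter array is exactly the extra space the interviewer objected to.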

But the interviewer wanted constant space complexity, so I suggested sorting the array in place, but that is an O(n log n) time solution. He wanted an O(n) solution.
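The sort-based fallback could be sketched as below, using the standard qsort followed by a two-pointer scan. The names cmp_int and count_pairs_sorted are hypothetical, and the sketch returns the number of pairs rather than listing them. Note that qsort itself makes no formal promise about time or internal space, so the O(n log n) figure is the typical case, not a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow risk of "x - y". */
int cmp_int(const void *p, const void *q) {
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Counts index pairs (i < j) with a[i] + a[j] == x.
   Sorts the array in place first, then scans from both ends. */
long count_pairs_sorted(int *a, int n, int x) {
    long pairs = 0;
    int lo = 0, hi = n - 1;

    assert(n >= 0);
    qsort(a, (size_t)n, sizeof *a, cmp_int);

    while (lo < hi) {
        int sum = a[lo] + a[hi];
        if (sum < x) {
            lo++;
        } else if (sum > x) {
            hi--;
        } else if (a[lo] == a[hi]) {
            /* Everything from lo to hi holds the same value: m choose 2. */
            long m = hi - lo + 1;
            pairs += m * (m - 1) / 2;
            break;
        } else {
            /* Count the run of equal values at each end and multiply. */
            int lv = a[lo], hv = a[hi];
            long nl = 0, nh = 0;
            while (a[lo] == lv) { lo++; nl++; }
            while (a[hi] == hv) { hi--; nh++; }
            pairs += nl * nh;
        }
    }
    return pairs;
}
```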

Is there any method available to do it in O(n) without any extra space?

SIGSTP asked Jan 31 '13 07:01


2 Answers

No, I don't believe so. You either need extra space to be able to "sort" the data in O(n) by assigning to buckets, or you need to sort in-place which will not be O(n).


Of course, there are always tricks if you can make certain assumptions. For example, if N < 64K and your integers are 32 bits wide, you can multiplex the space required for the count array on top of the current array.

In other words, use the lower 16 bits for storing the values in the array and then use the upper 16 bits for your array where you simply store the count of values matching the index.

Let's use a simplified example where N == 8. Hence the array is 8 elements in length and the integers at each element are less than 8, though they're eight bits wide. That means (initially) the top four bits of each element are zero.

  0    1    2    3    4    5    6    7    <- index
(0)7 (0)6 (0)2 (0)5 (0)3 (0)3 (0)7 (0)7

The pseudo-code for an O(n) adjustment which stores the count into the upper four bits is:

for idx = 0 to N-1:
    array[array[idx] % 16] += 16 // add 1 to top four bits

By way of example, consider the first index which stores 7. That assignment statement will therefore add 16 to index 7, upping the count of sevens. The modulo operator is to ensure that values which have already been increased only use the lower four bits to specify the array index.

So the array eventually becomes:

  0    1    2    3    4    5    6    7    <- index
(0)7 (0)6 (1)2 (2)5 (0)3 (1)3 (1)7 (3)7

Then you have your new array in constant space and you can just use int (array[X] / 16) to get the count of how many X values there were.
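For the 32-bit case mentioned earlier (N < 64K, values in the lower 16 bits, counts in the upper 16 bits), a minimal sketch could look like the following. The function names pack_counts, value_at and count_of are invented for this illustration, and the code assumes the elements are uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds a per-value count to the upper 16 bits of each element.
   Assumes n < 65536 and every a[i] < n, so the value and any count
   each fit in 16 bits. */
void pack_counts(uint32_t *a, size_t n) {
    size_t i;
    assert(n < 65536u);
    for (i = 0; i < n; i++) {
        uint32_t value = a[i] & 0xFFFFu;  /* low half: original value */
        assert(value < n);
        a[value] += UINT32_C(1) << 16;    /* high half: count of "value" */
    }
}

/* Original value stored at index i. */
uint32_t value_at(const uint32_t *a, size_t i) { return a[i] & 0xFFFFu; }

/* Number of elements whose value equals v. */
uint32_t count_of(const uint32_t *a, size_t v) { return a[v] >> 16; }
```

Masking with 0xFFFF plays the same role as the modulo in the pseudo-code: it recovers the original value even after the upper bits have been bumped.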

But that's pretty devious and requires certain assumptions, as mentioned before. It may well be that level of deviousness the interviewer was looking for, or they may just want to see how a prospective employee handles the Kobayashi Maru of coding :-)


Once you have the counts, it's a simple matter to find pairs that sum to a given X, still in O(N). The basic approach would be to get the Cartesian product. For example, again consider that N is 8 and you want pairs that sum to 8. Ignore the lower half of the multiplexed array above (since you're only interested in the counts) and you have:

 0   1   2   3   4   5   6   7    <- index
(0) (0) (1) (2) (0) (1) (1) (3)

What you basically do is step through the array one by one getting the product of the counts of numbers that sum to 8.

  • For 0, you would need to add 8 (which doesn't exist).
  • For 1, you need to add 7. The product of the counts is 0 x 3, so that gives nothing.
  • For 2, you need to add 6. The product of the counts is 1 x 1, so that gives one occurrence of (2,6).
  • For 3, you need to add 5. The product of the counts is 2 x 1, so that gives two occurrences of (3,5).
  • For 4, it's a special case since you can't use the product. In this case it doesn't matter since there are no 4s but, if there were only one, that couldn't become a pair. Where the numbers you're pairing are the same, the formula is (assuming there are m of them) 1 + 2 + 3 + ... + m-1. With a bit of mathematical wizardry, that turns out to be m(m-1)/2.

Beyond that, you're pairing with values to the left, which you've already done so you stop.
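The walk described above can be condensed into one small function. This is a sketch only: pairs_from_counts is a made-up name, it takes a plain array of counts rather than the multiplexed array, and it returns the total number of pairs instead of listing them.

```c
#include <assert.h>

/* Counts pairs of values summing to x, given count[v] = occurrences of v
   for v in 0..n-1. Runs in O(n) time with O(1) extra space. */
long pairs_from_counts(const unsigned *count, int n, int x) {
    long pairs = 0;
    int i, j;

    assert(n >= 0);
    for (i = 0, j = x; i <= j; i++, j--) {
        if (j >= n) continue;            /* partner value cannot exist */
        if (i == j) {
            long m = count[i];
            pairs += m * (m - 1) / 2;    /* 1 + 2 + ... + (m-1) */
        } else {
            pairs += (long)count[i] * (long)count[j];
        }
    }
    return pairs;
}
```

With the counts 0 0 1 2 0 1 1 3 and x = 8, this gives 1 for (2,6) plus 2 for (3,5), so 3 in total.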

So what you have ended up with from

a b c d e f g h <- identifiers
7 6 2 5 3 3 7 7

is:

(2,6) (3,5) (3,5)
(c,b) (e,d) (f,d) <- identifiers

No other values add up to 8.


The following program illustrates this in operation:

#include <stdio.h>

int arr[] = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 4, 4, 4, 4};
#define SZ ((int) (sizeof(arr) / sizeof(*arr)))   // int, to match the loop counters

static void dumpArr (const char *desc) {
    int i;
    printf ("%s:\n   Indexes:", desc);
    for (i = 0; i < SZ; i++) printf (" %2d", i);

    printf ("\n   Counts :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] / 100);

    printf ("\n   Values :");
    for (i = 0; i < SZ; i++) printf (" %2d", arr[i] % 100);

    puts ("\n=====\n");
}

That bit above is just for debugging. The actual code to do the bucket sort is below:

int main (void) {
    int i, j, find, prod;

    dumpArr ("Initial");

    // Count values in O(n) time and O(1) extra space - bucket sort.

    for (i = 0; i < SZ; i++) {
        arr[arr[i] % 100] += 100;
    }

And we finish with the code to do the pairings:

    dumpArr ("After bucket sort");

    // Now do pairings.

    find = 8;
    for (i = 0, j = find - i; i <= j; i++, j--) {
        // i <= j and i >= 0, so only the upper bound of j needs checking.
        if (j >= SZ) continue;

        if (i == j) {
            // Same value twice: m items give m(m-1)/2 pairs.
            prod = (arr[i]/100) * (arr[i]/100-1) / 2;
        } else {
            // Different values: every i pairs with every j.
            prod = (arr[i]/100) * (arr[j]/100);
        }
        if (prod > 0) {
            printf ("(%d,%d) %d time(s)\n", i, j, prod);
        }
    }

    return 0;
}

The output is:

Initial:
   Indexes:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   Counts :  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   Values :  3  1  4  1  5  9  2  6  5  3  5  8  9  4  4  4  4
=====

After bucket sort:
   Indexes:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   Counts :  0  2  1  2  5  3  1  0  1  2  0  0  0  0  0  0  0
   Values :  3  1  4  1  5  9  2  6  5  3  5  8  9  4  4  4  4
=====

(2,6) 1 time(s)
(3,5) 6 time(s)
(4,4) 10 time(s)

and, if you examine the input digits, you'll find the pairs are correct.

paxdiablo answered Oct 18 '22 18:10


This may be done by converting the input array to a list of counters "in-place" in O(N) time. Of course, this assumes the input array is mutable. There is no need for any additional assumptions about unused bits in each array element.

Start with the following pre-processing: try to move each array element to the position determined by its value; then move the element that was at that position to the position determined by its own value; continue until:

  • the next element is moved to the position where this cycle started,
  • the next element cannot be moved because its target position already holds an element equal to that position's index (in this case, put the current element at the position where this cycle started).

After pre-processing, every element is either located at its "proper" position or "points" to its "proper" position. If we have an unused bit in each element, we can convert each properly positioned element into a counter, initialize it to "1", and let each "pointing" element increment the appropriate counter. The additional bit makes it possible to distinguish counters from values. The same thing can be done without any additional bits, but with a less trivial algorithm.
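The simpler variant, the one that does rely on a spare bit, might be sketched as follows. Several things here are assumptions for illustration: the elements are unsigned, the top bit is free (COUNTER_FLAG), the pre-processing is written with swaps rather than by literally carrying an element around a cycle, and the name values_to_counts is made up. It is not the bit-free algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Top bit of unsigned, assumed never to be used by a value or a count. */
#define COUNTER_FLAG (~(~0u >> 1))

/* Converts a[0..n-1] (values in 0..n-1) into counts, in place:
   afterwards a[v] is the number of times v occurred in the input. */
void values_to_counts(unsigned *a, size_t n) {
    size_t i;

    assert(n < COUNTER_FLAG);
    for (i = 0; i < n; i++) assert(a[i] < n);

    /* Pre-processing: swap until a[i] is at its proper position or points
       at a position that already holds its own value. Each swap puts one
       more value in place, so there are at most n swaps in total. */
    for (i = 0; i < n; i++) {
        while (a[i] != i && a[a[i]] != a[i]) {
            unsigned v = a[i];
            a[i] = a[v];
            a[v] = v;
        }
    }

    /* Properly positioned elements become flagged counters starting at 1. */
    for (i = 0; i < n; i++) {
        if (a[i] == i) a[i] = COUNTER_FLAG | 1u;
    }

    /* Each remaining "pointing" element bumps the counter it points to.
       Its own index never occurs as a value, so it becomes a zero counter. */
    for (i = 0; i < n; i++) {
        if (!(a[i] & COUNTER_FLAG)) {
            a[a[i]]++;
            a[i] = COUNTER_FLAG;
        }
    }

    /* Drop the flag so that only the counts remain. */
    for (i = 0; i < n; i++) a[i] &= ~COUNTER_FLAG;
}
```

For the input 7 6 2 5 3 3 7 7 this leaves 0 0 1 2 0 1 1 3 in the array.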

Count how many values in the array are equal to 0 or 1. If there are any such values, reset them to zero and update the counters at positions 0 and/or 1. Set k=2 (the size of the part of the array in which values less than k have been replaced by counters). Apply the following procedure for k = 2, 4, 8, ...

  1. Find the elements at positions k .. 2k-1 which are at their "proper" position and replace them with counters whose initial value is "1".
  2. For any element at positions k .. 2k-1 with a value in 2 .. k-1, update the corresponding counter at positions 2 .. k-1 and reset the value to zero.
  3. For any element at positions 0 .. 2k-1 with a value in k .. 2k-1, update the corresponding counter at positions k .. 2k-1 and reset the value to zero.

All iterations of this procedure together have O(N) time complexity. At the end, the input array is completely converted to an array of counters. The only difficulty is that up to two counters at positions 0 .. 2k-1 may have values greater than k-1. This can be mitigated by storing two additional indexes, one for each of them, and treating the elements at these indexes as counters instead of values.

After an array of counters is produced, we could just multiply pairs of counters (where corresponding pair of indexes sum to X) to get the required counts of pairs.

Evgeny Kluev answered Oct 18 '22 18:10