
Splitting an array finding minimum difference between the sum of two subarray in distributed environment

I was asked this problem yesterday: write code to split an array into two parts such that the difference between the sums of the two parts is minimized.

Here is the code I wrote, with O(n) complexity:

function solution(a) {
  let leftSum = 0;
  let rightSum = a.reduce((acc, value) => acc + value, 0);
  let min = Math.abs(rightSum - leftSum);
  a.forEach((item) => {
    leftSum += item;
    rightSum -= item;
    const tempMin = Math.abs(rightSum - leftSum);
    if (tempMin < min) min = tempMin;
  });
  return min;
}

But then I was asked: if the input array has 10 million elements, how would I solve this problem in a distributed environment?

I am new to distributed programming and need help with this.

Mohd Hassan asked Jun 07 '19

2 Answers

If you have N nodes, then split the array into N sequential subarrays; this gives you N sequential sums. Take a pass to determine which subarray contains the desired split point. The difference between the "before" and "after" sums is your bias target for the next phase.

Now divide that "middle" array into N pieces. Again, you look for the appropriate split point, except that now you know the exact result you'd like (since you have the array sum and your missing difference).

Repeat that second paragraph until the entire subarray fits on one node; that is the fastest way to finish the computation.


You can speed this up somewhat by keeping a cumulative sum at each value; this will allow you to find the appropriate split point somewhat faster at each stage, as you can use a binary or interpolation search for every stage after the first.
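The cumulative-sum idea in the last paragraph can be sketched on a single machine. A minimal sketch, assuming all values are non-negative (so the prefix sums are non-decreasing and binary search is valid — with negative values you would fall back to a linear scan):

```javascript
// Prefix-sum + binary-search sketch (assumes non-negative values).
function minSplitDifference(a) {
  // prefix[i] = sum of a[0..i]
  const prefix = [];
  let total = 0;
  for (const v of a) {
    total += v;
    prefix.push(total);
  }
  // Binary-search for the split whose left sum is closest to total / 2.
  let lo = 0, hi = prefix.length - 1;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (prefix[mid] < total / 2) lo = mid + 1;
    else hi = mid;
  }
  // The best split is either at lo or just before it.
  let best = Math.abs(total - 2 * prefix[lo]);
  if (lo > 0) best = Math.min(best, Math.abs(total - 2 * prefix[lo - 1]));
  return best;
}
```

Note that `|total - 2 * leftSum|` is the same quantity as `|rightSum - leftSum|` in the question's code, since `rightSum = total - leftSum`.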

Prune answered Oct 24 '22

Given an array of length N, and given M available nodes, divide the array into chunks of size N/M. Each node computes the sum of its chunk, and reports back. The total is computed by adding the partial sums. Then the total and the partial sums are distributed to each of the nodes. Each node determines the best split point within its chunk (the local minimum), and reports back. The global minimum is computed from the local minimums.

For example, if the array has 10 million entries, and 200 nodes are available, the chunk size is 50000. So each node receives 50000 numbers, and reports back the sum. The total of the array is computed by adding the 200 partial sums. Then each node is given the total, along with the 200 partial sums. The information at each node now consists of

  • a chunk number
  • the 50000 array entries for that chunk
  • the array total
  • the 200 partial sums

From that information, each node can compute its local minimum. The global minimum is computed from the 200 local minimums.
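The two phases above can be simulated on one machine; a sketch, where "nodes" are just chunk indices (in a real cluster each chunk would live on a separate machine and the `map` calls would be remote):

```javascript
// Single-machine simulation of the two-phase distributed scheme.
function distributedMinSplit(a, numNodes) {
  const chunkSize = Math.ceil(a.length / numNodes);
  const chunks = [];
  for (let i = 0; i < a.length; i += chunkSize) {
    chunks.push(a.slice(i, i + chunkSize));
  }

  // Phase 1: each node reports the sum of its chunk.
  const partialSums = chunks.map(c => c.reduce((s, v) => s + v, 0));
  const total = partialSums.reduce((s, v) => s + v, 0);

  // Phase 2: each node scans its chunk for the best local split.
  // The sum of all earlier chunks gives its starting left sum.
  let globalMin = Infinity;
  let sumBefore = 0;
  chunks.forEach((chunk, k) => {
    let leftSum = sumBefore;
    for (const v of chunk) {
      leftSum += v;
      // |rightSum - leftSum| = |total - 2 * leftSum|
      const diff = Math.abs(total - 2 * leftSum);
      if (diff < globalMin) globalMin = diff;
    }
    sumBefore += partialSums[k];
  });
  return globalMin;
}
```

Each node only needs its own chunk, the partial sums, and the total, exactly as listed above.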

In the ideal case, where network bandwidth is infinite, network latency is zero, and any number of nodes can be used, the chunk size should be sqrt(N). So each node receives sqrt(N) array elements, and then receives sqrt(N) partial sums. Under those ideal conditions, the running time is O(sqrt(N)) instead of O(N).
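The sqrt(N) claim follows from a rough cost model (an idealized sketch, ignoring network costs): with M nodes, each node touches N/M chunk elements plus M partial sums, so per-node work is N/M + M, which is minimized at M = sqrt(N).

```javascript
// Per-node work under the idealized model: chunk scan + partial sums.
function perNodeWork(n, m) {
  return n / m + m;
}

const n = 10_000_000;
const ideal = Math.round(Math.sqrt(n)); // 3162 nodes of ~3162 elements each
```

For the 10-million-element example, this model favors ~3162 nodes over the 200 in the worked example above — but only under the ideal (zero-latency) assumptions stated here.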

Of course, in the real world it makes no sense to distribute a problem like this. The time (per array element) needed to send the elements over the network is significant — far larger than the time (per element) needed to solve the problem on a single computer.

user3386109 answered Oct 24 '22