Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split array into evenly-distributed chunks

I'm looking for an efficient way to split an array into chunks that have a similar histogram.

For example, given the array:

l = np.array([1, 2, 3, 4, 1, 1, 1, 2, 3, 4, 55, 5, 5, 5, 5, 5, 3, 2, 2, 21, 2])

I would like to obtain:

c1 = np.array([1, 4, 1, 5, 5, 3, 2])
c2 = np.array([2, 1, 3, 4, 5, 5, 2])
c3 = np.array([3, 1, 2, 55, 5, 2, 21])

Not only this but each chunk should have a similar size and a similar sum on a given function f:

1. |sum(ci, f) - sum(cj, f)| < e, for i != j
2. |len(ci) - len(cj)| < e, for i != j

where

sum(c, f) = f(c[0]) + ... + f(c[len(c)])

Edit: To clarify the intention of this. I want to parallelize a process over a list into n sub-processes but the cost has to be distributed among each sub-process evenly. The cost of processing an element in this list is a function f of the integer at the same position in array l, where f is the computational complexity of the process. For example, f(i)=i^2. So, I want all processes to have the same computational cost and not to have processes that finish too early while others last forever.

like image 579
synack Avatar asked Apr 04 '14 11:04

synack


1 Answers

Let's start with a very weak assumption that similar histograms are defined in a following rudimentary way: for a set of integers S1, the histogram H(S1) is similar to an histogram H(S2) of set S2 if sum(S1) = sum(S2).

Generally you are after finding subsets S1, S2, ..., SN of an array A such that f(S1) = f(S2) = ... = f(SN), and where under our assumptions f=sum. Unfortunately you've got yourself a k-Partition problem, which is NP-hard, and if you'll get someone to find an efficient way (ie poly-time) to do that, as you requested, the consequence would be that P=NP was proven to be true first on stackoverflow!

like image 115
mockinterface Avatar answered Oct 01 '22 18:10

mockinterface