I have a list of integers, and I need to split it into a given number of sublists (with no restrictions on order or the number of elements in each), in a way that minimizes the average difference in sums of each sublist.
For example:
>>> x = [4, 9, 1, 5]
>>> sublist_creator(x, 2)
[[9], [4, 1, 5]]
because list(map(sum, sublist_creator(x, 2))) yields [9, 10], minimizing the average difference. Alternatively, [[9, 1], [4, 5]] would have been equally correct; my use case has no preference between the two possibilities.
The only way I can think of to do this is by checking, iteratively, all possible combinations, but I'm working with a list of ~5000 elements and need to split it into ~30 sublists, so that approach is prohibitively expensive.
Here's the outline:

1. prepare N empty lists
2. sort() your input array in ascending order
3. pop() the last element from the sorted array
4. append() the popped element to the list with the lowest sum() of elements
5. repeat steps 3-4 until the sorted array is empty

With M=5000 elements and N=30 lists this approach takes about O(N*M) if you carefully store the intermediate sums of the sublists instead of calculating them from scratch every time.
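Sketched in Python, assuming the naive variant that rescans the stored running sums on every step (the function and variable names are illustrative, not from any answer):

```python
def greedy_split(values, n):
    """Split values into n sublists with roughly equal sums.

    Follows the outline above: sort ascending, repeatedly pop the
    largest remaining value, and append it to the sublist whose
    running sum is currently the smallest.
    """
    sublists = [[] for _ in range(n)]
    sums = [0] * n                  # running sum per sublist, kept up to date
    remaining = sorted(values)
    while remaining:
        value = remaining.pop()     # largest remaining element
        target = min(range(n), key=sums.__getitem__)  # lightest sublist
        sublists[target].append(value)
        sums[target] += value
    return sublists

greedy_split([4, 9, 1, 5], 2)       # sublist sums come out as 10 and 9
```

Keeping `sums` up to date alongside the sublists is what makes each step O(N) rather than re-summing every sublist from scratch.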
@lenik's solution has the right idea, but we can use a heap queue that keeps track of each sub-list's running total together with its index. That reduces the cost of finding the sub-list with the minimum sum to O(log n), for an overall O(m log n) time complexity:
import heapq

def sublist_creator(lst, n):
    lists = [[] for _ in range(n)]
    totals = [(0, i) for i in range(n)]
    heapq.heapify(totals)
    for value in lst:
        total, index = heapq.heappop(totals)
        lists[index].append(value)
        heapq.heappush(totals, (total + value, index))
    return lists
so that sublist_creator(x, 2) returns [[4, 1, 5], [9]]
Implementation of @lenik's idea using Python's underrated priority queue module heapq. This follows his idea pretty much exactly, except that each list is given a first element that contains its sum. Since lists are compared lexicographically and heapq is a min-heap implementation, all we have to do is strip off the first element of each list after we finish. Using heapreplace avoids unnecessary resizing operations during the updates.
from heapq import heapreplace

def sublist_creator(x, n, sort=True):
    bins = [[0] for _ in range(n)]       # each bin leads with its running sum
    if sort:
        x = sorted(x, reverse=True)      # place the largest values first
    for i in x:
        least = bins[0]                  # smallest-sum bin sits at the heap root
        least[0] += i
        least.append(i)
        heapreplace(bins, least)         # reinsert and expose the new minimum
    return [b[1:] for b in bins]         # drop the leading sums
Given M = len(x) and N = n, the sort is O(M log M) and the loop does M insertions, each of which is O(log N) worst case. So for M >= N, the algorithm is asymptotically O(M log M); if the array is pre-sorted, it's O(M log N).
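To sanity-check the heap-based approach at the scale from the question (~5000 elements into ~30 sublists), here is a self-contained run. It restates the heapq-based sublist_creator from above, with a descending pre-sort added (an assumption, matching the largest-first outline) so the greedy placement stays well balanced:

```python
import heapq
import random

def sublist_creator(lst, n):
    lists = [[] for _ in range(n)]
    totals = [(0, i) for i in range(n)]       # (running sum, sublist index)
    heapq.heapify(totals)
    for value in sorted(lst, reverse=True):   # largest first improves balance
        total, index = heapq.heappop(totals)  # sublist with the smallest sum
        lists[index].append(value)
        heapq.heappush(totals, (total + value, index))
    return lists

random.seed(0)
data = [random.randint(1, 1000) for _ in range(5000)]
parts = sublist_creator(data, 30)
sums = sorted(map(sum, parts))
print(sums[-1] - sums[0])  # gap between heaviest and lightest sublist
```

With values placed largest-first, the final gap between the heaviest and lightest sublist is bounded by the last value added to the heaviest sublist, so it stays tiny relative to the per-sublist totals of roughly 83,000 here.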