I have a list of integers, and I need to split it into a given number of sublists (with no restrictions on order or the number of elements in each), in a way that minimizes the average difference in sums of each sublist.
For example:
>>> x = [4, 9, 1, 5]
>>> sublist_creator(x, 2)
[[9], [4, 1, 5]]
because list(map(sum, sublist_creator(x, 2))) yields [9, 10], minimizing the average difference. Alternatively, [[9, 1], [4, 5]] would have been equally correct; my use case has no preference between the two possibilities.
The only way I can think of to do this is by checking, iteratively, all possible combinations, but I'm working with a list of ~5000 elements and need to split it into ~30 sublists, so that approach is prohibitively expensive.
Here's the outline:

1. prepare N empty lists
2. sort() your input array in ascending order
3. pop() the last element from the sorted array
4. append() the popped element to the list with the lowest sum() of elements
5. repeat steps 3-4 until the sorted array is empty

With M=5000 elements and N=30 lists this approach takes about O(N*M) if you carefully store the intermediate sums of the sublists instead of calculating them from scratch every time.
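Sketched in Python, assuming the naive variant that rescans the stored running sums on every step (the function and variable names are illustrative, not from any answer):

```python
def greedy_split(values, n):
    """Split values into n sublists with roughly equal sums.

    Follows the outline above: sort ascending, repeatedly pop the
    largest remaining value, and append it to the sublist whose
    running sum is currently the smallest.
    """
    sublists = [[] for _ in range(n)]
    sums = [0] * n                  # running sum per sublist, kept up to date
    remaining = sorted(values)
    while remaining:
        value = remaining.pop()     # largest remaining element
        target = min(range(n), key=sums.__getitem__)  # lightest sublist
        sublists[target].append(value)
        sums[target] += value
    return sublists

greedy_split([4, 9, 1, 5], 2)       # sublist sums come out as 10 and 9
```

Keeping `sums` up to date alongside the sublists is what makes each step O(N) rather than re-summing every sublist from scratch.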
@lenik's solution has the right idea, but we can use a heap queue that keeps track of each sub-list's running total together with its index. That reduces the cost of finding the sub-list with the minimum sum to O(log n), for an overall O(m log n) time complexity:
import heapq

def sublist_creator(lst, n):
    lists = [[] for _ in range(n)]
    totals = [(0, i) for i in range(n)]
    heapq.heapify(totals)
    for value in lst:
        total, index = heapq.heappop(totals)
        lists[index].append(value)
        heapq.heappush(totals, (total + value, index))
    return lists
so that sublist_creator(x, 2) returns [[4, 1, 5], [9]]
Implementation of @lenik's idea using Python's underrated priority queue module heapq. This follows his idea pretty much exactly, except that each list is given a first element that contains its sum. Since lists are compared lexicographically and heapq is a min-heap implementation, all we have to do is strip off the first element of each list after we finish. Using heapreplace avoids unnecessary resizing operations during the updates.
from heapq import heapreplace

def sublist_creator(x, n, sort=True):
    bins = [[0] for _ in range(n)]       # each bin leads with its running sum
    if sort:
        x = sorted(x, reverse=True)      # place the largest values first
    for i in x:
        least = bins[0]                  # smallest-sum bin sits at the heap root
        least[0] += i
        least.append(i)
        heapreplace(bins, least)         # reinsert and expose the new minimum
    return [b[1:] for b in bins]         # drop the leading sums
Given M = len(x) and N = n, the sort is O(M log M) and the loop does M insertions, each of which is O(log N) worst case. So for M >= N, the algorithm is asymptotically O(M log M); if the array is pre-sorted, it's O(M log N).
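To sanity-check the heap-based approach at the scale from the question (~5000 elements into ~30 sublists), here is a self-contained run. It restates the heapq-based sublist_creator from above, with a descending pre-sort added (an assumption, matching the largest-first outline) so the greedy placement stays well balanced:

```python
import heapq
import random

def sublist_creator(lst, n):
    lists = [[] for _ in range(n)]
    totals = [(0, i) for i in range(n)]       # (running sum, sublist index)
    heapq.heapify(totals)
    for value in sorted(lst, reverse=True):   # largest first improves balance
        total, index = heapq.heappop(totals)  # sublist with the smallest sum
        lists[index].append(value)
        heapq.heappush(totals, (total + value, index))
    return lists

random.seed(0)
data = [random.randint(1, 1000) for _ in range(5000)]
parts = sublist_creator(data, 30)
sums = sorted(map(sum, parts))
print(sums[-1] - sums[0])  # gap between heaviest and lightest sublist
```

With values placed largest-first, the final gap between the heaviest and lightest sublist is bounded by the last value added to the heaviest sublist, so it stays tiny relative to the per-sublist totals of roughly 83,000 here.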