Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split list into N sublists with approximately equal sums

I have a list of integers, and I need to split it into a given number of sublists (with no restrictions on order or the number of elements in each), in a way that minimizes the average difference in sums of each sublist.

For example:

>>> x = [4, 9, 1, 5]
>>> sublist_creator(x, 2)
[[9], [4, 1, 5]]

because list(map(sum, sublist_creator(x, 2))) yields [9, 10], minimizing the average distance. Alternatively, [[9, 1], [4, 5]] would have been equally correct, and my use case has no preference between two possibilities.

The only way I can think of to do this is by checking, iteratively, all possible combinations, but I'm working with a list of ~5000 elements and need to split it into ~30 sublists, so that approach is prohibitively expensive.

like image 523
Tim Avatar asked May 07 '20 01:05

Tim


People also ask

How do you divide a list into equal Sublists in Python?

Split List Into Sublists Using the array_split() Function in NumPy. The array_split() method in the NumPy library can also split a large array into multiple small arrays. This function takes the original array and the number of chunks we need to split the array into and returns the split chunks.

How do you divide a list into N equal parts?

To split a list into N parts of approximately equal length with Python, we can use list comprehension. We define the chunkify function to split the lst list into n chunks. To do this, we use list comprehension to return slices of list with from index i to the end with n items in each chunk.

Which sort divide the list into two smaller subList of approximately equal size?

linspace method. Just specify the number of parts you want the array to be divided in to. The divisions will be of nearly equal size.


3 Answers

Here's the outline:

  1. create N empty lists
  2. sort() your input array in ascending order
  3. pop() the last element from the sorted array
  4. append() the popped element to the list with the lowest sum() of the elements
  5. repeat 3 and 4 until input array is empty
  6. profit!!!

With M=5000 elements and N=30 lists this approach might take about O(N*M) if you carefully store the intermediate sums of the sublists instead of calculating them from the scratch every time.

like image 167
lenik Avatar answered Nov 14 '22 21:11

lenik


@lenik's solution has the right idea, but can use a heap queue that keeps track of the total of each sub-list and its index in sorted order to improve the cost of finding the sub-list of the minimum size to O(log n), resulting in an overall O(m x log n) time complexity:

import heapq

def sublist_creator(lst, n):
    lists = [[] for _ in range(n)]
    totals = [(0, i) for i in range(n)]
    heapq.heapify(totals)
    for value in lst:
        total, index = heapq.heappop(totals)
        lists[index].append(value)
        heapq.heappush(totals, (total + value, index))
    return lists

so that:

sublist_creator(x, 2)

returns:

[[4, 1, 5], [9]]
like image 28
blhsing Avatar answered Nov 14 '22 22:11

blhsing


Implementation of @lennik's idea using python's underrated priority queue module heapq. This follows his idea pretty much exactly, except that each list is given a first element that contains its sum. Since lists are sorted lexicography and heapq is a min-heap implementation, all we have to do is pop off the first elements after we finish.

Using heapreplace will help avoid unnecessary resizing operations during the updates.

from heapq import heapreplace

def sublist_creator(x, n, sort=True):
    bins = [[0] for _ in range(n)]
    if sort:
        x = sorted(x)
    for i in x:
        least = bins[0]
        least[0] += i
        least.append(i)
        heapreplace(bins, least)
    return [x[1:] for x in bins]

Given M = len(x) and N = n, the sort is O(M log M) and the loop does M insertions, which are O(log N) worst case. So for M >= N, we can say that asymptotically the algorithm is O(M log M). If the array is pre-sorted, it's O(M log N).

like image 35
Mad Physicist Avatar answered Nov 14 '22 23:11

Mad Physicist