Greedy algorithm to split a list of lists of numbers into two partitions with same amount of each number in Python

Question

I have a list of sublists of random positive integers. This list is controlled by 3 parameters:

max_num: the maximum integer allowed in each sublist, e.g. if max_num = 3, the list will look like [[1,3], [3], [1,2,3], [1], ...];
max_length: the maximum number of integers in each sublist;
n_gen: the number of sublists generated, i.e., the length of the list.

You can generate such a list using the following code:

import random

random.seed(10)
def random_numbers(length, max_num):
    return [random.randint(1, max_num) for _ in range(length)]

max_num = 3
max_length = 3 # I want max_length=10
n_gen = 10 # I want n_gen=200

lst = [random_numbers(random.randint(1, max_length), max_num) for _ in range(n_gen)]

Now I want to split the list into two partitions so that each partition has the same amount of each number. For example, if lst = [[1,2,3], [2,3], [1,3], [3]], one of the solutions would be bipartition = [[[1,2,3], [3]], [[2,3], [1,3]]].
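The balance condition can be stated compactly by counting how often each number occurs in each half. The sketch below uses `collections.Counter`. Comparing two Counters is equivalent to the question's comparison of sorted flattened lists. The helper name `is_balanced` is hypothetical and not part of the question's code.

```python
from collections import Counter


def is_balanced(left, right):
    """Return True if both halves contain every number equally often."""
    # Flatten each half and count how often every number occurs.
    left_counts = Counter(x for sub in left for x in sub)
    right_counts = Counter(x for sub in right for x in sub)
    return left_counts == right_counts


bipartition = [[[1, 2, 3], [3]], [[2, 3], [1, 3]]]
print(is_balanced(*bipartition))                          # True
print(is_balanced([[1, 2, 3], [2, 3]], [[1, 3], [3]]))    # False
```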

I managed to write the following brute-force enumeration for all possible bipartitions, which works fine for small parameters.

from itertools import product

lst1 = []
lst2 = []
for pattern in product([True, False], repeat=len(lst)):
    lst1.append([x[1] for x in zip(pattern, lst) if x[0]])
    lst2.append([x[1] for x in zip(pattern, lst) if not x[0]])

bipartitions = []
for l1, l2 in zip(lst1, lst2):
    flat1 = [i for l in l1 for i in l]
    flat2 = [i for l in l2 for i in l]
    if sorted(flat1) == sorted(flat2):
        bipartitions.append([l1, l2])

for bipartition in bipartitions:
    print(bipartition)

Output:

[[[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]], [[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]], [[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]], [[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]]]
[[[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]], [[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]]]
[[[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]], [[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]]]
[[[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]], [[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]]]
[[[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]], [[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]]]
[[[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]], [[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]]]
[[[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]], [[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]]]
[[[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]]]
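This output shows two things. First, every bipartition is listed twice: the last line, for example, is the first line with its halves swapped. Second, the brute force above keeps all 2**n candidate splits in `lst1`/`lst2` before filtering them.

Below is a sketch of the same exhaustive search written as a generator, assuming you only need each split once. Pinning the first sublist to the left half removes the mirror images, and only the current candidate is held in memory. It is still exponential, so it does not solve the scaling problem. `balanced_bipartitions` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def balanced_bipartitions(lst):
    """Yield each balanced bipartition once, without storing all 2**n candidates."""
    if not lst:
        return
    first, rest = lst[0], lst[1:]
    # Pin the first sublist to the left half: every bipartition then has a
    # single representative, so mirrored duplicates are never produced.
    for pattern in product([True, False], repeat=len(rest)):
        left = [first] + [sub for keep, sub in zip(pattern, rest) if keep]
        right = [sub for keep, sub in zip(pattern, rest) if not keep]
        if Counter(x for s in left for x in s) == Counter(x for s in right for x in s):
            yield [left, right]


example = [[1, 2, 3], [2, 3], [1, 3], [3]]
for bipartition in balanced_bipartitions(example):
    print(bipartition)  # [[[1, 2, 3], [3]], [[2, 3], [1, 3]]]
```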

However, when the parameters become larger, this becomes infeasible. Now I would like to generate random bipartitions that have the same amount of each number, and I guess a greedy algorithm will do. For my current task, I need to use

max_num = 3
max_length = 10
n_gen = 200

Any suggestions?

Edit: I am aware that there will be cases where such a bipartition is not possible at all. My thought is that if the greedy algorithm has not suggested a valid bipartition after a maximum number of suggestions (e.g. 1000, if that is fast enough), we should assume that no such bipartition exists. When the parameters are large, even checking whether such a bipartition exists will be infeasible.

Peatherfed · Accepted Answer

Holy heck, this one was a doozy. First off, let me state the obvious: a greedy algorithm is deterministic, since it always makes the locally best choice, so running it again on the same input gives the same result. Second, the odds of actually being able to bipartition a randomly generated list are very, very low. I also suggest that if you want to generate bipartitions, trying to find them in random sets like this is not a good idea.
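A fixed greedy rule gives the same result on every run. The question's plan of trying up to ~1000 suggestions therefore only makes sense if something changes between tries.

Here is a minimal sketch of one such variant. It assumes that shuffling the order in which sublists are placed is an acceptable source of randomness. Each sublist goes to whichever side keeps the total count difference smaller. This is a swapped-in randomized-restart heuristic, not the method used in this answer. The answer instead tries every starting sublist and then finishes exhaustively. `randomized_greedy`, `imbalance` and `max_tries` are hypothetical names.

```python
import random
from collections import Counter


def imbalance(lcounts, rcounts):
    """Sum over all numbers of |left count - right count|."""
    keys = set(lcounts) | set(rcounts)
    return sum(abs(lcounts[k] - rcounts[k]) for k in keys)


def randomized_greedy(lst, max_tries=1000, seed=None):
    """Try up to max_tries shuffled orders; return a balanced [left, right] or None."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = lst[:]
        rng.shuffle(order)  # a new order per try is what makes the tries differ
        left, right = [], []
        lcounts, rcounts = Counter(), Counter()
        for sub in order:
            c = Counter(sub)
            # Place the sublist on whichever side leaves the smaller imbalance
            # (ties go left).
            if imbalance(lcounts + c, rcounts) <= imbalance(lcounts, rcounts + c):
                left.append(sub)
                lcounts += c
            else:
                right.append(sub)
                rcounts += c
        if lcounts == rcounts:
            return [left, right]
    return None


example = [[1, 2, 3], [2, 3], [1, 3], [3]]
print(randomized_greedy(example, seed=0))
```

A return value of `None` only means the heuristic gave up. As the question's edit anticipates, it is not a proof that no bipartition exists.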

Anyhow, on to the code. First, let me say that the code is not pretty, nor is it completely optimized. Towards the end I wasn't even being Pythonic in some areas, but that is all easily fixable. The copying of the list stands out as the prime suspect for slowness; you can rewrite and optimize it in your own time. I've been at this for hours, but it was a fun project. I also can't guarantee that it's bug-free, but I'm pretty sure it is. The only caveat is that you need to make sure it does at least one "careful" search if you want any results (keep NSWITCH at 1 or more; with 0 the greedy loop runs out of sublists and crashes). That brings me to the next point, the algorithm itself.

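We start off with a fairly standard greedy algorithm. We pick an index from our partitionee and, WLOG, assign that sublist to the left partition. Next we look at every remaining sublist and at both ways of placing it (left or right). We choose the placement that brings the difference in counts (the sum of |left count - right count| over all numbers) closest to 0. We repeat until only a few sublists are left (NSWITCH - 1 of them in the code below), after which we switch to your exhaustive algorithm for those. The whole greedy run is repeated once for every possible starting index, and the run with the smallest difference is kept.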
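One step of that greedy loop can be sketched on its own. This version assumes counts are kept in `collections.Counter` objects instead of the fixed-size lists used in the code below. `best_move` and `imbalance` are hypothetical names. The tie-breaking is chosen to match `greedy_partition` and `assign_value` in the answer: the first index with a strictly smaller score wins, and an equal left/right score goes right.

```python
from collections import Counter


def imbalance(a, b):
    """Sum over all numbers of |count in a - count in b|."""
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


def best_move(lcounts, rcounts, remaining):
    """One greedy step: return (index, side) of the placement that leaves the
    smallest imbalance, where side is 'left' or 'right'."""
    if not remaining:
        raise ValueError("no sublists left to place")
    best = None  # (score, index, side)
    for idx, sub in enumerate(remaining):
        c = Counter(sub)
        # Check "right" first so that "left" only wins when strictly better.
        for side, score in (("right", imbalance(lcounts, rcounts + c)),
                            ("left", imbalance(lcounts + c, rcounts))):
            if best is None or score < best[0]:
                best = (score, idx, side)
    return best[1], best[2]


# The left partition already holds [1, 2, 3]; three sublists remain.
lcounts, rcounts = Counter([1, 2, 3]), Counter()
print(best_move(lcounts, rcounts, [[2, 3], [1, 3], [3]]))  # (0, 'right')
```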

Now, odds are we don't find a partition for high values of your constants. I believe this is just a statistical thing, and not a problem with the algorithm, but I could be wrong.

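I also implemented a rough feasibility test. Every number has to occur an even number of times overall, which is necessary but not sufficient for a bipartition to exist. You'll see quite quickly that ~90% of all randomly generated nested lists can immediately be discarded as impossible to bipartition.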
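Here is a standalone sketch of that parity test, assuming plain Python lists as input. Unlike the answer's `feasible`, it does not rely on the global `MAX_NUM`, because `Counter` only counts the numbers that actually appear. `passes_parity_check` is a hypothetical name. The second call shows why the test is only necessary: both numbers occur twice, yet `[1, 1]` and `[2, 2]` cannot be split into equal halves.

```python
from collections import Counter


def passes_parity_check(lst):
    """Necessary (not sufficient) condition: every number must occur an even
    number of times overall, or it cannot be split evenly between two halves."""
    totals = Counter(x for sub in lst for x in sub)
    return all(count % 2 == 0 for count in totals.values())


print(passes_parity_check([[1, 2, 3], [2, 3], [1, 3], [3]]))  # True
print(passes_parity_check([[1, 1], [2, 2]]))                  # True, yet no balanced split exists
print(passes_parity_check([[1, 2], [1]]))                     # False
```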

However, adding the greedy algorithm now allows me, on my machine, to go from testing lists of ~15 sublists to lists of ~30 sublists, with a good chance of finding a bipartition. It also runs in less than a tenth of a second with e.g. 3, 3, 40, 12 as its constants (MAX_NUM, MAX_LEN, NUM_GEN, NSWITCH).

Finally, here is the code. Note that it only generates one list per run to test, so you might need to run it a few times before you even get a feasible one:

from itertools import product
import random
import datetime
import time
import sys

MAX_NUM = 3
MAX_LEN = 3
NUM_GEN = 200
NSWITCH = 12

random.seed(time.time())

def feasible(partitionee):
    '''Does a rough test to see if it is feasible to find a partition'''
    counts = [0 for _ in range(MAX_NUM)]
    flat = [x for sublist in partitionee for x in sublist]
    for n in flat:
        counts[n-1] += 1
    for n in counts:
        if n % 2 != 0:
            return False
    return True 

def random_numbers(length, max_num, n_lists):
    '''Create a random list of lists of numbers.'''

    lst = []
    for i in range(n_lists):
        sublist_length = random.randint(1, length)
        lst.append([random.randint(1, max_num) for _ in range(sublist_length)])
    return lst


def diff(lcounts, rcounts):
    '''Sum of the absolute differences between the left and right count lists.'''

    difference = 0
    for i in range(MAX_NUM):
        difference += abs(lcounts[i] - rcounts[i])

    return difference


def assign(partition, d, sublist):
    '''Assign a sublist to a partition and update its count list.'''

    partition.append(sublist)
    for n in sublist:
        d[n-1] += 1


def assign_value(d1, d2, sublist):
    '''Return the (left, right) loss of assigning sublist to either side.'''

    for n in sublist:
        d1[n-1] += 1
    left_score = diff(d1, d2)
    for n in sublist:
        d1[n-1] -= 1
        d2[n-1] += 1
    right_score = diff(d1, d2)
    for n in sublist:
        d2[n-1] -= 1

    return (left_score, right_score)


def greedy_partition(left, right, lcounts, rcounts, i, partitionee):
    # Assign the i:th sublist to the left partition.
    assign(left, lcounts, partitionee[i])
    del partitionee[i]

    for _ in range(NUM_GEN - NSWITCH):
        # Go through all unassigned sublists and get their loss.
        value_for_index = {}
        for i, sublist in enumerate(partitionee):
            value = assign_value(lcounts, rcounts, sublist)
            value_for_index[i]  = value

        # Find which choice would be closest to 0 difference.
        min_value    = 100000000000 # BIG NUMBER
        best_index  = -1
        choose_left = True
        for key, value in value_for_index.items():
            if min(value) < min_value:
                min_value    = min(value)
                choose_left = value[0] < value[1]
                best_index  = key

        # Assign it to the proper list.
        if choose_left:
            assign(left, lcounts, partitionee[best_index])
        else:
            assign(right, rcounts, partitionee[best_index])
        del partitionee[best_index]

    return diff(lcounts, rcounts)



# Create our list to partition.
partition_me = random_numbers(MAX_LEN, MAX_NUM, NUM_GEN)

start_time = datetime.datetime.now()

# Start by seeing if it's even feasible to partition.
if not feasible(partition_me):
    print('No bipartition possible!')
    sys.exit()


# Go through all possible starting arrangements.
min_score_seen = 100000000000 # BIG NUMBER
best_bipartition = []
for i in range(NUM_GEN):
    # Create left and right partitions, as well as maps to count how many of each
    # number each partition has accumulated.
    left  = []
    right = []
    lcounts  = [0 for i in range(MAX_NUM)]
    rcounts  = [0 for i in range(MAX_NUM)]

    # Copy partitionee since it will be consumed.
    partition = partition_me.copy()

    # Do greedy partition.
    score = greedy_partition(left, right, lcounts, rcounts, i, partition)
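    if score < min_score_seen:
        min_score_seen = score
        best_bipartition = [left, right]
        # Also keep the sublists this run left unassigned. `partition` is
        # overwritten on the next iteration, so using it after the loop would
        # mix the best run's halves with the last run's leftovers.
        best_remaining = partition

# Now that we've been greedy and fast, we will be careful and slow.
# Consider all possible arrangements of the best run's remaining sublists.
print('Done with greedy search, starting careful search.')
left = best_bipartition[0]
right = best_bipartition[1]

for pattern in product([True, False], repeat=len(best_remaining)):
    lst1 = left + [sub for keep, sub in zip(pattern, best_remaining) if keep]
    lst2 = right + [sub for keep, sub in zip(pattern, best_remaining) if not keep]
    left_flat = [x for sublist in lst1 for x in sublist]
    right_flat = [x for sublist in lst2 for x in sublist]
    if sorted(left_flat) == sorted(right_flat):
        print('Found bipartition by careful search:')
        print([lst1, lst2])
        break
else:
    # The loop finished without `break`: no arrangement balanced the counts.
    print('No bipartition found from the best greedy start.')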

end_time = datetime.datetime.now()
print('Time taken: ', end='')
print(end_time - start_time)


Tags: python, algorithm, list, greedy, partitioning

Shaun Han
