Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient enumeration of non-negative integer composition

I would like to write a function my_func(n,l) that, for some positive integer n, efficiently enumerates the ordered non-negative integer composition* of length l (where l is greater than n). For example, I want my_func(2,3) to return [[0,0,2],[0,2,0],[2,0,0],[1,1,0],[1,0,1],[0,1,1]].

My initial idea was to use existing code for positive integer partitions (e.g. accel_asc() from this post), extend the positive integer partitions by a couple zeros and return all permutations.

def my_func(n, l):
    for ip in accel_asc(n):
        nic = numpy.zeros(l, dtype=int)
        nic[:len(ip)] = ip
        for p in itertools.permutations(nic):
            yield p

The output of this function is wrong, because every non-negative integer composition in which a number appears twice (or multiple times) appears several times in the output of my_func. For example, list(my_func(2,3)) returns [(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 1), (2, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (0, 2, 0), (0, 0, 2)].

I could correct this by generating a list of all non-negative integer compositions, removing repeated entries, and then returning a remaining list (instead of a generator). But this seems incredibly inefficient and will likely run into memory issues. What is a better way to fix this?

EDIT

I did a quick comparison of the solutions offered in answers to this post and to another post that cglacet has pointed out in the comments. enter image description here

On the left, we have the l=2*n and on the right we have l=n+1. In these two cases, user2357112's second solutions is faster than the others, when n<=5. For n>5, solutions proposed by user2357112, Nathan Verzemnieks, and AndyP are more or less tied. But the conclusions could be different when considering other relationships between l and n.

..........

*I originally asked for non-negative integer partitions. Joseph Wood correctly pointed out that I am in fact looking for integer compositions, because the order of numbers in a sequence matters to me.

like image 669
Alice Schwarze Avatar asked Jan 26 '23 10:01

Alice Schwarze


2 Answers

Use the stars and bars concept: pick positions to place l-1 bars between n stars, and count how many stars end up in each section:

import itertools

def diff(seq):
    return [seq[i+1] - seq[i] for i in range(len(seq)-1)]

def generator(n, l):
    for combination in itertools.combinations_with_replacement(range(n+1), l-1):
        yield [combination[0]] + diff(combination) + [n-combination[-1]]

I've used combinations_with_replacement instead of combinations here, so the index handling is a bit different from what you'd need with combinations. The code with combinations would more closely match a standard treatment of stars and bars.


Alternatively, a different way to use combinations_with_replacement: start with a list of l zeros, pick n positions with replacement from l possible positions, and add 1 to each of the chosen positions to produce an output:

def generator2(n, l):
    for combination in itertools.combinations_with_replacement(range(l), n):
        output = [0]*l
        for i in combination:
            output[i] += 1
        yield output
like image 120
user2357112 supports Monica Avatar answered Feb 15 '23 03:02

user2357112 supports Monica


Starting from a simple recursive solution, which has the same problem as yours:

def nn_partitions(n, l):
    if n == 0:
        yield [0] * l
    else:
        for part in nn_partitions(n - 1, l):
            for i in range(l):
                new = list(part)
                new[i] += 1
                yield new

That is, for each partition for the next lower number, for each place in that partition, add 1 to the element in that place. It yields the same duplicates yours does. I remembered a trick for a similar problem, though: when you alter a partition p for n into one for n+1, fix all the elements of p to the left of the element you increase. That is, keep track of where p was modified, and never modify any of p's "descendants" to the left of that. Here's the code for that:

def _nn_partitions(n, l):
    if n == 0:
        yield [0] * l, 0
    else:
        for part, start in _nn_partitions(n - 1, l):
            for i in range(start, l):
                new = list(part)
                new[i] += 1
                yield new, i

def nn_partitions(n, l):
    for part, _ in _nn_partitions(n, l):
        yield part

It's very similar - there's just the extra parameter passed along at each step, so I added wrapper to remove that for the caller.

I haven't tested it extensively, but this appears to be reasonably fast - about 35 microseconds for nn_partitions(3, 5) and about 18s for nn_partitions(10, 20) (which yields just over 20 million partitions). (The very elegant solution from user2357112 takes about twice as long for the smaller case and about four times as long for the larger one. Edit: this refers to the first solution from that answer; the second one is faster than mine under some circumstances and slower under others.)

like image 40
Nathan Vērzemnieks Avatar answered Feb 15 '23 02:02

Nathan Vērzemnieks