Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Finding random k-subset partition for a given list

The following code generates all partitions of length k (k-subset partitions) for a given list. the algorithm could be found in this topic.

def algorithm_u(ns, m):
    def visit(n, a):
        ps = [[] for i in xrange(m)]
        for j in xrange(n):
            ps[a[j + 1]].append(ns[j])
        return ps

    def f(mu, nu, sigma, n, a):
        if mu == 2:
            yield visit(n, a)
        else:
            for v in f(mu - 1, nu - 1, (mu + sigma) % 2, n, a):
                yield v
        if nu == mu + 1:
            a[mu] = mu - 1
            yield visit(n, a)
            while a[nu] > 0:
                a[nu] = a[nu] - 1
                yield visit(n, a)
        elif nu > mu + 1:
            if (mu + sigma) % 2 == 1:
                a[nu - 1] = mu - 1
            else:
                a[mu] = mu - 1
            if (a[nu] + sigma) % 2 == 1:
                for v in b(mu, nu - 1, 0, n, a):
                    yield v
            else:
                for v in f(mu, nu - 1, 0, n, a):
                    yield v
            while a[nu] > 0:
                a[nu] = a[nu] - 1
                if (a[nu] + sigma) % 2 == 1:
                    for v in b(mu, nu - 1, 0, n, a):
                        yield v
                else:
                    for v in f(mu, nu - 1, 0, n, a):
                        yield v

    def b(mu, nu, sigma, n, a):
        if nu == mu + 1:
            while a[nu] < mu - 1:
                yield visit(n, a)
                a[nu] = a[nu] + 1
            yield visit(n, a)
            a[mu] = 0
        elif nu > mu + 1:
            if (a[nu] + sigma) % 2 == 1:
                for v in f(mu, nu - 1, 0, n, a):
                    yield v
            else:
                for v in b(mu, nu - 1, 0, n, a):
                    yield v
            while a[nu] < mu - 1:
                a[nu] = a[nu] + 1
                if (a[nu] + sigma) % 2 == 1:
                    for v in f(mu, nu - 1, 0, n, a):
                        yield v
                else:
                    for v in b(mu, nu - 1, 0, n, a):
                        yield v
            if (mu + sigma) % 2 == 1:
                a[nu - 1] = 0
            else:
                a[mu] = 0
        if mu == 2:
            yield visit(n, a)
        else:
            for v in b(mu - 1, nu - 1, (mu + sigma) % 2, n, a):
                yield v

    n = len(ns)
    a = [0] * (n + 1)
    for j in xrange(1, m + 1):
        a[n - m + j] = j - 1
    return f(m, n, 0, n, a)

we know that number of k-subsets of a given list is equal to Stirling number and it could be very big for some large lists.

the code above returns a Python generator that could generate all possible k-subset partitions for the given list with calling its next method. accordingly, if I want to get only one of these partitions randomly, I have to either call next method for some random times (which makes it really slow if the Stirling number is big) or use the itertools.islice method to get a slice of size one which is really slow as before.

I'm trying to avoid listing all partitions because it would be waste of time and speed and even memory (because calculations are a lot and memory is important in my case).

the question is how can I generate only one of k-subset partitions without generating the rest ? or at least make the procedure very faster than what explained above. I need the performance because I need to get only one of them each time and I'm running the application for maybe more than ten million times.

I'd appreciate any help.

EDIT: EXAMPLE

list : { 1, 2, 3 }

for k = 3:

{ {1}, {2}, {3} }

for k = 2:

{ {1, 2}, {3} }
{ {1, 3}, {2} }
{ {1}, {2, 3} }

and for k = 1:

{ {1, 2, 3} }

consider k = 2, is there any way I can generate only one of these 3 partitions randomly, without generating the other 2? note that I want to generate random partition for any given k not only a random partition of any k which means if I set the k to 2 I would like to generate only one of these 3 not one of all 5.

Regards,

Mohammad

like image 235
Mohammad Siavashi Avatar asked Aug 23 '17 02:08

Mohammad Siavashi


1 Answers

You can count Stirling numbers efficiently with a recursive algorithm by storing previously computed values:

fact=[1]

def nCr(n,k):
    """Return number of ways of choosing k elements from n"""
    while len(fact)<=n:
        fact.append(fact[-1]*len(fact))
    return fact[n]/(fact[k]*fact[n-k])

cache = {}
def count_part(n,k):
    """Return number of ways of partitioning n items into k non-empty subsets"""
    if k==1:
        return 1
    key = n,k
    if key in cache:
        return cache[key]
    # The first element goes into the next partition
    # We can have up to y additional elements from the n-1 remaining
    # There will be n-1-y left over to partition into k-1 non-empty subsets
    # so n-1-y>=k-1
    # y<=n-k
    t = 0
    for y in range(0,n-k+1):
        t += count_part(n-1-y,k-1) * nCr(n-1,y)
    cache[key] = t
    return t   

Once you know how many choices there are, you can adapt this recursive code to generate a particular partition:

def ith_subset(A,k,i):
    """Return ith k-subset of A"""
    # Choose first element x
    n = len(A)
    if n==k:
        return A
    if k==0:
        return []
    for x in range(n):
        # Find how many cases are possible with the first element being x
        # There will be n-x-1 left over, from which we choose k-1
        extra = nCr(n-x-1,k-1)
        if i<extra:
            break
        i -= extra
    return [A[x]] + ith_subset(A[x+1:],k-1,i)

def gen_part(A,k,i):
    """Return i^th k-partition of elements in A (zero-indexed) as list of lists"""
    if k==1:
        return [A]
    n=len(A)
    # First find appropriate value for y - the extra amount in this subset
    for y in range(0,n-k+1):
        extra = count_part(n-1-y,k-1) * nCr(n-1,y)
        if i<extra:
            break
        i -= extra
    # We count through the subsets, and for each subset we count through the partitions
    # Split i into a count for subsets and a count for the remaining partitions
    count_partition,count_subset = divmod(i,nCr(n-1,y))
    # Now find the i^th appropriate subset
    subset = [A[0]] + ith_subset(A[1:],y,count_subset)
    S=set(subset)
    return  [subset] + gen_part([a for a in A if a not in S],k-1,count_partition)

As an example, I've written a test program that produces different partitions of 4 numbers:

def test(A):
    n=len(A)
    for k in [1,2,3,4]:
        t = count_part(n,k)
        print k,t
        for i in range(t):
            print " ",i,gen_part(A,k,i)

test([1,2,3,4])

This code prints:

1 1
  0 [[1, 2, 3, 4]]
2 7
  0 [[1], [2, 3, 4]]
  1 [[1, 2], [3, 4]]
  2 [[1, 3], [2, 4]]
  3 [[1, 4], [2, 3]]
  4 [[1, 2, 3], [4]]
  5 [[1, 2, 4], [3]]
  6 [[1, 3, 4], [2]]
3 6
  0 [[1], [2], [3, 4]]
  1 [[1], [2, 3], [4]]
  2 [[1], [2, 4], [3]]
  3 [[1, 2], [3], [4]]
  4 [[1, 3], [2], [4]]
  5 [[1, 4], [2], [3]]
4 1
  0 [[1], [2], [3], [4]]

As another example, there are 10 million partitions of 1,2,3,..14 into 4 parts. This code can generate all partitions in 44 seconds with pypy.

There are 50,369,882,873,307,917,364,901 partitions of 1,2,3,...,40 into 4 parts. This code can generate 10 million of these in 120 seconds with pypy running on a single processor.

To tie things together, you can use this code to generate a single random partition of a list A into k non-empty subsets:

import random
def random_ksubset(A,k):
    i = random.randrange(0,count_part(len(A),k))
    return gen_part(A,k,i)
like image 124
Peter de Rivaz Avatar answered Nov 01 '22 10:11

Peter de Rivaz