Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Index into size ordered power set

I would like to be able to index elements of a power set without expanding the full set into memory (a la itertools)

Furthermore I want the index to be cardinality ordered. So index 0 should be the empty set, index 2**n - 1 should be all elements

Most literature I have found so far involves generating a power set inductively. It doesn't let you just dive in at any index. My motivation for this indexing is to slice a problem for distributed execution and it be helpful if a remote machine can just dive in anywhere without sharing an iterator reference across a cluster.

EDIT: Blckknght suggested the solution I pursued which is shown below

from scipy.misc import comb

def kcombination_to_index(combination):
    index = 0
    combination = sorted(combination)
    for k, ck in enumerate(combination):
        index += comb(ck, k+1, exact=True)
    return index

def index_to_kcombination(index, k):
    result = []
    for k in reversed(range(1, k+1)):
        n = 0
        while comb(n, k, exact=True) <= index:
            n +=1
        result.append(n-1)
        index -= comb(n-1, k, exact=True)

    return result

class PowerSet:
    def __init__(self, elements):
        self.elements = elements

    def __len__(self):
        return 2 ** len(self.elements)

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

    def __getitem__(self, k):
        if not isinstance(k, int):
            raise TypeError
        #k=0 is empty set,
        #k= 1 - 1+n returns subsets of size 1
        for subset_size in range(len(self.elements) + 1):
            number_subsets = comb(len(self.elements), subset_size, exact=True)

            if k >= number_subsets:
                k -= number_subsets
            else:
                break

        #we now want the kth element of a possible permutation of subset_size elements
        indeces = index_to_kcombination(k, subset_size)

        return map(lambda i: self.elements[i], indeces)

if __name__ == "__main__":
    print "index of combination [8, 6, 3, 1, 0] is", kcombination_to_index([8, 6, 3, 1, 0])
    print "5 combination at position 72 is", index_to_kcombination(72,5)

    ps = PowerSet(["a", "b", "c", "d"])

    for subset_idx in range(len(ps)):
        print ps[subset_idx]
like image 428
Tom Larkworthy Avatar asked Sep 23 '13 18:09

Tom Larkworthy


1 Answers

I think you can do this with a two step process. The first step is as Mihai Maruseac described in his (now deleted) answer, to find the size of the set by iterating over the possible sizes until you find the appropriate one. Here's code for that:

def find_size(n, i):
    """Return a tuple, (k, i), where s is the size of the i-1'th set in the
       cardinally-ordered powerset of {0..n-1}, and i is the remaining index
       within the combinations of that size."""
    if not 0 <= i < 2**n:
        raise ValueError('index is too large or small')
    for k in range(n+1):
        c = comb(n, k)
        if c > i:
            return k, i
        else:
            i -= c

Once you have determined the size, you can use the combinatorial number system to find the right k-combination from the lexicographical ordering:

def pick_set(n, i):
    """Return the i-1'th set in the cardinally-ordered powerset of {0..n-1}"""
    s, i = find_size(n, i)
    result = []
    for k in range(s, 0, -1):
        prev_c = 0
        for v in range(k, n+1):
            c = comb(v, k)
            if i < c:
                result.append(v-1)
                i -= prev_c
                break
            prev_c = c
    return tuple(result)

Both of those functions require a function to calculate the number of k-combinations for a set of size n, nCk (which I've called comb). This other question has several suggested solutions for finding that value, including scipy.misc.comb, gmpy.comb and a few pure-python implementations. Or, since it's called repeatedly with sequentially increasing values (e.g. comb(n, 0), comb(n, 1), etc. or comb(k, k), comb(k+1, k), etc.) you could instead use an inline calculation that takes advantage the previously calculated value to give better performance.

Example usage (using a comb function minimally adapted from J.F. Sebastian's answer in the question linked above):

>>> for i in range(2**4):
        print(i, pick_set(4, i))

0 ()
1 (0,)
2 (1,)
3 (2,)
4 (3,)
5 (1, 0)
6 (2, 0)
7 (2, 1)
8 (3, 0)
9 (3, 1)
10 (3, 2)
11 (2, 1, 0)
12 (3, 1, 0)
13 (3, 2, 0)
14 (3, 2, 1)
15 (3, 2, 1, 0)

Note that if you plan on iterating over combinations (as I did in the example), you can probably do so more efficiently than by running the full algorithm, as there are more efficient algorithms for finding the next combination of a given size (though you'll need a bit of extra logic to bump up to the next larger size of combinations when you've exhausted the initial size).

Edit: Here are implementations of some of the optimizations I mentioned briefly above:

First off, generators that efficiently calculate combination values for ranges of n or k values:

def comb_n_range(start_n, stop_n, k):
    c = comb(start_n, k)
    yield start_n, c
    for n in range(start_n+1, stop_n):
        c = c * n // (n - k)
        yield n, c

def comb_k_range(n, start_k, end_k):
    c = comb(n, start_k)
    yield start_k, c
    for k in range(start_k+1, end_k):
        c = c * (n - k + 1) // k
        yield k, c

The for ... in range(...): c = comb(...); ... bits in the code above can be adjusted to use these, which should be a bit faster.

Next, a function that returns the next combination in lexicographical order:

def next_combination(n, c):
    if c[-1] == n-len(c)+1:
        raise ValueError("no more combinations")
    for i in range(len(c)-1, -1, -1):
        if i == 0 or c[i] < c[i-1] - 1:
            return c[:i] + (c[i] + 1,) + tuple(range(len(c)-2-i,-1,-1))

And a generator that uses next_combination to yield a range of values from the powerset, defined by a slice object:

def powerset_slice(n, s):
    start, stop, step = s.indices(2**n)
    if step < 1:
        raise ValueError("invalid step size (must be positive)")

    if start == 0:
        c = ()
    else:
        c = pick_set(n, start)

    for _ in range(start, stop, step):
        yield c
        for _ in range(step):
            try:
                c = next_combination(n, c)
            except ValueError:
                if len(c) == n:
                    return
                c = tuple(range(len(c), -1, -1))

You could integrate this into the class you are using by making __getitem__ return the generator if it is passed a slice object, rather than an int. This would let you make __iter__ faster by simply turning its body into: return self[:].

like image 109
Blckknght Avatar answered Oct 23 '22 10:10

Blckknght