How to speed up code to solve bit deletion puzzle

[This is related to Minimum set cover]

I would like to solve the following puzzle by computer for small sizes of n. Consider all 2^n binary vectors of length n. From each one you delete exactly n/3 of the bits, leaving a binary vector of length 2n/3 (assume n is an integer multiple of 3). The goal is to choose which bits to delete so as to minimize the number of distinct binary vectors of length 2n/3 that remain at the end.

For example, for n = 3 the optimal answer is 2 different vectors, 11 and 00. For n = 6 it is 4, for n = 9 it is 6, and for n = 12 it is 10.
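As a quick sanity check, the n = 3 case can be brute forced directly. The sketch below is my own addition (the helper names are made up, not from this question); it searches candidate sets of short vectors in increasing size and prints 2 [(0, 0), (1, 1)], matching the answer above.

from itertools import combinations, product

def deletions(bits, x):
    # All vectors obtained from `bits` by deleting exactly x positions.
    n = len(bits)
    return {tuple(bits[i] for i in range(n) if i not in drop)
            for drop in combinations(range(n), x)}

n, x = 3, 1
wides = [tuple(p) for p in product((0, 1), repeat=n)]
reach = {w: deletions(w, x) for w in wides}  # wide -> reachable short vectors
shorts = list(product((0, 1), repeat=n - x))

# The smallest set of short vectors such that every wide vector can reach
# at least one of them is exactly the puzzle's optimum.
for size in range(1, len(shorts) + 1):
    hit = next((set(c) for c in combinations(shorts, size)
                if all(reach[w] & set(c) for w in wides)), None)
    if hit:
        print(size, sorted(hit))  # prints: 2 [(0, 0), (1, 1)]
        break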

I had previously attempted to solve this problem as a minimum set cover problem of the following sort. All the lists contain only 1s and 0s.

I say that a list A covers a list B if you can make B from A by inserting exactly x symbols. For example, with x = 1 the list [0, 0] covers [0, 1, 0], since inserting a 1 between the two 0s produces [0, 1, 0].

Consider all 2^n lists of 1s and 0s of length n and set x = n/3. I would like to compute a minimal set of lists of length 2n/3 that covers them all. David Eisenstat provided code that converts this minimum set cover problem into a mixed integer programming problem that can be fed into CPLEX (or http://scip.zib.de/, which is open source).

from collections import defaultdict
from itertools import product, combinations

def all_fill(source, num):
    # Yield every vector of length len(source) + num that contains
    # `source` as a subsequence, i.e. every long vector from which
    # `source` can be recovered by deleting exactly `num` bits.
    output_len = len(source) + num
    for where in combinations(range(output_len), len(source)):
        poss = [[0, 1]] * output_len
        for w, s in zip(where, source):
            poss[w] = [s]
        for tup in product(*poss):
            yield tup

def variable_name(seq):
    return 'x' + ''.join(str(s) for s in seq)

n = 12
shortn = (2 * n) // 3
x = n // 3

# Map every long (length n) vector to the set of short (length 2n/3)
# vectors that cover it.
all_seqs = list(product([0, 1], repeat=shortn))
hit_sets = defaultdict(set)
for seq in all_seqs:
    for fill in all_fill(seq, x):
        hit_sets[fill].add(seq)

# Emit the set cover model in CPLEX LP format: minimize the number of
# chosen short vectors subject to every long vector being covered.
print('Minimize')
print(' + '.join(variable_name(seq) for seq in all_seqs))
print('Subject To')
for fill, seqs in hit_sets.items():
    print(' + '.join(variable_name(seq) for seq in seqs), '>=', 1)
print('Binary')
for seq in all_seqs:
    print(variable_name(seq))
print('End')
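As a side note, the same model can be built and solved in-process with the PuLP library and its bundled CBC solver instead of emitting LP text. This is a sketch of mine, not part of David Eisenstat's code; it reuses all_seqs, hit_sets, and variable_name from above:

import pulp  # assumed installed: pip install pulp

prob = pulp.LpProblem("bit_deletion_cover", pulp.LpMinimize)
xs = {seq: pulp.LpVariable(variable_name(seq), cat="Binary")
      for seq in all_seqs}
prob += pulp.lpSum(xs.values())            # minimize chosen short vectors
for fill, seqs in hit_sets.items():        # every long vector must be covered
    prob += pulp.lpSum(xs[s] for s in seqs) >= 1
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("cover size:", int(pulp.value(prob.objective)))

This doesn't change the underlying difficulty, of course: the model itself is what explodes as n grows.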

The problem is that if you set n = 15, the instance it outputs is too large for any solver I can find. Is there a more efficient way of solving this problem so I can solve n = 15 or even n = 18?

asked Oct 11 '13 by graffe

1 Answer

This doesn't solve your problem (well, not quickly enough), but you're not getting many ideas and someone else may find something useful to build on here.

It's a short pure Python 3 program, using backtracking search with some greedy ordering heuristics. It solves the N = 3, 6, and 9 instances very quickly. It also finds a cover of size 10 for N = 12 quickly, but will apparently take much longer to exhaust the search space (I'm out of time for this, and it's still running). For N = 15, even the initialization is slow.

Bitstrings are represented by plain N-bit integers here, so they consume little storage. That's to ease recoding in a faster language. The program makes heavy use of sets of integers, but no other "advanced" data structures.
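To make the integer representation concrete, here is the bit-deletion step used inside initialize() below, pulled out as a standalone illustration (my addition, not part of the answer's program):

def delete_bit(value, i):
    # Return `value` with bit i removed; all higher bits shift down one place.
    hi, lo = divmod(value, 1 << i)  # hi = bits i and above, lo = bits below i
    return ((hi >> 1) << i) | lo    # drop bit i and splice the halves back

assert delete_bit(0b1011, 1) == 0b101  # deleting bit 1 of 1011 leaves 101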

Hope this helps someone! But it's clear that the combinatorial explosion of possibilities as N increases ensures that nothing will be "fast enough" without digging deeper into the mathematics of the problem.

def dump(cover):
    for s in sorted(cover):
        print("    {:0{width}b}".format(s, width=I))

def new_best(cover):
    global best_cover, best_size
    assert len(cover) < best_size
    best_size = len(cover)
    best_cover = cover.copy()
    print("N =", N, "new best cover, size", best_size)
    dump(best_cover)

def initialize(N, X, I):
    from itertools import combinations
    # Map a "wide" (length N) bitstring to the set of all
    # "narrow" (length I) bitstrings that generate it.
    w2n = [set() for _ in range(2**N)]
    # Map a narrow bitstring to all the wide bitstrings
    # it generates.
    n2w = [set() for _ in range(2**I)]
    for wide, wset in enumerate(w2n):
        for t in combinations(range(N), X):
            narrow = wide
            for i in reversed(t):  # largest i to smallest
                hi, lo = divmod(narrow, 1 << i)
                narrow = ((hi >> 1) << i) | lo
            wset.add(narrow)
            n2w[narrow].add(wide)
    return w2n, n2w

def solve(needed, cover):
    if len(cover) >= best_size:
        return  # prune: can't beat the best cover found so far
    if not needed:
        new_best(cover)
        return
    # Find something needed with minimal generating set.
    _, winner = min((len(w2n[g]), g) for g in needed)
    # And order its generators by how much reduction they make
    # to `needed`.
    for g in sorted(w2n[winner],
                    key=lambda g: len(needed & n2w[g]),
                    reverse=True):
        cover.add(g)
        solve(needed - n2w[g], cover)
        cover.remove(g)

N = 9  # CHANGE THIS TO WHAT YOU WANT

assert N % 3 == 0
X = N // 3 # number of bits to exclude
I = N - X  # number of bits to include

print("initializing")
w2n, n2w = initialize(N, X, I)
best_cover = None
best_size = 2**I + 1  # "infinity"
print("solving")
solve(set(range(2**N)), set())

Example output for N=9:

initializing
solving
N = 9 new best cover, size 6
    000000
    000111
    001100
    110011
    111000
    111111

Followup

For N=12 this eventually finished, confirming that the minimal covering set contains 10 elements (which it found very early in the search). I didn't time it, but it took at least 5 hours.

Why's that? Because it's close to brain-dead ;-) A completely naive search would try all subsets of the 256 8-bit short strings. There are 2**256 such subsets, about 1.2e77 - it wouldn't finish in the expected lifetime of the universe ;-)

The ordering gimmicks here first detect that the "all 0" and "all 1" short strings must be in any covering set, so pick them. That leaves us looking at "only" the 254 remaining short strings. Then the greedy "pick an element that covers the most" strategy very quickly finds a covering set with 11 total elements, and shortly thereafter a covering with 10 elements. That happens to be optimal, but it takes a long time to exhaust all other possibilities.
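In isolation, that greedy phase is the classic set-cover heuristic. A rough standalone sketch of mine, using the n2w mapping built by initialize() (the real program interleaves this ordering with backtracking):

def greedy_cover(all_wide, n2w):
    # Repeatedly pick the short string covering the most still-uncovered wides.
    needed = set(all_wide)
    cover = set()
    while needed:
        g = max(range(len(n2w)), key=lambda g: len(needed & n2w[g]))
        cover.add(g)
        needed -= n2w[g]
    return cover  # an upper bound on the optimal cover size

Because solve() tries generators in that same most-coverage-first order, it finds these good covers almost immediately; the hard part is proving nothing smaller exists.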

At this point, any attempt at a covering set that reaches 10 elements is aborted (it can't possibly be smaller than 10 elements then!). If that were done wholly naively too, it would need to try adding (to the "all 0" and "all 1" strings) all 8-element subsets of the 254 remaining, and 254-choose-8 is about 3.8e14. Very much smaller than 1.2e77 - but still way too large to be practical. It's an interesting exercise to understand how the code manages to do so much better than that. Hint: it has a lot to do with the data in this problem.

Industrial-strength solvers are incomparably more sophisticated and complex. I was pleasantly surprised at how well this simple little program did on the smaller problem instances! It got lucky.

But for N=15 this simple approach is hopeless. It quickly finds a cover with 18 elements, but then makes no visible progress for hours. Internally, it's still working with needed sets containing hundreds (even thousands) of elements, which makes the body of solve() quite expensive. It still has 2**10 - 2 = 1022 short strings to consider, and 1022-choose-16 is about 6e34. I don't expect it would visibly help even if this code were sped up by a factor of a million.
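Those counts are easy to verify (Python 3.8+ for math.comb):

from math import comb

print(f"{2**256:.2e}")          # ≈ 1.16e77 subsets of the 256 short strings
print(f"{comb(254, 8):.2e}")    # ≈ 3.84e14 ways to pick 8 of the 254
print(f"{comb(1022, 16):.2e}")  # ≈ 6.0e34 ways to pick 16 of the 1022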

It was fun to try, though :-)

And a small rewrite

This version runs at least 6 times faster on a full N = 12 run, simply by cutting off futile searches one level earlier. It also speeds up initialization, and cuts memory use by changing the 2**N w2n sets into lists (no set operations are used on those). It's still hopeless for N = 15, though :-(

def dump(cover):
    for s in sorted(cover):
        print("    {:0{width}b}".format(s, width=I))

def new_best(cover):
    global best_cover, best_size
    assert len(cover) < best_size
    best_size = len(cover)
    best_cover = cover.copy()
    print("N =", N, "new best cover, size", best_size)
    dump(best_cover)

def initialize(N, X, I):
    from itertools import combinations
    # Map a "wide" (length N) bitstring to the set of all
    # "narrow" (length I) bitstrings that generate it.
    w2n = [set() for _ in range(2**N)]
    # Map a narrow bitstring to all the wide bitstrings
    # it generates.
    n2w = [set() for _ in range(2**I)]
    # mask[i] is a string of i 1-bits
    mask = [2**i - 1 for i in range(N)]
    for t in combinations(range(N), X):
        t = t[::-1]  # largest i to smallest
        for wide, wset in enumerate(w2n):
            narrow = wide
            for i in t:  # delete bit 2**i
                narrow = ((narrow >> (i+1)) << i) | (narrow & mask[i])
            wset.add(narrow)
            n2w[narrow].add(wide)
    # release some space
    for i, s in enumerate(w2n):
        w2n[i] = list(s)
    return w2n, n2w

def solve(needed, cover):
    if not needed:
        if len(cover) < best_size:
            new_best(cover)
        return
    if len(cover) >= best_size - 1:
        # can't possibly be extended to a cover < best_size
        return
    # Find something needed with minimal generating set.
    _, winner = min((len(w2n[g]), g) for g in needed)
    # And order its generators by how much reduction they make
    # to `needed`.
    for g in sorted(w2n[winner],
                    key=lambda g: len(needed & n2w[g]),
                    reverse=True):
        cover.add(g)
        solve(needed - n2w[g], cover)
        cover.remove(g)

N = 9  # CHANGE THIS TO WHAT YOU WANT

assert N % 3 == 0
X = N // 3 # number of bits to exclude
I = N - X  # number of bits to include

print("initializing")
w2n, n2w = initialize(N, X, I)

best_cover = None
best_size = 2**I + 1  # "infinity"
print("solving")
solve(set(range(2**N)), set())

print("best for N =", N, "has size", best_size)
dump(best_cover)
answered Nov 15 '22 by Tim Peters