Conditional sampling of binary vectors (?)

Question

I'm trying to find a name for my problem, so I don't have to re-invent wheel when coding an algorithm which solves it...

I have say 2,000 binary (row) vectors and I need to pick 500 from them. In the picked sample I do column sums and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.

A tiny example:

Out of the vectors:

I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be

110
100

My ideas so far

one could maybe call this a binary multidimensional knapsack, but I did not find any algos for that
Linear Programming could help, but I'd need some step by step explanation as I got no experience with it
as exact solution is not always feasible, something like simulated annealing brute force could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that CSP should be much faster than ILP...?

David Eisenstat · Accepted Answer

My concrete, practical (if the approximation guarantee works out for you) suggestion would be to apply the maximum entropy method (in Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500 (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its distribution likelihood, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (shouldn't take too many tries due to concentration bounds).

Conditional sampling of binary vectors (?)

Tags:

algorithm

random

My ideas so far

liborm

1 Answers

David Eisenstat

Recent Activity

Donate For Us

Conditional sampling of binary vectors (?)

Tags:

algorithm

random

My ideas so far

liborm

1 Answers

David Eisenstat

Related questions

Recent Activity

Donate For Us