Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding a number of maximally different binary vectors from a set

Consider the set, S, of all binary vectors of length n where each contains exactly m ones; so there are n-m zeros in each vector.
My goal is to construct a number, k, of vectors from S such that these vectors are as different as possible from each other.

As a simple example, take n=4, m=2 and k=2, then a possible solution is: [1,1,0,0] and [0,0,1,1].

It seems that this is an open problem in the coding theory literature (?).

Is there any way (i.e. algorithm) to find a suboptimal yet good solution ?
Is Hamming distance the right performance measure to use in this case ?

Some thoughts:
In this paper, the authors propose a couple of algorithms to find the subset of vectors such that the pairwise Hamming distance is >= a certain value, d.
I have implemented the Random approach as follows: take a set SS, which is initialized by any vector from S. Then, I consider the remaining vectors in S. For each of these vectors, I check if this vector has at least a distance d with respect to each vector in SS. If so, then it is added to SS.
By taking the maximal possible d, if the size of SS is >= k, then I consider SS as an optimal solution, and I choose any subset of k vectors from SS. Using this approach, I think that the resulting SS will depend on the identity of the initial vector in SS; i.e. there are multiple solutions(?).
But how to proceed if the size of SS is < k ?
From the proposed algorithms in the paper, I have only understood the Random one. I am interested in the Binary lexicographic search (section 2.3) but I don't know how to implement it (?).

like image 568
din Avatar asked May 11 '18 12:05

din


1 Answers

Maybe you find this paper useful (I wrote it). It contains algorithms that efficiently create permutations of bitstrings.

For example, the inc() algorithm:

long  inc(long h_in , long m0 , long m1) {
    long  h_out = h_in | (~m1); //pre -mask
    h_out ++;
    // increment
    h_out = (h_out & m1) | m0; //post -mask
    return  h_out;
}

It takes an input h_in and return the next higher value that is at least 1 larger than h_in and 'matches' the boundaries m0 and m1. 'Matching' means: the result has a 1 whereever m0 has a 1, and the result has a 0 whereever m1 has a 0. Not that h_in MUST BE a valid value with regards to mo and m1! Also, note that m0 has to be bitwise smaller than m1, which means that m0 cannot have a 1 in a position where m1 has a 0.

This could be used to generate permutations with a minimum edit distance to a given input string:

Let's assume you have 0110, you first NEGATE it to 1001 (edit distance = k). Set 'm0=1001' and 'm1=1001'. Using this would result only on '1001' itself.

Now to get all values with edit distance k-1, you can do the following, simply flip one of the bits of m0 or m1, then inc() will return an ordered series of all bitstring that have a difference of k or k-1.

I know, not very interesting yet, but you can modify up to k bits, and inc() will always return all permutations with the maximum allowed edit difference with regard to m0 and m1.

Now, to get all permutations, you would have to re-run the algorithm with all possibly combinations of m0 and m1.

Example: To get all possible permutations of 0110 with edit distance 2, you would have to run inc() with the following permutations of m0=0110 and m1=0110 (to get permutations, a bit position has to be expanded, meaning that m0 is set to 0 and m1 is set to 1:

  • Bit 0 and 1 expanded: m0=0010 and m1=1110
  • Bit 0 and 2 expanded: m0=0100 and m1=1110
  • Bit 0 and 3 expanded: m0=0110 and m1=1111
  • Bit 1 and 2 expanded: m0=0000 and m1=0110
  • Bit 1 and 3 expanded: m0=0010 and m1=0111
  • Bit 2 and 3 expanded: m0=0100 and m1=0111

As starting value for h_0 I suggest to use simply m0. Iteration can be aborted once inc() returns m1.

Summary The above algorithm generates in O(x) all x binary vectors that differ in at least y bits (configurable) from a given vector v.

Using your definition of n=number of bits in a vector v, setting y=n generates exactly 1 vector which is the exact opposite of the input vector v. For y=n-1, it will generate n+1 vectors: n vectors which differ in all but one bits and 1 vector that differs in all bits. And so on different values of y.

**EDIT: Added summary and replaced erroneous 'XOR' with 'NEGATE' in the text above.

like image 103
TilmannZ Avatar answered Oct 16 '22 08:10

TilmannZ