I'm trying to generate <code>n</code> binary vectors of some arbitrary length <code>l</code>, where each vector <code>i</code> has a Hamming distance of <code>d</code> (where <code>d</code> is even) from every other vector <code>j</code>. I'm not sure if there are any theoretical relationships between <code>n</code>, <code>l</code>, and <code>d</code>, but I'm wondering if there are any implementations for this task. My current implementation is shown below. Sometimes I am successful, other times the code hangs, which indicates either a) it's not possible to find <code>n</code> such vectors given <code>l</code> and <code>d</code>, or b) the search takes a long time especially for large values of <code>l</code>. My questions are: <ul> <li>Are there any efficient implementations of this task?</li> <li> What kind of theoretical relationships exist between <code>n</code>, <code>l</code>, and <code>d</code>? <pre class="prettyprint"><code>import numpy as np def get_bin(n): return ''.join([str(np.random.randint(0, 2)) for _ in range(n)]) def hamming(s1, s2): return sum(c1 != c2 for c1, c2 in zip(s1, s2)) def generate_codebook(n, num_codes, d): codebooks = [] seen = [] while len(codebooks) < num_codes: code = get_bin(n) if code in seen: continue else: if len(codebooks) == 0: codebooks.append(code) print len(codebooks), code else: if all(map(lambda x: int(hamming(code, x)) == d, codebooks)): codebooks.append(code) print len(codebooks), code seen.append(code) codebook_vectorized = map(lambda x: map(lambda b: int(b), x), codebooks) return np.array(codebook_vectorized) </code></pre> </li> </ul> Example: <pre class="prettyprint"><code>codebook = generate_codebook(4,3,2) codebook 1 1111 2 1001 3 0101 </code></pre>

Let's build a graph <code>G</code> where every <code>L</code>-bit binary vector <code>v</code> is a vertex. And there is an edge <code>(vi, vj)</code> only when a Hamming distance between <code>vi</code> and <code>vj</code> is equal to <code>d</code>. Now we need to find a clique of size <code>n</code> is this graph. <blockquote> Clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent. </blockquote> The task of finding a clique of given size in an arbitrary graph is NP-complete. You can read about this problem and some algorithms in this wikipedia article. There are many special cases of this problem. For example, for perfect graphs there is a polynomial algorithm. Don't know if it is possible to show that our graph is one of these special cases.

Generating n binary vectors where each vector has a Hamming distance of d from every other vector

Q: What is the Hamming distance between 1101 1010?

11011001 &oplus; 10011101 = 01000100. Since, this contains two 1s, the Hamming distance, d(11011001, 10011101) = 2.

Tags:

python

algorithm

cryptography

vector

encryption

I'm trying to generate n binary vectors of some arbitrary length l, where each vector i has a Hamming distance of d (where d is even) from every other vector j. I'm not sure if there are any theoretical relationships between n, l, and d, but I'm wondering if there are any implementations for this task. My current implementation is shown below. Sometimes I am successful, other times the code hangs, which indicates either a) it's not possible to find n such vectors given l and d, or b) the search takes a long time especially for large values of l.

My questions are:

Are there any efficient implementations of this task?

What kind of theoretical relationships exist between n, l, and d?

Click to copy

import numpy as np

def get_bin(n):
    return ''.join([str(np.random.randint(0, 2)) for _ in range(n)])

def hamming(s1, s2):
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def generate_codebook(n, num_codes, d):    
    codebooks = []
    seen = []

    while len(codebooks) < num_codes:
        code = get_bin(n)
        if code in seen:
            continue
        else:
            if len(codebooks) == 0:
                codebooks.append(code)
                print len(codebooks), code
            else:
                if all(map(lambda x: int(hamming(code, x)) == d, codebooks)):
                    codebooks.append(code)
                    print len(codebooks), code
            seen.append(code)

    codebook_vectorized = map(lambda x: map(lambda b: int(b), x), codebooks)
    return np.array(codebook_vectorized)

Example:

Click to copy

codebook = generate_codebook(4,3,2)
codebook

1 1111
2 1001
3 0101

801

asked Jan 17 '18 09:01

Sean Saito

2 Answers

Let's build a graph G where every L-bit binary vector v is a vertex. And there is an edge (vi, vj) only when a Hamming distance between vi and vj is equal to d. Now we need to find a clique of size n is this graph.

Clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent.

The task of finding a clique of given size in an arbitrary graph is NP-complete. You can read about this problem and some algorithms in this wikipedia article.

There are many special cases of this problem. For example, for perfect graphs there is a polynomial algorithm. Don't know if it is possible to show that our graph is one of these special cases.

answered Sep 19 '22 23:09

DAle

Not a real solution, but more of a partial discussion about the relationship between l, d and n and the process of generating vectors. In any case, you may consider posting the question (or a similar one, in more formal terms) to Mathematics Stack Exchange. I have been reasoning as I was writing, but I hope I didn't make a mistake.

Let's say we have l = 6. Since the Hamming distance depends only on position-wise differences, you can decide to start by putting one first arbitrary vector in your set (if there are solutions, some may not include it, but at least one should). So let's begin with an initial v1 = 000000. Now, if d = 1 then obviously n can only be 1 or 2 (with 111111). If d = 1, you will find that n can also only be 1 or 2; for example, you could add 000001, but any other possible vector will have a distance of 2 or more with at least one the vectors you have.

Let's say d = 4. You need to change 4 positions and keep the other 2, so you have 4-combinations from a 6-element set, which is 15 choices, 001111, 010111, etc. - you can see now that the binomial coefficient C(n, d) plus 1 is an upper bound for n. Let's pick v2 = 001111, and say that the kept positions are T = [1, 2] and the changed ones are S = [3, 4, 5, 6]. Now to go on, we could consider making changes to v2; however, in order to keep the right distances we must follow these rules:

We must make 4 changes to v2.
If we change a position in a position in S, we must make another change in a position in T (and viceversa). Otherwise, the distance to v1 would not be kept.

Logically, if d were odd you would be done now (only sets of two elements could be formed), but fortunately you already said that your distance numbers are even. So we divide our number by two, which is 2, and need to pick 2 elements from S, C(4, 2) = 6, and 2 elements from T, C(2, 2) = 1, giving us 6 * 1 = 6 options - you should note now that C(d, d/2) * C(l - d, d/2) + 2 is a new, lower upper bound for n, if d is even. Let's pick v3 = 111100. v3 has now four kinds of positions: positions that have changed with respect to both v1 and v2, P1 = [1, 2], positions that have not changed with respect to either v1 or v2, P2 = [] (none in this case), positions that have changed with respect to v1 but not with respect to v2, P3 = [3, 4], and positions that have changed with respect to v2 but not with respect to v1, P4 = [5, 6]. Same deal, we need 4 changes, but now each change we make to a P1 position must imply a change in a P2 position, and each change we make to a P3 position must imply a change in a P4 position. The only remaining option is v4 = 110011, and that would be it, the maximum n would be 4.

So, thinking about the problem from a combinatoric point of view, after each change you will have an exponentially increasing number of "types of positions" (2 after the first change, 4 after the second, 8, 16...) defined in terms of whether they are equal or not in each of the previously added vectors, and these can be arranged in couples through a "symmetry" or "complement" relationship. On each step, you can (I think, and this is the part of this reasoning that I am less sure about) greedily choose a set of changes from these couples and compute the sizes of the "types of positions" for the next step. If this is all correct, you should be able to write an algorithm based on this to generate and/or count the possible sets of vectors for some particular l and d and n if given.

answered Sep 21 '22 23:09

jdehesa

Related questions
                            
                                In Python Click how do I see --help for Subcommands whose parents have required arguments?
                            
                                Flask routing problems
                            
                                How Can I Fix "TypeError: Cannot mix str and non-str arguments"?
                            
                                How require that an abstract method is a coroutine?
                            
                                Python can import a module that isn't installed
                            
                                How to base64 encode/decode binary files in python?
                            
                                Python: How to check off a checkbox with Browser splinter?
                            
                                Pandas Timestamp index to list of date strings
                            
                                PCA output in Spark doesn't matches with scikit-learn
                            
                                Celery Process 'Worker' exited with 'exitcode 1' [duplicate]
                            
                                AWS Boto / Warrant library: SRP authentication and credentials error
                            
                                How to prefetch a @property with a Django queryset?
                            
                                For loop is overwriting dictionary values in list [duplicate]
                            
                                Accessing static fields from the decorated class
                            
                                VisPy animation point by point from NumPy array
                            
                                How to estimate eps using knn distance plot in DBSCAN
                            
                                Iterable object and Django StreamingHttpResponse
                            
                                Plane-plane intersection in python [closed]
                            
                                How to implement a custom layer wit multiple outputs in Keras?
                            
                                How to have limited ZMQ (ZeroMQ - PyZMQ) queue buffer size in python?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With