Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary String permutations

I came across a problem on http://www.interviewstreet.com .

Bob has received a binary string of length N transmitted by Alice. He knows that due to errors in transmission, up to K bits might have been corrupted (and hence flipped). However, he also knows that the string Alice had intended to transmit was not periodic. A string is not periodic if it cannot be represented as a smaller string concatenated some number of times. For example, "0001", "0110" are not periodic while "00000", "010101" are periodic strings. Now he wonders how many possible strings could Alice have transmitted.

So first off I did some tests with the Binomial theorem and through the use of that I am able to find how many different ways a String can be represented given a String and a number of corrupted bits. My second step was to find a way with which to find the number of periodic Strings. I see that this can easily be done with Strings with a prime numbered length. This is done by checking if there are enough 0's or 1's to fill the String up with only 0's or 1's.

1111111 or 0000000

Right now I use a pure brute force algorithm which just wont cut it when it comes to any sort of large string. Is there any sort of Combinatorics techniques that somebody could point me to that would help solve this problem? Thanks.

like image 523
Olychuck Avatar asked Oct 19 '11 03:10

Olychuck


People also ask

What are the permutations of a string?

A permutation of a string S iis another string that contains the same characters, only the order of characters can be different. For example, “abcd” and “dabc” are permutations of each other.

How many permutations are there of the binary digits?

Binary code is made of bits (0 or 1). We often use Bytes to store data. A Byte is made of eight bits and can be used to store any whole number between 0 to 255. This is because with 8 bits you can generate 256 different permutations.

How many binary strings does length 4 have?

Answer and Explanation: A bit string of length 4 would have all the binary numbers between, 0000 and 1111 . There can be no other permutations or arrangements of the digits that can contain consecutive 1s. Hence there must be 10 bit-strings of length 4 which do not have consecutive 1s.

What is binary string?

A binary string is a sequence of bytes. Unlike a character string which usually contains text data, a binary string is used to hold non-traditional data such as pictures. The length of a binary string is the number of bytes in the sequence. A binary string has a CCSID of 65535.


3 Answers

Lior was on the right track.

The total number of strings of length N is 2^N. Some of these are periodic. Others are not. Let's call the number of periodic strings A(N), and the number of non-periodic strings B(N). Then

A(N) + B(N) = 2^N

If we define strings of length 1 to be non-periodic, then

A(1) = 0
B(1) = 2

Let's assume now that N > 1. Then the set of periodic strings of length N includes strings that are periodic with a period shorter than N. However, this is not the case for the set of non-periodic strings of length N.

The set of periodic strings of length N is made up of repetitions of non-periodic strings of lengths that are divisors of n, including those of length 1. In other words:

A(N) = sum(B(k) where k divides N and k < N)

For example:

A(6) = B(1) + B(2) + B(3)
     = (2^1 - A(1)) + (2^2 - A(2)) + (2^3 - A(3))
     = 2 + (4 - B(1)) + (8 - B(1))
     = 2 + 2 + 6
     = 10

So we now have a recurrence equation for the number of periodic and non-periodic strings of length N.

Unfortunately, this doesn't help us that much to answer the actual question.

The question implies that Bob has received a specific string, and he wants to know how many non-periodic strings differ by at most K bits from this string. There are C(N,K) possible mutations of the received string that could be the transmitted string. We need to subtract from this the number of periodic strings in this set. How can we go about this?

First, we can use the observation that any periodic string is a repetition of non-periodic strings. So, for each potential period k (divisor of N), we look at the substrings of length k. If all strings are different from a common string by no more than K bits combined, then this common string is the basis for a periodic string and we should decrease the count by one. If the minimum distance is d, and K - d > N/k, then we can flip individual bits in each substring and still have a match, and we have to decrease our count accordingly.

like image 103
Jeffrey Sax Avatar answered Nov 04 '22 03:11

Jeffrey Sax


To count the number of non-periodic strings on length n:

  • The total number of strings: 2ⁿ
  • Subtract the number of periodic strings of length n

To count the number of periodic strings on length n:

  • Find all divisors of n, except n itself. For example: if n=6 - the divisors are 1,2,3.

    (The method for this had been discussed here)

  • Each divisor m can be used to represent 2^m periodic strings. For example

  • m=1: {0,1} - 2^1 periodic strings
  • m=2: {00,01,10,11} - 2^2 periodic strings
  • m=3: {000,...111} - 2^3 periodic strings

    So for n=6, there are 2+4+8 periodic strings

    As Jeffery Sax and ANeves pointed out, some of these periodic strings are identical {for example 0* = 00* = 000*), so we have to eliminate these.

    A naive method would be to add all these strings to an associative container that stores unique elements (such as set in C++), and count the number of elements in the that container.

    A better optimization would be: for m=m1, find all divisors of m1, and avoid adding strings that are periodic of strings already in these sets.

The next step would be to calculate the Hamming distance between any of these periodic strings and the received string. If it is less than K- count it.


Edit: A better solution for large N and small K

Algorithm for checking if a string is periodic:

This can be accomplished by comparing the string with a shifted-version of itself. If a string is identical to it's p-bit circular-shift - then it has a cycle of p.

So circularly-shifting the string one bit at a time - we can detect if it is periodic in up to floor(N/2) string comparisons.

Counting possible transmitted strings

If there would be no non-periodic transmission requirement, and we received an N bits message - the number of possible messages that could have been transmitted is C(N, 0) + C(N, 1) + C(N, 2) + ... + C(N, K)

For N=1000 and K=3: C(1000,0)+C(1000,1)+C(1000,2)+C(1000,3)= 166,667,501

(This is the number of combinations of switching 0/1/2/3 bits in the original string).

From this number, we need to decrease the number of periodic strings - which couldn't have been transmitted.

For example: if the received string was 000000 and K=2, we can be sure that the transmitted string was not in {000000,001001,010010,100100}. These are all periodic, with hamming distance of up to K from the received string.

C(6,0)+C(6,1)+C(6,2)=1+6+15=22 Out of these, 4 combinations are periodic.

Algorithm:

We'll start with the received string, and generate all combinations stated above. For each combination we will check if it is periodic. If so - we'll decrease our count by 1.

like image 29
Lior Kogan Avatar answered Nov 04 '22 03:11

Lior Kogan


Lior and Jeffrey's answers form the foundation for solving the problem, but one interesting problem that remains to be solved in this post is, how can you efficiently compute the number of strings which are periodic for a given [input string, N, K]. My answer will primarily focus on that.

As pointed out by Lior and Jeffrey, we need only to care about the substrings of length equal to the divisors of n when checking for strings which are periodic. Let's look at an example to see how efficient we can achieve this.

Number of periodic strings with period m

Let the input string be

0110 0011 0101 0001

and let us try to find the number of periodic strings with period m=4

First Bit

If we compare the first bit of each of the substrings, we'll see that all of them are 0s. If we assume all the subsequent bits are the same, amongst all the substrings, then the input string can be made periodic (with period 4), when performing either 0 bitflips or 4 bitflips.

0110 0011 0101 0001
^    ^    ^    ^
Number of 0s = 4
Number of 1s = 0
Number of bitflips to make all 0s to 1s = 4
Number of bitflips to make all 1s to 0s = 0


Number of periodic strings with period=4 for:
k = 0  =>  1
k = 4  =>  1

So now we know there exists 2 strings which are periodic, one for k=0 and other for k=4 (assuming that the subsequent bits are the same in all the substrings).

Second Bit

Now let's progress to the second bit.

0110 0011 0101 0001
 ^    ^    ^    ^
Number of 0s = 2
Number of 1s = 2
Number of bitflips to make all 0s to 1s = 2
Number of bitflips to make all 1s to 0s = 2

But wait, the above statement is true IFF all the bits prior to the current bit in the substring also contribute in making the string periodic. We know that only k=0 and k=4, each make the string periodic up to 1st bit.

So when accounting for all bits upto the 2nd bit, we can get a periodic string in the following 4 cases:

When previousK = 0:
    Flip the 2 `0`s to `1`s => new k = 2
    Flip the 2 `1`s to `0`s => new k = 2
When previousK = 4:
    Flip the 2 `0`s to `1`s => new k = 6
    Flip the 2 `1`s to `0`s => new k = 6

Number of periodic strings with period=4 for:
k = 2  =>  2
k = 6  =>  2

Third Bit

Moving on to third bit, we'll see this:

0110 0011 0101 0001
  ^    ^    ^    ^
Number of 0s = 2
Number of 1s = 2
Number of bitflips to make all 0s to 1s = 2
Number of bitflips to make all 1s to 0s = 2

We can get a periodic string in the following 4 cases:
When previousK = 2:
    Flip the 2 `0`s to `1`s => new k = 4
    Flip the 2 `1`s to `0`s => new k = 4
When previousK = 6:
    Flip the 2 `0`s to `1`s => new k = 8
    Flip the 2 `1`s to `0`s => new k = 8

Number of periodic strings with period=4 for:
k = 4  =>  4
k = 8  =>  4

Fourth Bit

For our fourth and final bit:

0110 0011 0101 0001
   ^    ^    ^    ^
Number of 0s = 1
Number of 1s = 3

We can get a periodic string in the following 4 cases:
When previousK = 4:
    Flip the 1 `0`s to `1`s => new k = 5
    Flip the 3 `1`s to `0`s => new k = 7
When previousK = 8:
    Flip the 1 `0`s to `1`s => new k = 9
    Flip the 3 `1`s to `0`s => new k = 11

Number of periodic strings with period=4 for:
k = 5  =>  4
k = 7  =>  4
k = 9  =>  4
k = 11 =>  4

We are now done with the last bit of the substring and the total of the periodic strings for various k values in the last step gives 16.

Recursive relation and Pseudocode

Let us denote the current count of periodic strings for any k which varies from 1 to K using R[k]. For each iteration, we need to look up the previous iteration's R[] value.

What we essentially end up doing in each iteration is:

for offset = 0 to periodLen - 1
    flip R[] and previousR[]

    for currentK = 1 to K
        R[currentK] = 0

    numZeroes = 0
    for (pos = offset; pos < n; pos += periodLen)
        if (str[pos] == '0')
            ++numZeros

    numOnes = (n / m) - numZeroes;

    for currentK = 1 to K
        if m == 0
            R[currentK + numZeroes] = 1
            R[currentK + numOnes] = 1
        else if (previousR[currentK] > 0)
            R[currrentK + numZeroes] += previousR[currentK]
            R[currentK + numOnes] += previousR[currentK]

    totalPeriodicCount = 0
    for currentK = 1 to K
        totalPeriodicCount += R[currentK]

If we perform the above process by iterating over all periods from the lowest to the highest we'll get the count of all the periodic strings. The periods which will be chosen will be the divisors of N which are less than N. Going over them from lowest to highest will have an advantage, read the next section for the details.

Accounting for periods which are divisible by smaller periods

On close observation you'll notice that we're also counting certain periodic strings more than once.

eg. the following periodic string:

0001 0001 0001 0001

will end up being counted as part of both m = 4, and m = 8

Let C[m] denote the total count of periodic strings obtained for a period of length m obtained using the above pseudocode. Let C[m'] denote the actual count of periodic strings obtained using period of length m but not counting the periodic strings which can be formed using periods < m

More specifically, if the current period m has divisors t, u and v which are less than m, then we'll be counting the number of periodic strings of periods t, u and v as well, i.e.

C[m] = C[t'] + C[u'] + C[v'] + C[m']

When counting the total number of periodic strings for all values of m, we need to take care of excluding C[t], C[u], and C[v] and only account for C[m'].

Since when we're computing C[m], we already would have computed the values for C[t'], C[u'] and C[v'], we just need to look them up and subtract them from C[m] to obtain C[m']. I'll leave this simple portion as an exercise for the reader.

C[m'] = C[m] - C[t'] - C[u'] - C[v']
like image 36
Tuxdude Avatar answered Nov 04 '22 03:11

Tuxdude