I need to generate binary numbers with the same quantity of ones (or zeros) in random order.
Does anyone know any efficient algorithm for fixed-length binary numbers?
Example for 2 ones and 4 digits (just to be more clear):
1100
1010
1001
0110
0101
0011
UPDATE Random order w/o repetitions is significant. Sequence of binary numbers required, not single permutation.
Algorithm. Step 1 : Given two numbers. Step 2 : Convert both number into its binary using bin() function and remove first two characters because of bin(). Step 3 : Since binary representation of both numbers could differ in length so we will append zeroes in start of shorter string to make both string of equal length.
public static void main(String[] args) { char[] num1 = {'1','1','0'}; char[] num2 = {'1','1','1'}; System. out. println(compare(num1, num2)); } public static int compare(char[]num1, char[]num2) { // Convert Array of Characters to String String one = String. valueOf(num1); String two = String.
If you have enough memory to store all the possible bit sequences, and you don't mind generating them all before you have the first result, then the solution would be to use some efficient generator to produce all possible sequences into a vector and then shuffle the vector using the Fisher-Yates shuffle. That's easy and unbiased (as long as you use a good random number generator to do the shuffle) but it can use a lot of memory if n
is large, particularly if you are not sure you will need to complete the iteration.
But there are a couple of solutions which do not require keeping all the possible words in memory. (C implementations of the two solutions follow the text.)
The fastest one (I think) is to first generate a random shuffle of bit values, and then iterate over the possible words one at a time applying the shuffle to the bits of each value. In order to avoid the complication of shuffling actual bits, the words can be generated in a Gray code order in which only two bit positions are changed from one word to the next. (This is also known as a "revolving-door" iteration because as each new 1
is added, some other 1
must be removed.) This allows the bit mask to be updated rapidly, but it means that successive entries are highly correlated, which may be unsuitable for some purposes. Also, for small values of n
the number of possible bit shuffles is very limited, so there will not be a lot of different sequences produced. (For example, for the case where n
is 4 and k
is 2, there are 6 possible words which could be sequenced in 6! (720) different ways, but there are only 4! (24) bit-shuffles. This could be ameliorated slightly by starting the iteration at a random position in the sequence.)
It is always possible to find a Gray code. Here's an example for n=6, k=3: (The bold bits are swapped at each step. I wanted to underline them but for some inexplicable reason SO allows strikethrough but not underline.)
111000 010110 100011 010101
101100 001110 010011 001101
011100 101010 001011 101001
110100 011010 000111 011001
100110 110010 100101 110001
This sequence can be produced by a recursive algorithm similar to that suggested by @JasonBoubin -- the only difference is that the second half of each recursion needs to be produced in reverse order -- but it's convenient to use a non-recursive version of the algorithm. The one in the sample code below comes from Frank Ruskey's unpublished manuscript on Combinatorial Generation (Algorithm 5.7 on page 130). I modified it to use 0-based indexing, as well as adding the code to keep track of the binary representations.
The "more" random but somewhat slower solution is to produce a shuffled list of enumeration indices (which are sequential integers in [0, n choose k)
) and then find the word corresponding to each index.
The simplest pseudo-random way to produce a shuffled list of integers in a contiguous range is to use a randomly-chosen Linear Congruential Generator (LCG). An LCG is the recursive sequence xi = (a * xi-1 + c) mod m
. If m
is a power of 2, a mod 4
is 1 and c mod 2
is 1, then that recursion will cycle through all 2m possible values. To cycle through the range [0, n choose k)
, we simply select m
to be the next larger power of 2, and then skip any values which are not in the desired range. (That will be fewer than half the values produced, for obvious reasons.)
To convert the enumeration index into an actual word, we perform a binomial decomposition of the index based on the fact that the set of n choose k
words consists of n-1 choose k
words starting with a 0 and n-1 choose k-1
words starting with a 1. So to produce the ith word:
i < n-1 choose k
we output a 0 and then the ith word in the set of n-1 bit words with k bits set;n-1 choose k
from i as the index into the set of n-1 bit words with k-1 bits set.It's convenient to precompute all the useful binomial coefficients.
LCGs suffer from the disadvantage that they are quite easy to predict after the first few terms are seen. Also, some of the randomly-selected values of a
and c
will produce index sequences where successive indices are highly correlated. (Also, the low-order bits are always quite non-random.) Some of these problems could be slightly ameliorated by also applying a random bit-shuffle to the final result. This is not illustrated in the code below but it would slow things down very little and it should be obvious how to do it. (It basically consists of replacing 1UL<<n
with a table lookup into the shuffled bits).
The C code below uses some optimizations which make it a bit challenging to read. The binomial coefficients are stored in a lower-diagonal array:
row
index
[ 0] 1
[ 1] 1 1
[ 3] 1 2 1
[ 6] 1 3 3 1
[10] 1 4 6 4 1
As can be seen, the array index for binom(n, k)
is n(n+1)/2 + k
, and if we have that index, we can find binom(n-1, k)
by simply subtracting n
, and binom(n-1, k-1)
by subtracting n+1
. In order to avoid needing to store zeros in the array, we make sure that we never look up a binomial coefficient where k
is negative or greater than n
. In particular, if we have arrived at a point in the recursion where k == n
or k == 0
, we can definitely know that the index to look up is 0, because there is only one possible word. Furthermore, index 0 in the set of words with some n
and k
will consist precisely of n-k
zeros followed by k
ones, which is the n-bit binary representation of 2k-1. By short-cutting the algorithm when the index reaches 0, we can avoid having to worry about the cases where one of binom(n-1, k)
or binom(n-1, k-1)
is not a valid index.
void gray_combs(int n, int k) {
/* bit[i] is the ith shuffled bit */
uint32_t bit[n+1];
{
uint32_t mask = 1;
for (int i = 0; i < n; ++i, mask <<= 1)
bit[i] = mask;
bit[n] = 0;
shuffle(bit, n);
}
/* comb[i] for 0 <= i < k is the index of the ith bit
* in the current combination. comb[k] is a sentinel. */
int comb[k + 1];
for (int i = 0; i < k; ++i) comb[i] = i;
comb[k] = n;
/* Initial word has the first k (shuffled) bits set */
uint32_t word = 0;
for (int i = 0; i < k; ++i) word |= bit[i];
/* Now iterate over all combinations */
int j = k - 1; /* See Ruskey for meaning of j */
do {
handle(word, n);
if (j < 0) {
word ^= bit[comb[0]] | bit[comb[0] - 1];
if (--comb[0] == 0) j += 2;
}
else if (comb[j + 1] == comb[j] + 1) {
word ^= bit[comb[j + 1]] | bit[j];
comb[j + 1] = comb[j]; comb[j] = j;
if (comb[j + 1] == comb[j] + 1) j += 2;
}
else if (j > 0) {
word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
comb[j - 1] = comb[j]; ++comb[j];
j -= 2;
}
else {
word ^= bit[comb[j]] | bit[comb[j] + 1];
++comb[j];
}
} while (comb[k] == n);
}
static const uint32_t* binom(unsigned n, unsigned k) {
static const uint32_t b[] = {
1,
1, 1,
1, 2, 1,
1, 3, 3, 1,
1, 4, 6, 4, 1,
1, 5, 10, 10, 5, 1,
1, 6, 15, 20, 15, 6, 1,
// ... elided for space
};
return &b[n * (n + 1) / 2 + k];
}
static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {
uint32_t rv = 0;
while (r) {
do {
b -= n;
--n;
} while (r < *b);
r -= *b;
--b;
--k;
rv |= 1UL << n;
}
return rv + (1UL << k) - 1;
}
static bool lcg_combs(unsigned n, unsigned k) {
const uint32_t* b = binom(n, k);
uint32_t count = *b;
uint32_t m = 1; while (m < count) m <<= 1;
uint32_t a = 4 * randrange(1, m / 4) + 1;
uint32_t c = 2 * randrange(0, m / 2) + 1;
uint32_t x = randrange(0, m);
while (count--) {
do
x = (a * x + c) & (m - 1);
while (x >= *b);
handle(enumerate(b, x, n, k), n);
}
return true;
}
Note: I didn't include the implementation of randrange
or shuffle
; code is readily available. randrange(low, lim)
produces a random integer in the range [low, lim)
; shuffle(vec, n)
randomly shuffles the integer vector vec
of length n
.
Also, the the loop calls handle(word, n)
for each generated word. That must must be replaced with whatever is to be done with each combination.
With handle
defined as a function which does nothing, gray_combs
took 150 milliseconds on my laptop to find all 40,116,600 28-bit words with 14 bits set. lcg_combs
took 5.5 seconds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With