Iterate binary numbers with the same quantity of ones (or zeros) in random order

Tags:

algorithm

math

I need to generate binary numbers with the same quantity of ones (or zeros) in random order.
Does anyone know an efficient algorithm for fixed-length binary numbers? Example for 2 ones and 4 digits (just to be clearer):

1100 1010 1001 0110 0101 0011

UPDATE: Random order without repetitions is important. I need a sequence of binary numbers, not a single permutation.


asked Jun 15 '17 13:06

UNdedss

1 Answer

If you have enough memory to store all the possible bit sequences, and you don't mind generating them all before you have the first result, then the solution would be to use some efficient generator to produce all possible sequences into a vector and then shuffle the vector using the Fisher-Yates shuffle. (Throughout, n is the number of bits in each word and k is the number of 1 bits.) That's easy and unbiased (as long as you use a good random number generator to do the shuffle), but the vector has n choose k entries, which can be a lot of memory if n is large, particularly if you are not sure you will need to complete the iteration.
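For the in-memory approach, here is a minimal C sketch (not from the answer), assuming n <= 32. It lists the words in increasing numeric order with Gosper's hack and then applies a Fisher-Yates shuffle. The splitmix64 generator and all function names are illustrative choices; substitute whatever random number generator you trust.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* splitmix64: small and fast, NOT cryptographically secure. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform integer in [0, lim) for lim > 0; rejection removes modulo bias. */
static uint64_t below(uint64_t *state, uint64_t lim) {
    uint64_t threshold = -lim % lim;           /* 2^64 mod lim */
    uint64_t r;
    do r = splitmix64(state); while (r < threshold);
    return r % lim;
}

/* Gosper's hack: next larger integer with the same number of 1 bits (x != 0). */
static uint64_t next_same_popcount(uint64_t x) {
    uint64_t lowest = x & -x;                  /* lowest set bit */
    uint64_t ripple = x + lowest;              /* carry into the next 0 */
    return ripple | (((ripple ^ x) >> 2) / lowest);
}

/* Returns a malloc'd array holding all C(n, k) n-bit words with k ones in
 * random order, and stores C(n, k) in *count. The caller frees the array. */
static uint32_t *all_combs_shuffled(unsigned n, unsigned k, uint64_t seed,
                                    size_t *count) {
    assert(k <= n && n <= 32);
    uint64_t total = 1;                        /* C(n, k), built exactly */
    for (unsigned i = 0; i < k; ++i) total = total * (n - i) / (i + 1);
    uint32_t *w = malloc(total * sizeof *w);
    if (w == NULL) return NULL;
    uint64_t x = (UINT64_C(1) << k) - 1;       /* smallest word: k low ones */
    for (uint64_t i = 0; i < total; ++i) {
        w[i] = (uint32_t)x;
        if (i + 1 < total) x = next_same_popcount(x);
    }
    /* Fisher-Yates: swap each slot with a uniformly chosen slot at or below it. */
    uint64_t state = seed;
    for (uint64_t i = total; i > 1; --i) {
        uint64_t j = below(&state, i);         /* 0 <= j < i */
        uint32_t t = w[i - 1]; w[i - 1] = w[j]; w[j] = t;
    }
    *count = (size_t)total;
    return w;
}
```

The memory cost is exactly the n choose k entries of the returned array, which is the drawback described above.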

But there are a couple of solutions which do not require keeping all the possible words in memory. (C implementations of the two solutions follow the text.)

1. Bit shuffle an enumeration

The fastest one (I think) is to first generate a random shuffle of bit values, and then iterate over the possible words one at a time applying the shuffle to the bits of each value. In order to avoid the complication of shuffling actual bits, the words can be generated in a Gray code order in which only two bit positions are changed from one word to the next. (This is also known as a "revolving-door" iteration because as each new 1 is added, some other 1 must be removed.) This allows the bit mask to be updated rapidly, but it means that successive entries are highly correlated, which may be unsuitable for some purposes. Also, for small values of n the number of possible bit shuffles is very limited, so there will not be a lot of different sequences produced. (For example, for the case where n is 4 and k is 2, there are 6 possible words which could be sequenced in 6! (720) different ways, but there are only 4! (24) bit-shuffles. This could be ameliorated slightly by starting the iteration at a random position in the sequence.)
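To make that cost concrete, here is a hypothetical helper (permute_bits is an illustrative name) that applies a bit shuffle to an arbitrary word the direct way. It visits every bit position, so it costs O(n) per word. When two successive words differ only by swapping positions p and q, the shuffled word can instead be updated by XORing in bit[p] | bit[q], which is what gray_combs does in the code further down.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of `word` becomes bit[i] of the result, where bit[] holds n distinct
 * single-bit masks (a shuffled copy of 1<<0 ... 1<<(n-1)). Because this maps
 * bit positions one-to-one, it preserves the number of 1 bits and sends
 * distinct words to distinct words. */
static uint32_t permute_bits(uint32_t word, const uint32_t *bit, int n) {
    assert(n >= 0 && n <= 32);
    uint32_t out = 0;
    for (int i = 0; i < n; ++i)
        if (word & (UINT32_C(1) << i)) out |= bit[i];
    return out;
}
```

For example, with bit = {4, 1, 8, 2}, the word 3 (bits 0 and 1) maps to 5, and moving from 3 to 6 (bit 0 out, bit 2 in) corresponds to XORing the shuffled word with bit[0] | bit[2].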

Such a Gray code always exists. Here's an example for n=6, k=3, to be read down each column, starting with the leftmost column. At each step, exactly one 1 and one 0 trade places.

111000   010110   100011   010101
101100   001110   010011   001101
011100   101010   001011   101001
110100   011010   000111   011001
100110   110010   100101   110001

This sequence can be produced by a recursive algorithm similar to that suggested by @JasonBoubin -- the only difference is that the second half of each recursion needs to be produced in reverse order -- but it's convenient to use a non-recursive version of the algorithm. The one in the sample code below comes from Frank Ruskey's unpublished manuscript on Combinatorial Generation (Algorithm 5.7 on page 130). I modified it to use 0-based indexing, as well as adding the code to keep track of the binary representations.
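For checking small cases, the recursion can be written out directly. This is a sketch that assumes the whole list fits in memory, so it is not a substitute for the non-recursive generator; revolving_door is an illustrative name. R(n, k) is R(n-1, k) followed by R(n-1, k-1) in reverse order with bit n-1 set. Reading bit 0 as the leftmost character, it should reproduce the n=6, k=3 listing above, and it gives the same order as gray_combs when the bits are left unshuffled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the revolving-door order R(n, k) of n-bit words with k ones into
 * out[] (which must hold C(n, k) entries) and returns the number written.
 * If reverse is nonzero, the list is produced back to front. */
static size_t revolving_door(unsigned n, unsigned k, int reverse, uint32_t *out) {
    assert(k <= n && n <= 32);
    if (k == 0) { out[0] = 0; return 1; }
    if (k == n) { out[0] = (n == 32) ? UINT32_MAX : (UINT32_C(1) << n) - 1; return 1; }
    uint32_t top = UINT32_C(1) << (n - 1);
    size_t cnt = 0;
    if (!reverse) {
        /* R(n-1, k), then reversed R(n-1, k-1) with the top bit set */
        cnt += revolving_door(n - 1, k, 0, out);
        size_t m = revolving_door(n - 1, k - 1, 1, out + cnt);
        for (size_t i = 0; i < m; ++i) out[cnt + i] |= top;
        cnt += m;
    } else {
        /* reverse(A ++ reverse(B)) == B ++ reverse(A) */
        size_t m = revolving_door(n - 1, k - 1, 0, out);
        for (size_t i = 0; i < m; ++i) out[i] |= top;
        cnt += m;
        cnt += revolving_door(n - 1, k, 1, out + cnt);
    }
    return cnt;
}
```

For n=4, k=2 this produces 3, 6, 5, 12, 10, 9, and every pair of neighbours differs in exactly two bit positions.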

2. Randomly generate an integer sequence and convert it to combinations

The "more" random but somewhat slower solution is to produce a shuffled list of enumeration indices (which are sequential integers in [0, n choose k)) and then find the word corresponding to each index.

The simplest pseudo-random way to produce a shuffled list of integers in a contiguous range is to use a randomly-chosen Linear Congruential Generator (LCG). An LCG is the recursive sequence x_i = (a * x_{i-1} + c) mod m. If m is a power of 2, a mod 4 is 1 and c mod 2 is 1, then that recursion will cycle through all m possible values before repeating. To cycle through the range [0, n choose k), we simply select m to be the smallest power of 2 that is at least n choose k, and then skip any values which are not in the desired range. (Since m < 2 * (n choose k), fewer than half of the values produced are skipped.)
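Here is a small standalone sketch of the skip-out-of-range idea, separate from lcg_combs further down. The conditions on a and c are the ones stated above (they are the Hull-Dobell full-period conditions for a power-of-2 modulus); lcg_permutation and pow2_at_least are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of 2 that is >= count. */
static uint32_t pow2_at_least(uint32_t count) {
    assert(count >= 1 && count <= (UINT32_C(1) << 31));
    uint32_t m = 1;
    while (m < count) m <<= 1;
    return m;
}

/* Fills out[0..count) with a permutation of 0..count-1 taken from the LCG
 * x -> (a*x + c) mod m, skipping values >= count. Requires a % 4 == 1 and
 * c odd, so the LCG visits all m residues once per period. */
static void lcg_permutation(uint32_t count, uint32_t a, uint32_t c,
                            uint32_t x0, uint32_t *out) {
    assert(a % 4 == 1 && c % 2 == 1);
    uint32_t m = pow2_at_least(count);
    uint32_t x = x0 & (m - 1);
    for (uint32_t i = 0; i < count; ++i) {
        /* uint32_t arithmetic wraps mod 2^32, and m divides 2^32 */
        do x = (a * x + c) & (m - 1);
        while (x >= count);
        out[i] = x;
    }
}
```

With count = 6, a = 5, c = 3 and x0 = 0, the LCG runs 3, 2, 5, 4, 7, 6, 1, 0 modulo 8; dropping 7 and 6 leaves the permutation 3, 2, 5, 4, 1, 0.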

To convert the enumeration index into an actual word, we perform a binomial decomposition of the index based on the fact that the set of n choose k words consists of n-1 choose k words starting with a 0 and n-1 choose k-1 words starting with a 1 ("starting" meaning the most significant bit). So to produce the i-th word:

- if i < n-1 choose k, we output a 0 and then the i-th word in the set of (n-1)-bit words with k bits set;
- otherwise, we output a 1 and then use i - (n-1 choose k) as the index into the set of (n-1)-bit words with k-1 bits set.
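The decomposition can be sketched with an ordinary binomial function instead of the precomputed table used in the code below. This is an illustrative version (unrank_word and choose are not from the answer), assuming n <= 32. Because the top bit is decided first and the 0 branch comes first, index order coincides with increasing numeric order.

```c
#include <assert.h>
#include <stdint.h>

/* C(n, k) by the multiplicative formula; every step divides exactly. */
static uint64_t choose(unsigned n, unsigned k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (unsigned i = 0; i < k; ++i) r = r * (n - i) / (i + 1);
    return r;
}

/* Returns word number i (0 <= i < C(n, k)) among n-bit words with k ones,
 * deciding bits from the most significant down. */
static uint32_t unrank_word(uint64_t i, unsigned n, unsigned k) {
    assert(n <= 32 && k <= n && i < choose(n, k));
    uint32_t word = 0;
    while (n > 0) {
        --n;                                /* n is now the bit being decided */
        uint64_t zeros = choose(n, k);      /* words with a 0 in this bit */
        if (i >= zeros) {                   /* i falls in the "1" block */
            word |= UINT32_C(1) << n;
            i -= zeros;
            --k;
        }
    }
    return word;
}
```

For n=4, k=2, indices 0 through 5 give 0011, 0101, 0110, 1001, 1010, 1100.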

It's convenient to precompute all the useful binomial coefficients.

LCGs suffer from the disadvantage that they are quite easy to predict after the first few terms are seen. Also, some of the randomly-selected values of a and c will produce index sequences where successive indices are highly correlated. (Also, the low-order bits are always quite non-random; with a power-of-2 modulus, the lowest bit of the LCG simply alternates.) Some of these problems could be slightly ameliorated by also applying a random bit-shuffle to the final result. This is not illustrated in the code below, but it would slow things down very little: it basically consists of replacing 1UL<<n with a lookup into a table of shuffled bits, and replacing the final (1UL << k) - 1 with the union of the first k entries of that table.
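A sketch of that bit-shuffled variant, written against a plain binomial function rather than the triangular table so that it stands alone. The names are illustrative, and bit[] is assumed to hold a shuffled copy of the single-bit masks 1<<0 ... 1<<(n-1), as in gray_combs. Like enumerate, it stops as soon as the index reaches 0, and the k low positions left over at that point also go through the table.

```c
#include <assert.h>
#include <stdint.h>

/* C(n, k) by the multiplicative formula; every step divides exactly. */
static uint64_t choose(unsigned n, unsigned k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (unsigned i = 0; i < k; ++i) r = r * (n - i) / (i + 1);
    return r;
}

/* Word number i among n-bit words with k ones, with every "set position p"
 * step replaced by OR-ing in bit[p]. */
static uint32_t unrank_shuffled(uint64_t i, unsigned n, unsigned k,
                                const uint32_t *bit) {
    assert(n <= 32 && k <= n && i < choose(n, k));
    uint32_t word = 0;
    while (i > 0) {                         /* i > 0 implies 0 < k < n */
        --n;
        uint64_t zeros = choose(n, k);
        if (i >= zeros) { word |= bit[n]; i -= zeros; --k; }
    }
    /* index 0 of the remaining set: the k lowest positions are ones */
    for (unsigned p = 0; p < k; ++p) word |= bit[p];
    return word;
}
```

With the identity table {1, 2, 4, 8} this returns the unshuffled words; with {4, 1, 8, 2}, index 0 (0011) becomes 0101 and index 5 (1100) becomes 1010.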

The C code below uses some optimizations which make it a bit challenging to read. The binomial coefficients are stored in a lower-triangular array:

  row
index
 [ 0]   1
 [ 1]   1 1
 [ 3]   1 2 1
 [ 6]   1 3 3 1
 [10]   1 4 6 4 1

As can be seen, the array index for binom(n, k) is n(n+1)/2 + k, and if we have that index, we can find binom(n-1, k) by simply subtracting n, and binom(n-1, k-1) by subtracting n+1.

In order to avoid needing to store zeros in the array, we make sure that we never look up a binomial coefficient where k is negative or greater than n. In particular, if we have arrived at a point in the recursion where k == n or k == 0, we know that the index to look up is 0, because there is only one possible word. Furthermore, index 0 in the set of words with some n and k consists precisely of n-k zeros followed by k ones, which is the n-bit binary representation of 2^k - 1. By short-cutting the algorithm when the index reaches 0, we avoid having to worry about the cases where one of binom(n-1, k) or binom(n-1, k-1) is not a valid index.
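The binom() table in the code below is written out by hand (and elided). As an alternative, here is a sketch that builds the same lower-triangular layout at startup with Pascal's rule, using exactly the offsets just described; MAXN, tri and tri_index are hypothetical names. With MAXN = 32 every entry fits in uint32_t, since the largest, C(32, 16), is 601,080,390.

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 32   /* hypothetical limit: rows 0..MAXN */

/* Lower-triangular Pascal table: C(n, k) lives at index n(n+1)/2 + k. */
static uint32_t tri[(MAXN + 1) * (MAXN + 2) / 2];

static unsigned tri_index(unsigned n, unsigned k) {
    assert(k <= n && n <= MAXN);
    return n * (n + 1) / 2 + k;
}

/* C(n, k) = C(n-1, k) + C(n-1, k-1); those two entries sit n and n+1
 * slots before C(n, k). Only 0 <= k <= n is stored, so no zeros. */
static void build_tri(void) {
    for (unsigned n = 0; n <= MAXN; ++n) {
        unsigned base = tri_index(n, 0);
        tri[base] = tri[base + n] = 1;
        for (unsigned k = 1; k < n; ++k)
            tri[base + k] = tri[base + k - n] + tri[base + k - n - 1];
    }
}
```

After build_tri(), a pointer to tri[tri_index(n, k)] plays the same role as the pointer returned by binom(n, k).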

C code for the two solutions

Gray code with shuffled bits

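#include <stdint.h>

/* Supplied elsewhere (see the note after the code); signatures assumed. */
void shuffle(uint32_t* vec, int n);
void handle(uint32_t word, int n);

void gray_combs(int n, int k) {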
  /* bit[i] is the ith shuffled bit */
  uint32_t bit[n+1];
  {
    uint32_t mask = 1;
    for (int i = 0; i < n; ++i, mask <<= 1)
      bit[i] = mask;
    bit[n] = 0;
    shuffle(bit, n);
  }

  /* comb[i] for 0 <= i < k is the index of the ith bit
   * in the current combination. comb[k] is a sentinel. */
  int comb[k + 1];
  for (int i = 0; i < k; ++i) comb[i] = i;
  comb[k] = n;

  /* Initial word has the first k (shuffled) bits set */
  uint32_t word = 0;
  for (int i = 0; i < k; ++i) word |= bit[i];

  /* Now iterate over all combinations */
  int j = k - 1; /* See Ruskey for meaning of j */
  do {
    handle(word, n);
    if (j < 0) {
      word ^= bit[comb[0]] | bit[comb[0] - 1];
      if (--comb[0] == 0) j += 2;
    }
    else if (comb[j + 1] == comb[j] + 1) {
      word ^= bit[comb[j + 1]] | bit[j];
      comb[j + 1] = comb[j]; comb[j] = j;
      if (comb[j + 1] == comb[j] + 1) j += 2;
    }
    else if (j > 0) {
      word ^= bit[comb[j - 1]] | bit[comb[j] + 1];
      comb[j - 1] = comb[j]; ++comb[j];
      j -= 2;
    }
    else {
      word ^= bit[comb[j]] | bit[comb[j] + 1];
      ++comb[j];
    }
  } while (comb[k] == n);
}

LCG with enumeration index to word conversion

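#include <stdbool.h>
#include <stdint.h>

/* Supplied elsewhere (see the note after the code); signatures assumed. */
uint32_t randrange(uint32_t low, uint32_t lim);
void handle(uint32_t word, int n);

static const uint32_t* binom(unsigned n, unsigned k) {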
  static const uint32_t b[] = {
    1,
    1, 1,
    1, 2, 1,
    1, 3, 3, 1,
    1, 4, 6, 4, 1,
    1, 5, 10, 10, 5, 1,
    1, 6, 15, 20, 15, 6, 1,
    // ... elided for space
  };
  return &b[n * (n + 1) / 2 + k];
}

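/* Returns the word with enumeration index r among n-bit words with k ones.
 * b must point at binom(n, k) in the triangular table. */
static uint32_t enumerate(const uint32_t* b, uint32_t r, unsigned n, unsigned k) {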
  uint32_t rv = 0;
  while (r) {
    do {
      b -= n;
      --n;
    } while (r < *b);
    r -= *b;
    --b;
    --k;
    rv |= 1UL << n;
  }
  return rv + (1UL << k) - 1;
}

static bool lcg_combs(unsigned n, unsigned k) {
  const uint32_t* b = binom(n, k);
  uint32_t count = *b;
  uint32_t m = 1; while (m < count) m <<= 1;
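  /* For tiny m (count <= 4) these ranges would be empty; a = 1 and c = 1
   * still satisfy a % 4 == 1 and c odd, so the period stays full. */
  uint32_t a = m >= 8 ? 4 * randrange(1, m / 4) + 1 : 1;
  uint32_t c = m >= 2 ? 2 * randrange(0, m / 2) + 1 : 1;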
  uint32_t x = randrange(0, m);
  while (count--) {
    do
      x = (a * x + c) & (m - 1);
    while (x >= *b);
    handle(enumerate(b, x, n, k), n);
  }
  return true;
}

Note: I didn't include the implementation of randrange or shuffle; code is readily available. randrange(low, lim) produces a random integer in the range [low, lim); shuffle(vec, n) randomly shuffles the integer vector vec of length n.
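One possible sketch of those two helpers, assuming a self-contained splitmix64 generator (not cryptographically secure) and a hypothetical rng_seed() function for reproducibility. Rejection sampling keeps randrange unbiased, and shuffle is a Fisher-Yates shuffle over the uint32_t vector.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t rng_state = 0x853C49E6748FEA9BULL;  /* arbitrary default seed */

/* Hypothetical helper: reseed the generator. */
void rng_seed(uint64_t seed) { rng_state = seed; }

/* splitmix64 step */
static uint64_t next_u64(void) {
    uint64_t z = (rng_state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform integer in [low, lim); requires low < lim. */
uint32_t randrange(uint32_t low, uint32_t lim) {
    assert(low < lim);
    uint64_t span = (uint64_t)lim - low;
    uint64_t threshold = -span % span;   /* 2^64 mod span: rejected prefix */
    uint64_t r;
    do r = next_u64(); while (r < threshold);
    return low + (uint32_t)(r % span);
}

/* Fisher-Yates shuffle of vec[0..n). */
void shuffle(uint32_t *vec, int n) {
    for (int i = n - 1; i > 0; --i) {
        uint32_t j = randrange(0, (uint32_t)i + 1);   /* 0 <= j <= i */
        uint32_t t = vec[i]; vec[i] = vec[j]; vec[j] = t;
    }
}
```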

Also, both functions call handle(word, n) for each generated word. That call must be replaced with whatever is to be done with each combination.
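For example, a handle that just prints each word could look like this sketch (format_word is a hypothetical helper). It prints the n low bits with the most significant bit first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the n low bits of word into buf, most significant first, followed
 * by a terminating NUL; buf must hold at least n + 1 chars. */
static void format_word(char *buf, uint32_t word, int n) {
    for (int i = 0; i < n; ++i)
        buf[i] = ((word >> (n - 1 - i)) & 1u) ? '1' : '0';
    buf[n] = '\0';
}

/* One possible handle(): print each word on its own line. */
void handle(uint32_t word, int n) {
    char buf[33];
    assert(n >= 0 && n <= 32);
    format_word(buf, word, n);
    puts(buf);
}
```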

With handle defined as a function which does nothing, gray_combs took 150 milliseconds on my laptop to find all 40,116,600 28-bit words with 14 bits set. lcg_combs took 5.5 seconds.


answered Nov 01 '22 05:11

rici
