How to efficiently transpose a 2D bit matrix

Tags:

I keep stumbling over this problem (e.g. in this question). Given a 2D bit matrix/board/array in the form of an array of primitive integer types, e.g. an array of long. For simplicity, we can assume a square matrix, e.g., an array of 64 long values on platforms that have 64 bit long.

Let x[i] for 0 <= i < 64 be the input array. Compute an array y[i] for 0 <= i <= 64 such that:

(x[i] >> j) & 1 == (y[j] >> i) & 1

Here x >> i is the bitwise right-shift of x by i bits, & is bitwise and, and x[i] is the value at ith position in array x.

How to implement a function that maps array x to array y most efficiently?

Primarily I am looking for non-destructive methods, which leave the input array x intact.

Implementation language

The used programming language should have arrays and bitwise operations on integer types. Many languages fulfill these requirements. C/C++ and Java solutions will look very similar, so let's choose these languages.

343

asked Jan 21 '17 10:01

ziggystar

1 Answers

This seems a generalization of the question Bitwise transpose of 8 bytes. That question was just about 8x8 transposition, so what you are asking is a bit different. But your question is answered just as well in section 7.3 of the book Hacker's Delight (you might be able to see the relevant pages on Google books). The code that is presented there apparently originates with Guy Steele.

The Hacker's Delight website only contains the source code from the book for the 8x8 and 32x32 cases, but the latter generalizes trivially to your 64x64 case:

#include <stdint.h>

void
transpose64(uint64_t a[64]) {
  int j, k;
  uint64_t m, t;

  for (j = 32, m = 0x00000000FFFFFFFF; j; j >>= 1, m ^= m << j) {
    for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
      t = (a[k] ^ (a[k | j] >> j)) & m;
      a[k] ^= t;
      a[k | j] ^= (t << j);
    }
  }
}

The way that this works is that the function swaps successively smaller blocks of bits, starting with 32x32 blocks (without transposing the bit within those blocks), after that within those 32x32 blocks it swaps the appropriate 16x16 blocks, etc. The variable that holds the block size is j. Therefore, the outer loop has j succcessively take the values 32, 16, 8, 4, 2 and 1, which means that the outer loop runs six times. The inner loop runs over half the lines of your of bits, the lines where a given bit in the variable k is equal to zero. When j is 32 those are the lines 0-31, when j is 16 those are the lines 0-15 and 32-47, etc. Together the inner part of the loop runs 6*32 = 192 times. What happens inside this inner part is that the mask m determines what are the bits that should be swapped, in t the xor or those bits are calculated, and that xor-ed lists of bits is used to update the bits in both places appropriately.

The book (and the website) also has a version of this code in which these loops have both been unrolled, and where the mask m is not calculated, but just assigned. I guess it depends on things like the number of registers and the size of your instruction cache whether that is an improvement?

To test that this works, suppose we define some bit pattern, say:

uint64_t logo[] = {
0b0000000000000000000000000000000000000000000100000000000000000000,
0b0000000000000000000000000000000000000000011100000000000000000000,
0b0000000000000000000000000000000000000000111110000000000000000000,
0b0000000000000000000000000000000000000001111111000000000000000000,
0b0000000000000000000000000000000000000000111111100000000000000000,
0b0000000000000000000000000000000000000000111111100000000000000000,
0b0000000000000000000000000000000000000000011111110000000000000000,
0b0000000000000000000000000000000000000000001111111000000000000000,
0b0000000000000000000000000000000000000000001111111100000000000000,
0b0000000000000000000000000000000010000000000111111100000000000000,
0b0000000000000000000000000000000011100000000011111110000000000000,
0b0000000000000000000000000000000111110000000001111111000000000000,
0b0000000000000000000000000000001111111000000001111111100000000000,
0b0000000000000000000000000000011111111100000000111111100000000000,
0b0000000000000000000000000000001111111110000000011111110000000000,
0b0000000000000000000000000000000011111111100000001111111000000000,
0b0000000000000000000000000000000001111111110000001111111100000000,
0b0000000000000000000000000000000000111111111000000111111100000000,
0b0000000000000000000000000000000000011111111100000011111110000000,
0b0000000000000000000000000000000000001111111110000001111111000000,
0b0000000000000000000000000000000000000011111111100001111111100000,
0b0000000000000000000000001100000000000001111111110000111111100000,
0b0000000000000000000000001111000000000000111111111000011111110000,
0b0000000000000000000000011111110000000000011111111100001111100000,
0b0000000000000000000000011111111100000000001111111110001111000000,
0b0000000000000000000000111111111111000000000011111111100110000000,
0b0000000000000000000000011111111111110000000001111111110000000000,
0b0000000000000000000000000111111111111100000000111111111000000000,
0b0000000000000000000000000001111111111111100000011111110000000000,
0b0000000000000000000000000000011111111111111000001111100000000000,
0b0000000000000000000000000000000111111111111110000011000000000000,
0b0000000000000000000000000000000001111111111111100000000000000000,
0b0000000000000000000000000000000000001111111111111000000000000000,
0b0000000000000000000000000000000000000011111111111100000000000000,
0b0000000000000000000111000000000000000000111111111100000000000000,
0b0000000000000000000111111110000000000000001111111000000000000000,
0b0000000000000000000111111111111100000000000011111000000000000000,
0b0000000000000000000111111111111111110000000000110000000000000000,
0b0000000000000000001111111111111111111111100000000000000000000000,
0b0000000000000000001111111111111111111111111111000000000000000000,
0b0000000000000000000000011111111111111111111111100000000000000000,
0b0000001111110000000000000001111111111111111111100000111111000000,
0b0000001111110000000000000000000011111111111111100000111111000000,
0b0000001111110000000000000000000000000111111111100000111111000000,
0b0000001111110000000000000000000000000000001111000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000001111111111111111111111111111000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111110000000000000000000000000000000000000000111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
0b0000001111111111111111111111111111111111111111111111111111000000,
};

We then call the transpose32 function and print the resulting bit pattern:

#include <stdio.h>

void
printbits(uint64_t a[64]) {
  int i, j;

  for (i = 0; i < 64; i++) {
    for (j = 63; j >= 0; j--)
      printf("%c", (a[i] >> j) & 1 ? '1' : '0');
    printf("\n");
  }
}

int
main() {
  transpose64(logo);
  printbits(logo);
  return 0;
}

And this then gives as output:

0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000011111111111111111111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000000000000000000000000111111
0000000000000000000000000000000000000011000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000000000000111111000000011111100000111111
0000000000000000000000000100000000011111000000011111100000111111
0000000000000000000000011110000000011111100000011111100000111111
0000000000000000000001111110000000011111100000011111100000111111
0000000000000000000001111111000000011111100000011111100000111111
0000000000000000000000111111000000011111100000011111100000111111
0000000000000000000000111111100000001111110000011111100000111111
0000000000000000000000011111100000001111110000011111100000111111
0000000000000100000000011111110000001111110000011111100000111111
0000000000001110000000001111110000001111110000011111100000111111
0000000000011110000000001111111000001111110000011111100000111111
0000000001111111000000000111111000000111111000011111100000111111
0000000000111111100000000111111100000111111000011111100000111111
0000000000111111110000000011111100000111111000011111100000111111
0000000000011111111000000011111100000111111000011111100000111111
0000000000001111111100000001111110000011111000011111100000111111
0000000000000111111100000001111110000011111100011111100000111111
0000000000000011111110000000111111000011111100011111100000111111
0001000000000001111111000000111111000011111100011111100000111111
0011110000000001111111100000111111100011111100011111100000111111
0111111000000000111111110000011111100001111100011111100000111111
0111111110000000011111111000011111110001111110011111100000111111
1111111111000000001111111000001111110001111110011111100000111111
0011111111100000000111111100001111111001111110011111100000111111
0001111111111000000011111110000111111001111110011111100000111111
0000111111111100000011111111000111111100111100000000000000111111
0000001111111110000001111111100011111100000000000000000000111111
0000000111111111100000111111110011111000000000000000000000111111
0000000011111111110000011111110001100000000000000000000000111111
0000000000111111111000001111111000000000000000000000000000111111
0000000000011111111110000111111000000000000000000000000000111111
0000000000001111111111000111110000000000011111111111111111111111
0000000000000011111111100011100000000000011111111111111111111111
0000000000000001111111111001000000000000011111111111111111111111
0000000000000000111111111100000000000000011111111111111111111111
0000000000000000001111111100000000000000011111111111111111111111
0000000000000000000111111000000000000000011111111111111111111111
0000000000000000000011110000000000000000000000000000000000000000
0000000000000000000000100000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000

Which is nicely flipped, as we hoped for.

Edit:

This is actually not really what you asked for, as you asked for a non-destructive version of this code. You can get this by having the first swap of the 32x32 blocks go from x to y. For instance, you might do something like:

void
non_destructive_transpose64(uint64_t x[64], uint64_t y[64]) {
  int j, k;
  uint64_t m, t;

  for (k = 0; k < 64; k += 2) {
    ((uint32_t *) y)[k] = ((uint32_t *) x)[k ^ 64 + 1];
    ((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k + 1];
  }
  for (; k < 128; k += 2) {
    ((uint32_t *) y)[k] = ((uint32_t *) x)[k];
    ((uint32_t *) y)[k + 1] = ((uint32_t *) x)[k ^ 64];
  }
  for (j = 16, m = 0x0000FFFF0000FFFF; j; j >>= 1, m ^= m << j) {
    for (k = 0; k < 64; k = ((k | j) + 1) & ~j) {
      t = (y[k] ^ (y[k | j] >> j)) & m;
      y[k] ^= t;
      y[k | j] ^= (t << j);
    }
  }
}

Unlike the other version of the code this does not work regardless of endianness of the architecture. Also, I know that the C standard does not allow you to access an array of uint64_t as an array of uint32_t. However, I like it that no shifts or xors are needed for the first iteration of the move-the-blocks-around loop when you do it like this.

139

answered Sep 29 '22 12:09

Freek Wiedijk

Related questions
                            
                                how to add to settings.gradle in cordova
                            
                                Class Members -- Java vs. Python
                            
                                MethodHandles or LambdaMetafactory?
                            
                                Why a Java file exists only in its canonical form?
                            
                                Thread priority - 'unit test'
                            
                                Find out which resources are not loaded successfully with Selenium
                            
                                ChildFragmentManager java.lang.IllegalStateException: Activity has been destroyed
                            
                                ViewRootImpl.setPausedForTransition(boolean) NullPointerException in ActivityTransitionCoordinator when transition to other Activity invoked too early
                            
                                How to register eureka-clients with eureka-server on different hosts. Spring-boot
                            
                                Spring boot and Database testing with connection pool
                            
                                How to use Jetty with Let's Encrypt certificates?
                            
                                Spring Boot on Android?
                            
                                Pass a field to custom deserializer class Jackson
                            
                                How to sort PageRequest on String as numeric value
                            
                                How to manually call `onUpdate` from within the widget itself
                            
                                How to implement an active & standby queue job-processing system in JeroMQ?
                            
                                What are the alternatives to using an ORDER BY in a Subquery in the JPA Criteria API?
                            
                                How to read all JARs from Eclipse classpathentry where kind="con"
                            
                                java 8 invoke dynamic advantage
                            
                                Reconnect a Hazelcast Client

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to efficiently transpose a 2D bit matrix

Tags:

java

performance

arrays

c

bit-manipulation