Fast multiplication of k x k boolean matrices, where 8

Question

I want to find an as fast as possible way of multiplying two small boolean matrices, where small means, 8x8, 9x9 ... 16x16. This routine will be used a lot, so it needs to be very efficient, so please don't suggest that the straightforward solution should be fast enough.

For the special cases 8x8, and 16x16 I already have fairly efficient implementations, based on the solution found here, where we treat the entire matrix as an uint64_t or uint64_t[4] respectively. On my machine this is roughly 70-80 times faster than the straightforward implementation.

However, in the case of 8 < k < 16, I don't really know how I can leverage any reasonable representation in order to enable such clever tricks as above.

So basically, I'm open for any suggestions using any kind of representation (of the matrices) and function signature. You may assume that this targets either a 32-bit or 64-bit architecture (pick what best suits your suggestion)

Aki Suihkonen · Accepted Answer

Given two 4x4 matrices a= 0010,0100,1111,0001, b=1100,0001,0100,0100, one could first calculate the transpose b' = 1000,1011,0000,0100.

Then the resulting matrix M(i,j)=a x b mod 2 == popcount(a[i]&b[j]) & 1; // or parity

From that one can notice that the complexity only grows in n^2, as long as the bitvector fits a computer word.

This can be speed up for 8x8 matrices at least, provided that some special permutation and bit selection operations are available. One can iterate exactly N times with NxN bits in a vector. (so 16x16 is pretty much the limit).

Each step consists of accumulating i.e. Result(n+1) = Result(n) XOR A(n) .& B(n), where Result(0) = 0, A(n) is A <<< n, and '<<<' == columnwise rotation of elements and where B(n) copies diagonal elements from the matrix B:

    a b c          a e i          d h c          g b f
B=  d e f  B(0) =  a e i  B(1) =  d h c   B(2) = g b f
    g h i          a e i          d h c          g b f

And after thinking it a bit further, a better option is to ^^^ (row wise rotate) matrix B and select A(n) == column copied diagonals from A:

    a b c         a a a           b b b           c c c 
A=  d e f  A(0) = e e e , A(1) =  f f f,  A(2) =  d d d 
    g h i         i i i           g g g           h h h

EDIT To benefit later readers, I'd propose the full solution for W<=16 bit matrix multiplications in portable C.

#include <stdint.h>
void matrix_mul_gf2(uint16_t *a, uint16_t *b, uint16_t *c)
{
    // these arrays can be read in two successive xmm registers or in a single ymm
    uint16_t D[16];      // Temporary
    uint16_t C[16]={0};  // result
    uint16_t B[16];  
    uint16_t A[16];
    int i,j;
    uint16_t top_row;
    // Preprocess B (while reading from input) 
    // -- "un-tilt" the diagonal to bit position 0x8000
    for (i=0;i<W;i++) B[i]=(b[i]<<i) | (b[i]>>(W-i));
    for (i=0;i<W;i++) A[i]=a[i];  // Just read in matrix 'a'
    // Loop W times
    // Can be parallelized 4x with MMX, 8x with XMM and 16x with YMM instructions
    for (j=0;j<W;j++) {
        for (i=0;i<W;i++) D[i]=((int16_t)B[i])>>15;  // copy sign bit to rows
        for (i=0;i<W;i++) B[i]<<=1;                  // Prepare B for next round
        for (i=0;i<W;i++) C[i]^= A[i]&D[i];          // Add the partial product

        top_row=A[0];
        for (i=0;i<W-1;i++) A[i]=A[i+1];
        A[W-1]=top_row;
    }
    for (i=0;i<W;i++) c[i]=C[i];      // return result
}

sheu · Answer

How about padding it out to the next "clever" (e.g. 8 or 16) size, with all '1' on the diagonal?

Fast multiplication of k x k boolean matrices, where 8 <= k <= 16

Tags:

c

optimization

matrix-multiplication

hakoja

Video Answer

2 Answers

Aki Suihkonen

sheu

Recent Activity

Donate For Us