#Abstract
Hi, suppose you have two different independent 64-bit binary matrices A
and T
(T
is another matrix that is stored in transposed form, using the transposed version of matrix allows during multiplication to operate on T
's rows rather than columns which is super cool for binary arithmetic) and you want to multiply these matrices the only thing is that matrix multiplication result is truncated to 64-bits and if you yield to a value greater that 1
in some specific matrix cell the resulting matrix cell will contain 1
otherwise 0
#Example
A T
00000001 01111101
01010100 01100101
10010111 00010100
10110000 00011000 <-- This matrix is transposed
11000100 00111110
10000011 10101111
11110101 11000100
10100000 01100010
Binary and traditional multiplication results:
Binary Traditional
11000100 11000100
11111111 32212121
11111111 32213421
11111111 21112211
11101111 22101231
11001111 11001311
11111111 54213432
11001111 11001211
#Question
How do you multiply these matrices in a way described above in most efficient matter?
#P.S
I was trying to take advantage of binary and
(i.e. &
operator) instead of performing multiplication on separate bits, in that case I had to prepare data for multiplication:
ulong u;
u = T & 0xFF;
u = (u << 00) + (u << 08) + (u << 16) + (u << 24)
+ (u << 32) + (u << 40) + (u << 48) + (u << 56);
now by performing binary and
over two integers A
and u
it would yield to the following:
A u R C
00000001 01111101 00000001 1
01010100 01111101 01010100 3
10010111 01111101 00010101 3
10110000 01111101 00110000 2
11000100 01111101 01000100 2
10000011 01111101 00000001 1
11110101 01111101 01110101 5
10100000 01111101 00100000 1
In the example above R
contains result of multiplication of A
bits to u
bits and to obtain the final value we must sum
all bits in a row. Notice that column C
contains values equal to ones found in first column of resulting Traditional
matrix multiplication above. The problem is that during this step I have to operate on a separate bits which I think is sub-optimal approach, I've read through http://graphics.stanford.edu/~seander/bithacks.html looking for a way to do that on parallel but no luck, if anyone has any idea on how to "flatten" and "merge" the values located in R
column into resulting 64-bit matrix, I would appreciate if you drop me several lines,
Thank you,
#Edit
With big thank you to David Eisenstat, the final algorithm would then look like:
var A = ...;
var T = ...; // T == transpose(t), t is original matrix, algorithm works with transposed matrix
var D = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D); T = (T << 8) | (T >> 56); D = (D << 8) | (D >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & D);
The following piece of code:
public static void Main (string[] args){
ulong U;
var Random = new Xor128 ();
var timer = DateTime.Now;
var A = Random.As<IUniformRandom<UInt64>>().Evaluate();
var T = Random.As<IUniformRandom<UInt64>>().Evaluate();
var steps = 10000000;
for (var i = 0; i < steps; i++) {
ulong r = 0;
var d = 0x8040201008040201UL;
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d); T = (T << 8) | (T >> 56); d = (d << 8) | (d >> 56);
U = A & T; U |= U >> 1; U |= U >> 2; U |= U >> 4; U &= 0x0101010101010101UL; U = (U << 8) - U; r |= (U & d);
}
Console.WriteLine (DateTime.Now - timer);
var m1 = new Int32[8,8];
var m2 = new Int32[8,8];
var m3 = new Int32[8,8];
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
m1 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m2 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
m3 [row, col] = Random.As<IUniformRandom<Int32>> ().Evaluate(0, 1);
}
}
timer = DateTime.Now;
for (int i = 0; i < steps; i++) {
for (int row = 0; row < 8; row++) {
for (int col = 0; col < 8; col++) {
var sum = 0;
for (int temp = 0; temp < 8; temp++) {
sum += m1 [row, temp] * m2 [temp, row];
}
m3 [row, col] = sum;
}
}
}
Console.WriteLine (DateTime.Now - timer);
}
Shows me the following results:
00:00:02.4035870
00:00:57.5147150
And that's a 23x performance improvement under Mac OS X / Mono, thanks everyone
Because MATLAB is a programming language at first developed for numerical linear algebra (matrix manipulations), which has libraries especially developed for matrix multiplications.
In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices.
You can only multiply two matrices if their dimensions are compatible , which means the number of columns in the first matrix is the same as the number of rows in the second matrix.
I'm not sure about most efficient, but here's something to try. The following sequence of instructions computes the main diagonal of the product A * T'. Rotate both T and D by 8 bits and repeat for 7 more iterations.
// uint64_t A, T;
uint64_t D = UINT64_C(0x8040201008040201);
uint64_t P = A & T;
// test whether each byte is nonzero
P |= P >> 1;
P |= P >> 2;
P |= P >> 4;
P &= UINT64_C(0x0101010101010101);
// fill each nonzero byte with ones
P *= 255; // or P = (P << 8) - P;
// leave only the current diagonal
P &= D;
If you are looking for a way to do dense matrix multiplication in parallel, partition your result matrix into blocks and compute each block in parallel.
http://en.wikipedia.org/wiki/Block_matrix#Block_matrix_multiplication
It is not clear what data structure you are using, which language (yes, I know you said 'any language'), and what you are trying to optimize (speed? memory?) etc. All of these may have profound impact on your solution.
Some examples:
|
) instead of +
. Some languages may lazy evaluate this, stopping at the first '1' they encounter.BTW, I'm assuming you have a lot of matrices to process, otherwise I would use a direct, and readable code. My guess is that even with a lot of matrices, the gain in performance would be negligible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With