The naive algorithm for multiplying 4x4 matrices looks like this:
void matrix_mul(double out[4][4], double lhs[4][4], double rhs[4][4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out[i][j] = 0.0;
            for (int k = 0; k < 4; ++k) {
                out[i][j] += lhs[i][k] * rhs[k][j];
            }
        }
    }
}
Obviously, this algorithm gives bogus results if out == lhs or out == rhs (here == means reference equality). Is there a version that allows one or both of those cases without simply copying the matrix? I'm happy to have different functions for each case if necessary.
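For example, squaring in place already corrupts the very first entry, because out[i][j] = 0.0 zeroes a value the inner loop still needs. A minimal demo (the test matrix is arbitrary, and this assumes matrix_mul from above is in the same file):

#include <stdio.h>

void matrix_mul(double out[4][4], double lhs[4][4], double rhs[4][4]); /* defined above */

int main(void) {
    double m[4][4] = {
        {1, 1, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 1, 0},
        {0, 0, 0, 1},
    };
    matrix_mul(m, m, m);      /* out == lhs == rhs */
    printf("%f\n", m[0][0]);  /* prints 0.000000; the correct square has 1.0 here */
    return 0;
}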
I found this paper, but it discusses the Strassen-Winograd algorithm, which is overkill for my small matrices. The answers to this question seem to indicate that if out == lhs && out == rhs (i.e., we're attempting to square the matrix), then it can't be done in place, but even there no convincing evidence or proof is given.
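One case is easy to dispatch from the loop structure alone: out[i][*] reads only row i of lhs, so if out == lhs (and out != rhs), a single 4-element row buffer suffices, with no full-matrix copy. A sketch, with a hypothetical name:

/* Handles out == lhs (but NOT out == rhs) using O(n) extra words:
   save row i of lhs before the output writes clobber it. */
void matrix_mul_alias_lhs(double out[4][4], double lhs[4][4], double rhs[4][4]) {
    for (int i = 0; i < 4; ++i) {
        double row[4];
        for (int k = 0; k < 4; ++k)
            row[k] = lhs[i][k];            /* buffer row i before overwriting it */
        for (int j = 0; j < 4; ++j) {
            out[i][j] = 0.0;
            for (int k = 0; k < 4; ++k)
                out[i][j] += row[k] * rhs[k][j];
        }
    }
}

The symmetric trick (buffering one column of rhs and sweeping column by column) handles out == rhs alone; neither helps when out aliases both operands.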
I'm not thrilled with this answer (I'm posting it mainly to silence the "it obviously can't be done" crowd), but I'm skeptical that it's possible to do much better with a true in-place algorithm (O(1) extra words of storage for multiplying two n x n matrices). Let's call the two matrices to be multiplied A and B. Assume that A and B are not aliased.
If A were upper-triangular, then the multiplication problem would look like this.
[a11 a12 a13 a14] [b11 b12 b13 b14]
[ 0 a22 a23 a24] [b21 b22 b23 b24]
[ 0 0 a33 a34] [b31 b32 b33 b34]
[ 0 0 0 a44] [b41 b42 b43 b44]
We can compute the product into B as follows. Multiply the first row of B by a11. Add a12 times the second row of B to the first. Add a13 times the third row of B to the first. Add a14 times the fourth row of B to the first.
Now we've overwritten the first row of B with the correct product. Fortunately, we don't need it any more. Multiply the second row of B by a22. Add a23 times the third row of B to the second. (You get the idea.)
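In code, that top-to-bottom sweep might look like this (a sketch with a hypothetical name; the key point is that row i of the product depends only on rows i through 4 of B, which are still intact when row i is finalized):

/* B <- U*B in place, reading only the upper triangle of U. */
void mul_upper_inplace(double U[4][4], double B[4][4]) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            B[i][j] *= U[i][i];               /* diagonal term scales row i */
            for (int k = i + 1; k < 4; ++k)
                B[i][j] += U[i][k] * B[k][j]; /* rows below i are still original */
        }
    }
}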
Likewise, if A were unit lower-triangular, then the multiplication problem would look like this.
[ 1 0 0 0 ] [b11 b12 b13 b14]
[a21 1 0 0 ] [b21 b22 b23 b24]
[a31 a32 1 0 ] [b31 b32 b33 b34]
[a41 a42 a43 1 ] [b41 b42 b43 b44]
Add a43 times the third row of B to the fourth. Add a42 times the second row of B to the fourth. Add a41 times the first row of B to the fourth. Add a32 times the second row of B to the third. (You get the idea.)
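The same idea in code, sweeping bottom to top this time (again a sketch; only the strict lower triangle of L is read, and the unit diagonal means row i contributes itself unscaled):

/* B <- L*B in place for unit lower-triangular L. Rows are finalized
   bottom to top, so the earlier rows of B they need are still intact. */
void mul_unit_lower_inplace(double L[4][4], double B[4][4]) {
    for (int i = 3; i >= 0; --i) {
        for (int j = 0; j < 4; ++j) {
            /* unit diagonal: B[i][j] already carries its own term */
            for (int k = 0; k < i; ++k)
                B[i][j] += L[i][k] * B[k][j];
        }
    }
}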
The complete algorithm is to LU-decompose A in place, multiply UB into B, multiply LB into B, and then LU-undecompose A in place (I'm not sure if anyone ever does this, but it seems easy enough to reverse the steps). There are about a million reasons not to implement this in practice, two being that A may not be LU-decomposable (without pivoting) and that A won't be reconstructed exactly in general with floating-point arithmetic.
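Putting the pieces together, a hypothetical end-to-end sketch (assuming A admits an LU factorization without pivoting, and reusing the two triangular helpers above; L and U share A's storage, with L's unit diagonal left implicit):

/* In-place Doolittle LU: afterwards the strict lower triangle of A holds
   the multipliers of L and the upper triangle holds U. */
void lu_decompose_inplace(double A[4][4]) {
    for (int k = 0; k < 4; ++k) {
        for (int i = k + 1; i < 4; ++i) {
            A[i][k] /= A[k][k];               /* multiplier l_ik */
            for (int j = k + 1; j < 4; ++j)
                A[i][j] -= A[i][k] * A[k][j]; /* eliminate below the pivot */
        }
    }
}

/* Reverse of the above: reconstruct A = L*U in place (up to round-off). */
void lu_undecompose_inplace(double A[4][4]) {
    for (int k = 3; k >= 0; --k) {
        for (int i = k + 1; i < 4; ++i) {
            for (int j = k + 1; j < 4; ++j)
                A[i][j] += A[i][k] * A[k][j]; /* undo the elimination step */
            A[i][k] *= A[k][k];               /* undo the division by the pivot */
        }
    }
}

/* B <- A*B with O(1) extra words, via the two triangular steps. */
void matrix_mul_inplace(double A[4][4], double B[4][4]) {
    lu_decompose_inplace(A);
    mul_upper_inplace(A, B);      /* B <- U*B (reads only A's upper triangle) */
    mul_unit_lower_inplace(A, B); /* B <- L*(U*B) = A*B */
    lu_undecompose_inplace(A);    /* restore A, modulo floating-point error */
}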