For a class, a question that was posed by my teacher was the algorithmic cost of multiplying a matrix times its transpose. With the standard 3 loop matrix multiplication algorithm, the efficiency is O(N^3), and I wonder if there was a way to manipulate or take advantage of matrix * matrix transpose to get a faster algorithm. I understand that when you multiply a matrix by its transpose you have to calculate less of the matrix because its symmetrical, but I can't think of how to manipulate an algorithm that could take less than O(n^3).
i know there's algorithms like Coppensmith and Straussen that are faster general matrix multiplication algorithms but could anyone give any hints or insights on how to computationally take advantage of the transpose?
Thanks
In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm for large matrices, with a better asymptotic complexity, although the naive algorithm is often better for smaller matrices.
This is a major open question in theoretical computer science. As of December 2020, the matrix multiplication algorithm with best asymptotic complexity runs in O(n2.3728596) time, given by Josh Alman and Virginia Vassilevska Williams.
Transpose of a matrix is obtained by changing rows to columns and columns to rows. In other words, transpose of A[N][M] is obtained by changing A[i][j] to A[j][i].
The algorithm has O(n) time complexity. The algorithm uses matrix-matrix multiply-add (MMA) operation for transposing the matrix. We show how to align data and give algorithm for generating permutation matrices. The entire n x n matrix transposition is carried out in 5n time-steps.
As of right now there aren't any aymptotic barrier-breaking properties of this particular multiplication.
The obvious optimization is to take advantage of the symmetry of the product. That is to say, the [i][j]
th entry is equal to the [j][i]
th entry.
For implementation-specific optimizations, there is a significant amount of caching that you can do. A very significant amount of time in the multiplication of large matrices is spent transferring data to and from memory and CPU. So CPU designers implemented a smart caching system whereby recently used memory is stored in a small memory section called the cache. In addition to that, they also made it so that nearby memory is also cached. This is because a lot of the memory IO is due to reading/writing from/to arrays, which are stored sequentially.
Since the transpose of a matrix is simply the same matrix with the indices swapped, caching a value in the matrix can have over twice the impact.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With