In the middle of a c++ code, eclipse, I need to compute the multiply of matrices A and B, with the size of 2400*3600 (so the dimensions are not the same). The matrices are stored in float two dimensional arrays.They are not sparse, no limitations.
Each multiply takes so long (several minutes), and I seriously need to reduce it, because I have a loop which repeats 50 million times. and each time a new A and B should be multiplied. Any sort of recommendation is welcomed to reduce the time complexity. (even changing the structure of storing the data, if you think that might help). For example, what if I store the data into one dimensional arrays? Or using vectors instead of arrays?
In one particular case, the first column is always 1, and the values are either 1, -1, or zero. Any idea for this case?
In other cases, the values can be any thing.
** one of these multiplications is X multiplied by its transposed. Is there any recommendation on this specific one?
I wouldn't fool around trying to write my own: Google for LAPACK or BLAS, two time-tested packages for numerical computing, both optimized to the Nth degree. Both have C APIs you can use.
It will definitely help to store your second matrix transposed, so that columns match up with cache lines instead of rows. The difference in access time between L2 cache and main memory is a factor of 10 or so.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With