I'm doing some calculations and some analysis of the strengths and weaknesses of different BLAS implementations; however, I have run into a problem.
I'm testing cuBLAS. Doing linear algebra on the GPU seems like a good idea, but there is one problem.
cuBLAS uses column-major storage, and since this is not what I need in the end, I'm curious whether there is a way to make BLAS perform a matrix transpose?
BLAS doesn't have a matrix transpose routine built in. The CUDA SDK includes a matrix transpose example, along with a paper that discusses optimal strategies for performing a transpose. Your best strategy is probably to feed row-major inputs to CUBLAS using the transpose-input versions of the calls, perform the intermediate calculations in column-major order, and finally transpose the result using the SDK transpose kernel.
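A minimal sketch of that approach, assuming single-precision GEMM and device buffers that already hold row-major data (the function name and sizes are illustrative, not from the original answer). Because cuBLAS interprets each row-major buffer as the transpose of the matrix it actually holds, passing CUBLAS_OP_T recovers the intended operands; the product then comes back in column-major order and still needs the final transpose step mentioned above:

    #include <cublas_v2.h>

    /* Multiply row-major A (M x K) by row-major B (K x N). The result C
       is written in column-major order (M x N) and must be transposed
       afterwards if a row-major result is needed. Error checks omitted. */
    void rowmajor_gemm(cublasHandle_t handle,
                       const float *dA, const float *dB, float *dC,
                       int M, int N, int K)
    {
        const float alpha = 1.0f, beta = 0.0f;

        /* cuBLAS reads dA as a K x M column-major matrix (i.e. A^T) and
           dB as an N x K column-major matrix (i.e. B^T); CUBLAS_OP_T on
           both therefore yields the intended A (M x K) and B (K x N). */
        cublasSgemm(handle,
                    CUBLAS_OP_T, CUBLAS_OP_T,
                    M, N, K,
                    &alpha,
                    dA, K,   /* lda = K: leading dimension of A^T */
                    dB, N,   /* ldb = N: leading dimension of B^T */
                    &beta,
                    dC, M);  /* ldc = M: C is column-major M x N  */
    }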
Edited to add that CUBLAS added a transpose routine in CUBLAS version 5, geam, which can perform matrix transposition in GPU memory and should be regarded as optimal for whatever architecture you are using.
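For illustration, here is a sketch of a standalone transpose with geam (single precision, cublasSgeam); the sizes, variable names, and the trick of reusing A as the numerically ignored B operand with beta = 0 are my own assumptions, not part of the original answer:

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        const int M = 3, N = 4;            /* A is M x N, column-major  */
        float hA[12], hAT[12];
        for (int i = 0; i < M * N; ++i) hA[i] = (float)i;

        float *dA, *dAT;                   /* dAT will hold A^T (N x M) */
        cudaMalloc((void **)&dA,  M * N * sizeof(float));
        cudaMalloc((void **)&dAT, N * M * sizeof(float));
        cudaMemcpy(dA, hA, M * N * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);

        /* A^T = 1.0 * op(A) + 0.0 * op(B); since beta = 0, the B operand
           contributes nothing and A is reused there to keep the
           dimensions of the call consistent. */
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgeam(handle, CUBLAS_OP_T, CUBLAS_OP_T,
                    N, M,                  /* rows, cols of the result  */
                    &alpha, dA, M,
                    &beta,  dA, M,
                    dAT, N);

        cudaMemcpy(hAT, dAT, N * M * sizeof(float), cudaMemcpyDeviceToHost);
        printf("A(1,0) = %f, A^T(0,1) = %f\n", hA[1], hAT[1 * N]); /* both 1.0 */

        cublasDestroy(handle);
        cudaFree(dA);
        cudaFree(dAT);
        return 0;
    }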