How to transpose a matrix in an optimal way using BLAS?

Tags:

c

cuda

blas

cublas

I'm doing some calculations and analysing the strengths and weaknesses of different BLAS implementations. However, I have come across a problem.

I'm testing cuBLAS. Doing linear algebra on the GPU seems like a good idea, but there is one problem.

cuBLAS uses column-major format, and since that is not what I need in the end, I'm curious: is there a way to make BLAS perform a matrix transpose?
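To make the layout issue concrete, here is a minimal C sketch; the helper names are hypothetical and not part of any BLAS API. It shows that a buffer holding a rows x cols matrix in row-major order is, when read in column-major order, the cols x rows transpose, and that an explicit out-of-place transpose yields the column-major copy of a row-major matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a matrix with `cols` columns stored row-major. */
static inline size_t idx_row_major(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Offset of element (i, j) in a matrix with `rows` rows stored column-major
   (the BLAS/cuBLAS convention). */
static inline size_t idx_col_major(size_t i, size_t j, size_t rows) {
    return j * rows + i;
}

/* Out-of-place transpose: src is rows x cols row-major, dst receives the
   cols x rows transpose in row-major order. The bytes written to dst are
   also exactly src laid out in column-major order. */
void transpose_row_major(const float *src, float *dst, size_t rows, size_t cols) {
    assert(src != dst); /* this sketch does not handle in-place transposition */
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            dst[idx_row_major(j, i, rows)] = src[idx_row_major(i, j, cols)];
}
```

Note that idx_row_major(i, j, cols) equals idx_col_major(j, i, cols): reinterpreting the layout costs nothing, while producing a physically transposed copy requires a pass over the data.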

Martin Kristiansen asked Oct 16 '11 at 13:10




1 Answer

BLAS doesn't have a matrix transpose routine built in. The CUDA SDK includes a matrix transpose example, along with a paper that discusses the optimal strategy for performing a transpose. Your best strategy is probably to pass row-major inputs to CUBLAS using the transpose-input versions of the calls, perform the intermediate calculations in column-major order, and finally transpose the result using the SDK transpose kernel.
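The transpose-input trick works because a row-major buffer read as column-major is the transpose of the intended matrix, so setting the transpose flag recovers the original. Below is a minimal CPU sketch of that idea. ref_gemm_colmajor is a hypothetical naive stand-in that mimics the column-major gemm convention (C = op(A) * op(B), with alpha = 1 and beta = 0 for brevity); it is not CUBLAS itself. On the GPU the analogous call would be cublasSgemm with CUBLAS_OP_T for both inputs and the same m, n, k and leading dimensions.

```c
#include <assert.h>

/* Hypothetical CPU stand-in for a column-major gemm with alpha = 1, beta = 0:
   C (m x n) = op(A) (m x k) * op(B) (k x n).
   trans_a / trans_b: 0 means 'N' (use as stored), 1 means 'T' (transpose).
   All buffers are interpreted column-major with leading dims lda, ldb, ldc. */
void ref_gemm_colmajor(int trans_a, int trans_b, int m, int n, int k,
                       const float *A, int lda, const float *B, int ldb,
                       float *C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* op(A)(i,p) is A(i,p) for 'N', A(p,i) for 'T' */
                float a = trans_a ? A[i * lda + p] : A[p * lda + i];
                /* op(B)(p,j) is B(p,j) for 'N', B(j,p) for 'T' */
                float b = trans_b ? B[p * ldb + j] : B[j * ldb + p];
                sum += a * b;
            }
            C[j * ldc + i] = sum; /* result stored column-major */
        }
    }
}

/* Row-major A (m x k) read column-major is A^T (k x m, lda = k); with the
   transpose flag, op(A) = A again. Likewise for B (k x n, ldb = n).
   The product C = A * B comes back column-major with ldc = m. */
void gemm_rowmajor_inputs(int m, int n, int k,
                          const float *A_rm, const float *B_rm, float *C_cm) {
    assert(m > 0 && n > 0 && k > 0);
    ref_gemm_colmajor(1, 1, m, n, k, A_rm, k, B_rm, n, C_cm, m);
}
```

The output is column-major, which matches the answer's plan: keep intermediate results column-major and transpose only once, at the end.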


Edit: CUBLAS version 5 added a routine, geam, that can perform matrix transposition in GPU memory; it should be regarded as optimal for whatever architecture you are using.

talonmies answered Sep 16 '22 at 21:09