Say I have a matrix with a dimension of A*B
on GPU, where B
(number of columns) is the leading dimension assuming a C style. Is there any method in CUDA (or cublas) to transpose this matrix to FORTRAN style, where A
(number of rows) becomes the leading dimension?
It is even better if it could be transposed during host->device
transfer while keep the original data unchanged.
The cuBLAS Library provides a GPU-accelerated implementation of the basic linear algebra subroutines (BLAS). cuBLAS accelerates AI and HPC applications with drop-in industry standard BLAS APIs highly optimized for NVIDIA GPUs.
NumPy Matrix transpose() We can use the transpose() function to get the transpose of an array.
To transpose a matrix in Python, FOR loop is used. Each element of the matrix is iterated and placed at the respective place in the transpose. The placing is done such that the element at the ith row and jth column is placed at the jth row and ith column.
as asked within the title, to transpose a device row-major matrix A[m][n], one can do it this way:
float* clone = ...;//copy content of A to clone
float const alpha(1.0);
float const beta(0.0);
cublasHandle_t handle;
cublasCreate(&handle);
cublasSgeam( handle, CUBLAS_OP_T, CUBLAS_OP_N, m, n, &alpha, clone, n, &beta, clone, m, A, m );
cublasDestroy(handle);
And, to multiply two row-major matrices A[m][k] B[k][n], C=A*B
cublasSgemm( handle, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k, &alpha, B, n, A, k, &beta, C, n );
where C is also a row-major matrix.
The CUDA SDK includes a matrix transpose, you can see here examples of code on how to implement one, ranging from a naive implementation to optimized versions.
For example:
Naïve transpose
__global__ void transposeNaive(float *odata, float* idata,
int width, int height, int nreps)
{
int xIndex = blockIdx.x*TILE_DIM + threadIdx.x;
int yIndex = blockIdx.y*TILE_DIM + threadIdx.y;
int index_in = xIndex + width * yIndex;
int index_out = yIndex + height * xIndex;
for (int r=0; r < nreps; r++)
{
for (int i=0; i<TILE_DIM; i+=BLOCK_ROWS)
{
odata[index_out+i] = idata[index_in+i*width];
}
}
}
Like talonmies had point out you can specify if you want operate the matrix as transposed or not, in cublas matrix operations eg.: for cublasDgemm() where C = a * op(A) * op(B) + b * C, assuming you want to operate A as transposed (A^T), on the parameters you can specify if it is ('N' normal or 'T' transposed)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With