Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

non-square C-order matrices in cuBLAS ( numba )

I'm trying to use the cuBLAS functions in Anaconda's Numba package and having an issue. I need the input matrices to be in C-order. The output can be in Fortran order.

I can run the example script provided with the package, here. The script has two functions, gemm_v1 and gemm_v2. In gemm_v1, the user has to create the input matrices in Fortran order. In gemm_v2, they can be passed to the cuda implementation of GEMM and transposed on the device. I can get these examples to work with square matrices. However, I can't figure out how to get gemm_v2 to work with non-square input matrices. Is there a way to work with C-order input matrices that are non-square?

Note:
Ideally, both the input and output matrices would stay on the device after the call to GEMM to be used in other calculations ( this is part of an iterative method ).

like image 619
user1554752 Avatar asked Jul 25 '17 15:07

user1554752


Video Answer


1 Answers

The problem with this example is, that it only works for square matrices. If matrices are not square you cannot calculate A^t*B^t because of dimension missmatch (assuming the dimensions were right for A*B).

I don't have a working cuBLAS-installation at hand, so it is kind of a shot in the dark, but I would be really surprised if cuBLAS would work differently than the usual BLAS. BLAS expects matrices to be in column-major-order (aka Fortran-order) but can also be used for matrices in row-major-order (aka C-order).

In my opinion, which might be completely wrong, gemm_v2 is not the usual/best way to handle multiplication of two C-order matrices, for example because if one multiplies two C-order matrices one would also have a C-order matrix as answer.

The trick to calculate product of two C-order-matrices with help of gemm would work as follows:

Even if it is probably known to you, I would like first to elaborate on the row-major-order (c-memory-layout) and the column-major-order (fortran-memory-layout), in order to flesh out my answer.

So if we have a 2x3 (i.e. 2 rows and 3 columns) matrix A, and store it in some continuous memory we get:

row-major-order(A) = A11, A12, A13, A21, A22, A23
col-major-order(A) = A11, A21, A12, A22, A13, A33

That means if we get a continuous memory, which represents a matrix in the row-major-order, and interpret it as a matrix in column-major-order we will get quite a different matrix!

However, if we take a look at the transposed matrix A^t we can easily see:

row-major-order(A) = col-major-order(A^t)
col-major-order(A) = row-major-order(A^t)

That means, if we would like to get the matrix C in row-major-order as result, the blas-routine should write the transposed matrix C in column-major-order (after all this we cannot change) into this very memory. However, C^t=(AB)^t=B^t*A^t and B^t an A^t are the original matrices reinterpreted in column-major-order.

Now, let A be a n x k-matrix and B a k x m-matrix, the call of gemm routine should be as follows:

gemm('N', 'N', m, n, k, 1.0, B, m, A, k, 0.0, C, m)

Please note:

  1. We don't have to transpose matrices A and B, because it is handled by reinterpreting C-order as Fortran-order.
  2. We have to swap the places of matrices A and B in order to get C^t in Fortran-order as result.
  3. The resulting matrix C is in C-order (by reinterpreting it from Fortran-order to C-order we get rid of ^t).
like image 94
ead Avatar answered Oct 13 '22 16:10

ead