My ultimate goal is to accelerate the computation of a matrix-vector product in Python, potentially by using a CUDA-enabled GPU. The matrix A is about 15k x 15k and sparse (density ~0.05), the vector x is 15k elements and dense, and I am computing Ax. I have to perform this computation many times, so making it as fast as possible would be ideal.
My current non-GPU “optimization” is to represent A as a scipy.sparse.csc_matrix object and then simply compute A.dot(x), but I was hoping to speed this up on a VM with a couple of NVIDIA GPUs attached, using only Python if possible (i.e. not writing out the detailed kernel functions by hand). I’ve succeeded in accelerating dense matrix-vector products using the cudamat library, but not for the sparse case. There are a handful of suggestions for the sparse case online, such as using pycuda, scikit-cuda, or Anaconda’s Accelerate package, but there isn’t a ton of information, so it’s hard to know where to begin.
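For concreteness, my current setup looks roughly like this (the random matrix below is just a stand-in for my real A, which comes from the application):

import numpy as np
import scipy.sparse as sp

n = 15000
A = sp.random(n, n, density=0.05, format="csc", dtype=np.float64)  # stand-in for the real A
x = np.random.rand(n)                                              # dense vector

y = A.dot(x)  # the product I need to repeat many times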
I don’t need highly detailed instructions, but if anyone has solved this before and could provide a “big picture” roadmap for the simplest way of doing this, or has an idea of the sort of speed-up a sparse GPU-based matrix-vector product would have over scipy’s sparse algorithms, that would be very helpful.
Python's SciPy provides tools for creating sparse matrices in several data structures, as well as for converting a dense matrix to a sparse one. A sparse representation stores only the (row, column) positions of the non-zero entries together with their values.
A dense matrix stored in a NumPy array can be converted to the CSR representation by passing it to the csr_matrix() constructor.
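A minimal illustration of that conversion (the values here are arbitrary):

import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 0],
                  [0, 0, 2],
                  [0, 3, 0]])

sparse = csr_matrix(dense)  # keeps only the non-zero entries
print(sparse)               # lists the (row, col) value triples
print(sparse.toarray())     # back to a dense NumPy array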
Matrix multiplication of sparse matrices is done with the dot() method (or the @ operator) available on both the csc_matrix and csr_matrix classes, while the multiply() method performs element-wise multiplication. The two operands do not need to share a format: a CSC matrix can be multiplied with a CSR matrix (or with a dense NumPy vector, as in A.dot(x) above).
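For example, with two small arbitrary matrices:

import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

B = csr_matrix(np.array([[1, 0], [0, 2]]))
C = csc_matrix(np.array([[0, 3], [4, 0]]))

P = B.dot(C)        # matrix product; mixing CSR and CSC is fine
E = B.multiply(C)   # element-wise product, not a matrix product
print(P.toarray())
print(E.toarray())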
Another alternative is to use the CuPy package. It has the same interface as numpy/scipy (which is nice), and (for me at least) it turned out to be much easier to install than pycuda.
The new code would look something like this:
import cupy as cp
from cupyx.scipy.sparse import csr_matrix as csr_gpu

A = some_sparse_matrix  # scipy.sparse.csr_matrix
x = some_dense_vector   # numpy.ndarray

A_gpu = csr_gpu(A)      # moving A to the GPU
x_gpu = cp.array(x)     # moving x to the GPU
for i in range(niter):
    x_gpu = A_gpu.dot(x_gpu)
x = cp.asnumpy(x_gpu)   # back to a NumPy array for fast indexing
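To get a feel for the actual speed-up on your hardware, a rough side-by-side timing along the lines below can help. The matrix here is generated randomly with the size and density from the question, and the device is synchronized before each clock read because CuPy launches kernels asynchronously:

import time
import numpy as np
import scipy.sparse as sp
import cupy as cp
from cupyx.scipy.sparse import csr_matrix as csr_gpu

n, niter = 15000, 100
A = sp.random(n, n, density=0.05, format="csr", dtype=np.float64)
x = np.random.rand(n)

# CPU: repeated sparse matrix-vector products with SciPy
t0 = time.perf_counter()
for _ in range(niter):
    y = A.dot(x)
cpu_time = time.perf_counter() - t0

# GPU: the same products with CuPy
A_gpu, x_gpu = csr_gpu(A), cp.array(x)
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
for _ in range(niter):
    y_gpu = A_gpu.dot(x_gpu)
cp.cuda.Device().synchronize()  # make sure the kernels have finished
gpu_time = time.perf_counter() - t0

print(f"CPU: {cpu_time:.3f} s   GPU: {gpu_time:.3f} s")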