My ultimate goal is to accelerate the computation of a matrix-vector product in Python, potentially by using a CUDA-enabled GPU. The matrix A is about 15k x 15k and sparse (density ~0.05), the vector x is 15k elements and dense, and I am computing Ax. I have to perform this computation many times, so making it as fast as possible would be ideal.
My current non-GPU “optimization” is to represent A as a scipy.sparse.csc_matrix object and then simply compute A.dot(x), but I was hoping to speed this up on a VM with a couple of NVIDIA GPUs attached, using only Python if possible (i.e. not writing out the detailed kernel functions by hand). I’ve succeeded in accelerating dense matrix-vector products using the cudamat library, but not for the sparse case. There are a handful of suggestions for the sparse case online, such as using pycuda, scikit-cuda, or Anaconda’s Accelerate package, but there isn’t a ton of information, so it’s hard to know where to begin.
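For concreteness, my current setup looks roughly like this (the random matrix below is just a stand-in for my real A, which comes from the application):

import numpy as np
import scipy.sparse as sp

n = 15000
A = sp.random(n, n, density=0.05, format="csc", dtype=np.float64)  # stand-in for the real A
x = np.random.rand(n)                                              # dense vector

y = A.dot(x)  # the product I need to repeat many times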
I don’t need highly detailed instructions, but if anyone has solved this before and could provide a “big picture” roadmap for the simplest way of doing this, or has an idea of the sort of speed-up a sparse GPU-based matrix-vector product would have over scipy’s sparse algorithms, that would be very helpful.
Python's SciPy provides tools for creating sparse matrices in several data structures, as well as for converting a dense matrix to a sparse one. A sparse representation stores only the (row, column) positions of the non-zero entries together with their values.
A dense matrix stored in a NumPy array can be converted to the CSR representation by passing it to the csr_matrix() constructor.
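A minimal illustration of that conversion (the values here are arbitrary):

import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 0],
                  [0, 0, 2],
                  [0, 3, 0]])

sparse = csr_matrix(dense)  # keeps only the non-zero entries
print(sparse)               # lists the (row, col) value triples
print(sparse.toarray())     # back to a dense NumPy array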
Matrix multiplication of sparse matrices is done with the dot() method (or the @ operator) available on both the csc_matrix and csr_matrix classes, while the multiply() method performs element-wise multiplication. The two operands do not need to share a format: a CSC matrix can be multiplied with a CSR matrix (or with a dense NumPy vector, as in A.dot(x) above).
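For example, with two small arbitrary matrices:

import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

B = csr_matrix(np.array([[1, 0], [0, 2]]))
C = csc_matrix(np.array([[0, 3], [4, 0]]))

P = B.dot(C)        # matrix product; mixing CSR and CSC is fine
E = B.multiply(C)   # element-wise product, not a matrix product
print(P.toarray())
print(E.toarray())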
Another alternative is to use the CuPy package. It has the same interface as numpy/scipy (which is nice), and (for me at least) it turned out to be much easier to install than pycuda.
The new code would look something like this:
import cupy as cp
from cupyx.scipy.sparse import csr_matrix as csr_gpu

A = some_sparse_matrix  # scipy.sparse.csr_matrix
x = some_dense_vector   # numpy.ndarray

A_gpu = csr_gpu(A)      # moving A to the GPU
x_gpu = cp.array(x)     # moving x to the GPU
for i in range(niter):
    x_gpu = A_gpu.dot(x_gpu)
x = cp.asnumpy(x_gpu)   # back to a NumPy array for fast indexing
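To get a feel for the actual speed-up on your hardware, a rough side-by-side timing along the lines below can help. The matrix here is generated randomly with the size and density from the question, and the device is synchronized before each clock read because CuPy launches kernels asynchronously:

import time
import numpy as np
import scipy.sparse as sp
import cupy as cp
from cupyx.scipy.sparse import csr_matrix as csr_gpu

n, niter = 15000, 100
A = sp.random(n, n, density=0.05, format="csr", dtype=np.float64)
x = np.random.rand(n)

# CPU: repeated sparse matrix-vector products with SciPy
t0 = time.perf_counter()
for _ in range(niter):
    y = A.dot(x)
cpu_time = time.perf_counter() - t0

# GPU: the same products with CuPy
A_gpu, x_gpu = csr_gpu(A), cp.array(x)
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
for _ in range(niter):
    y_gpu = A_gpu.dot(x_gpu)
cp.cuda.Device().synchronize()  # make sure the kernels have finished
gpu_time = time.perf_counter() - t0

print(f"CPU: {cpu_time:.3f} s   GPU: {gpu_time:.3f} s")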