 

Sparse Matrix in Numba

I wish to speed up my machine learning algorithm (written in Python) using Numba (http://numba.pydata.org/). Note that this algorithm takes a sparse matrix as its input data. In my pure Python implementation, I used csr_matrix and related classes from SciPy, but apparently they are not compatible with Numba's JIT compiler.

I have also created my own custom class to implement the sparse matrix (basically a list of lists of (index, value) pairs), but again it is incompatible with Numba (i.e., I got some weird error message saying it doesn't recognize the extension type).

Is there an alternative, simple way to implement a sparse matrix using only NumPy (without resorting to SciPy) that is compatible with Numba? Any example code would be appreciated. Thanks!
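One NumPy-only option is to keep the matrix as three flat arrays in CSR layout (values, column indices, row offsets), because Numba compiles functions over plain NumPy arrays without trouble. The sketch below assumes each row of the custom class is a list of (column_index, value) pairs, as described above. The helper names lists_to_csr and csr_matvec are hypothetical, and the try/except fallback only exists so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs where Numba is not installed
    def njit(func):
        return func


def lists_to_csr(rows_of_pairs):
    """Hypothetical helper: list of rows of (column_index, value) pairs -> CSR arrays."""
    counts = np.array([len(row) for row in rows_of_pairs], dtype=np.int64)
    indptr = np.zeros(len(rows_of_pairs) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(counts)  # row i occupies data[indptr[i]:indptr[i+1]]
    indices = np.array([j for row in rows_of_pairs for j, _ in row], dtype=np.int64)
    data = np.array([v for row in rows_of_pairs for _, v in row], dtype=np.float64)
    return data, indices, indptr


@njit
def csr_matvec(data, indices, indptr, x):
    """Hypothetical helper: compute y = M @ x using only the three CSR arrays."""
    n_rows = indptr.shape[0] - 1
    y = np.zeros(n_rows)
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


rows_of_pairs = [[(0, 1.0), (1, 2.0)], [(2, 3.0)], [(0, 4.0), (2, 5.0)]]
data, indices, indptr = lists_to_csr(rows_of_pairs)
y = csr_matvec(data, indices, indptr, np.ones(3))
print(y)  # → [3. 3. 9.]
```

The conversion runs once in plain Python; only the numeric loop is compiled, so the Numba function never sees a custom class.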

rjo2909 asked Oct 17 '13 06:10



2 Answers

If all you have to do is iterate over the values of a CSR matrix, you can pass the attributes data, indptr, and indices to a function instead of the CSR matrix object.

from scipy import sparse
from numba import njit

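@njit
def print_csr(A, iA, jA):
    # A = data (nonzero values), iA = indptr (row offsets), jA = indices (column indices)
    for row in range(len(iA)-1):
        # the nonzeros of this row are stored at positions iA[row] .. iA[row+1]-1
        for i in range(iA[row], iA[row+1]):
            print(row, jA[i], A[i])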

A = sparse.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
print_csr(A.data, A.indptr, A.indices)
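The same attribute-passing approach covers real computations, not just printing. As a sketch, assuming a gradient-style product A.T @ v is what the algorithm needs, the hypothetical function csr_rmatvec below scatter-adds each stored value into the output without ever forming the transpose. The argument order follows print_csr above (data, indptr, indices); the try/except fallback is only there so the sketch runs without Numba.

```python
import numpy as np
from scipy import sparse

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs where Numba is not installed
    def njit(func):
        return func


@njit
def csr_rmatvec(data, indptr, indices, v, n_cols):
    """Hypothetical helper: compute A.T @ v from the raw CSR arrays of A."""
    out = np.zeros(n_cols, dtype=np.float64)
    for row in range(indptr.shape[0] - 1):
        for k in range(indptr[row], indptr[row + 1]):
            # entry A[row, indices[k]] contributes to output column indices[k]
            out[indices[k]] += data[k] * v[row]
    return out


A = sparse.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
v = np.array([1.0, 2.0, 3.0])
result = csr_rmatvec(A.data, A.indptr, A.indices, v, A.shape[1])
print(result)  # → [13.  2. 21.]
```

SciPy is only used to build the matrix and supply its arrays; the compiled function receives nothing but NumPy arrays and an integer.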
slek120 answered Sep 18 '22 12:09


You can access the data of your sparse matrix as plain NumPy arrays or Python lists. For example:

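import numpy as np
from scipy import sparse

M = sparse.csr_matrix([[1, 0, 0], [1, 0, 1], [1, 1, 1]])
ML = M.tolil()

for d, r in zip(ML.data, ML.rows):
    # d, r are lists: the nonzero values and their column indices for one row
    dr = np.array([d, r])
    print(dr)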

produces:

[[1]
 [0]]
[[1 1]
 [0 2]]
[[1 1 1]
 [0 1 2]]

Numba should be able to handle code that uses these arrays, provided, of course, that it does not expect every row to have the same length.
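A small illustration of that point, assuming the per-row work is a dot product with a dense vector x: Numba specializes on an array's dtype, number of dimensions, and layout, not its length, so a single compiled function accepts rows of any size. The function name row_dot is hypothetical. Values and indices are passed as two separate arrays here because np.array([d, r]) forces one common dtype, so float values would turn the column indices into floats that cannot be used for indexing.

```python
import numpy as np
from scipy import sparse

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs where Numba is not installed
    def njit(func):
        return func


@njit
def row_dot(vals, cols, x):
    """Hypothetical helper: dot product of one sparse row with a dense vector x."""
    total = 0.0
    for k in range(vals.shape[0]):
        total += vals[k] * x[cols[k]]
    return total


M = sparse.csr_matrix([[1, 0, 0], [1, 0, 1], [1, 1, 1]])
ML = M.tolil()
x = np.array([10.0, 20.0, 30.0])

# Rows have different lengths, but dtype and ndim are fixed, so one compiled version serves all
results = [
    row_dot(np.asarray(d, dtype=np.float64), np.asarray(r, dtype=np.int64), x)
    for d, r in zip(ML.data, ML.rows)
]
print(results)
```

The Python-level loop over rows is still interpreted, so this mainly pays off when the per-row work is heavy; otherwise flattening everything into CSR arrays keeps the whole loop compiled.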


The lil format stores its values in 2 object-dtype arrays, data and rows, which hold each row's values and column indices as Python lists.
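Because Numba's nopython mode does not accept object-dtype arrays, the lil arrays themselves cannot be passed to a compiled function. A sketch of flattening them into three contiguous arrays in CSR layout, using only NumPy on the per-row lists:

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix([[1, 0, 0], [1, 0, 1], [1, 1, 1]])
ML = M.tolil()
print(ML.data.dtype, ML.rows.dtype)  # both object dtype

# Flatten the per-row lists into Numba-friendly arrays (CSR layout)
lengths = np.array([len(r) for r in ML.rows], dtype=np.int64)
indptr = np.concatenate(([0], np.cumsum(lengths)))  # row offsets
indices = np.concatenate([np.asarray(r, dtype=np.int64) for r in ML.rows])
data = np.concatenate([np.asarray(d, dtype=np.float64) for d in ML.data])
print(indptr, indices, data)
```

For a SciPy matrix, M.tocsr() followed by its .data, .indptr, and .indices attributes yields the same layout directly; the manual version mainly shows how the lil row lists map onto it.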

hpaulj answered Sep 19 '22 12:09