How to properly pass a scipy.sparse CSR matrix to a cython function?

I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?

asked Aug 13 '14 at 20:08 by vgoklani


2 Answers

Here is an example of how to quickly access the data of a coo_matrix through its row, col and data attributes. The purpose of the example is only to show how to declare the data types and create the buffers (plus the compiler directives, which usually give a considerable speed-up):

#cython: boundscheck=False
#cython: wraparound=False
#cython: cdivision=True
#cython: nonecheck=False

import numpy as np
from scipy.sparse import coo_matrix
cimport numpy as np

# Initialise NumPy's C API before using it from Cython.
np.import_array()

ctypedef np.int32_t cINT32
ctypedef np.double_t cDOUBLE

def print_sparse(m):
    # Typed buffers for the three COO arrays: row indices, column indices, values.
    cdef np.ndarray[cINT32, ndim=1] row, col
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # astype() guarantees the dtypes match the declared buffer types.
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    for i in range(data.shape[0]):
        print(row[i], col[i], data[i])
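For reference, the same traversal in plain Python (no Cython, no compilation) shows what print_sparse iterates over: a COO matrix stores one (row, column, value) triple per stored entry. This is a minimal sketch; the helper name coo_triples is hypothetical and only mirrors the Cython function above.

```python
import numpy as np
from scipy.sparse import coo_matrix

def coo_triples(m):
    """Return the (row, col, value) triples that print_sparse would print."""
    # Convert any sparse or dense input to COO, as print_sparse does.
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    # Cast to the same dtypes the Cython buffers are declared with.
    row = m.row.astype(np.int32)
    col = m.col.astype(np.int32)
    data = m.data.astype(np.float64)
    return list(zip(row.tolist(), col.tolist(), data.tolist()))

dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0]])
print(coo_triples(dense))
```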
answered Oct 11 '22 at 14:10 by Saullo G. P. Castro


Building on @SaulloCastro's answer, add this function to the same .pyx file to display the attributes of a CSR matrix:

from scipy.sparse import csr_matrix

def print_csr(m):
    # CSR stores column indices, row pointers and values in three 1-D arrays.
    cdef np.ndarray[cINT32, ndim=1] indices, indptr
    cdef np.ndarray[cDOUBLE, ndim=1] data
    cdef int i
    if not isinstance(m, csr_matrix):
        m = csr_matrix(m)
    indices = m.indices.astype(np.int32)
    indptr = m.indptr.astype(np.int32)
    data = m.data.astype(np.float64)
    print(indptr)
    for i in range(data.shape[0]):
        print(indices[i], data[i])

indptr has one entry per row plus one (n_rows + 1 entries), while indices and data have one entry per stored value, so indptr can't be printed in the same loop.
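To see the length difference concretely, here is a minimal pure-Python look at the three arrays print_csr reads, using an arbitrary 3×4 example matrix (with an empty middle row): row i's values live in data[indptr[i]:indptr[i+1]], and their column positions in the same slice of indices.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 4.0]])
m = csr_matrix(dense)

print("indptr: ", m.indptr)   # row pointers, one per row plus a final end marker
print("indices:", m.indices)  # column index of each stored value
print("data:   ", m.data)     # the stored values themselves

# indptr always has n_rows + 1 entries; indices and data have nnz entries.
assert len(m.indptr) == m.shape[0] + 1
assert len(m.indices) == len(m.data) == m.nnz
```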

To display the CSR data in the same (row, column, value) form as the COO example, you can do the conversion yourself with these loops:

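    # (declare "cdef int j" next to "cdef int i" in print_csr for a typed loop)
    for i in range(indptr.shape[0] - 1):
        # Stored entries of row i occupy positions indptr[i] .. indptr[i+1]-1.
        for j in range(indptr[i], indptr[i + 1]):
            print(i, indices[j], data[j])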
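The same row expansion in plain Python, checked against SciPy's own tocoo(), confirms that the nested loops reproduce the COO triples. This is a sketch; csr_to_triples is a hypothetical helper name, and in the .pyx version i and j would be declared cdef int for speed.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_to_triples(m):
    """Expand a CSR matrix into (row, col, value) triples, row by row."""
    indptr, indices, data = m.indptr, m.indices, m.data
    triples = []
    for i in range(len(indptr) - 1):               # one iteration per row
        for j in range(indptr[i], indptr[i + 1]):  # stored entries of row i
            triples.append((i, int(indices[j]), float(data[j])))
    return triples

m = csr_matrix(np.array([[0.0, 5.0],
                         [6.0, 7.0]]))
coo = m.tocoo()
expected = list(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))
print(csr_to_triples(m))
```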

I assume you know how to set up and compile a .pyx file.

Also, what does your Cython function assume about the matrix? Does it know about the CSR format? The COO format?
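If the function does know the CSR layout, one common pattern is to pass the three underlying arrays (plus the number of rows) instead of the matrix object, since those are ordinary NumPy arrays that can be typed like any other. The kernel below is a pure-Python stand-in for such a Cython function, computing y = A @ x; the name csr_matvec and its argument order are illustrative assumptions, not a SciPy API.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_matvec(indptr, indices, data, n_rows, x):
    """Compute y = A @ x from the raw CSR arrays of A.

    In a .pyx file these arguments could be declared as typed buffers,
    e.g. np.ndarray[np.int32_t, ndim=1] indptr, so the loops compile to C.
    """
    y = np.zeros(n_rows, dtype=np.float64)
    for i in range(n_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x over its stored entries.
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y

A = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0]]))
x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(A.indptr, A.indices, A.data, A.shape[0], x)
print(y)
```

SciPy switches indptr and indices to int64 when a matrix is too large for int32 indexing, so the astype(np.int32) casts in the answers above assume the matrix is small enough for 32-bit indices.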

Or does your Cython function expect a regular NumPy array? In that case, we are off on a rabbit trail: you just need to convert the sparse matrix to a dense array with x.toarray(). (x.A was a shorthand for this in older SciPy versions; recent releases have deprecated and removed it.)
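A minimal check that toarray() yields an ordinary 2-D NumPy array, with the implicit zeros filled in, which a function typed for a dense float64 array could accept. Keep in mind that the dense copy needs memory proportional to rows × columns, regardless of how sparse the matrix is.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = csr_matrix(np.array([[0.0, 1.0],
                         [2.0, 0.0]]))
dense = m.toarray()   # plain numpy.ndarray, zeros materialised
print(type(dense).__name__, dense.shape, dense.dtype)
print(dense)
```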

answered Oct 11 '22 at 13:10 by hpaulj