I need to perform a set of operations on a scipy sparse matrix in a Cython method.
To efficiently apply these I need access to lil_matrix representation.
The lil (linked-list sparse matrix) data representation in python uses list of lists with different lengths.
How can I efficiently pass a list of list of different length to cython (without copying)? Is there any other way to access lil-matrices in cython?
The example below iterates over a lil_matrix and calculates the sum for each row. 
Note I am doing no declarations and even though it is extremely fast because Cython is already optimized for built-in types such as lists. The timings are also shown below...
import time
import numpy as np
cimport numpy as np
from scipy.sparse import lil_matrix
cdef iter_over_lil_matrix(m):
    cdef list sums, data_row
    sums = []
    for data_row in m.data:
        s = 0
        for value in data_row:
            s += value
        sums.append(s)
    return sums
def main():
    a = np.random.random((1e4*1e4))
    a[a>0.1] = 0
    a = a.reshape(1e4,1e4)
    m = lil_matrix(a)
    t0 = time.clock()
    sums = iter_over_lil_matrix(m)
    t1 = time.clock()
    print 'Cython lil_matrix Time', t1-t0
    t0 = time.clock()
    array_sums = a.sum(axis=1)
    t1 = time.clock()
    print 'Numpy ndarray Time', t1-t0
    t0 = time.clock()
    lil_sums = m.sum(axis=1)
    t1 = time.clock()
    print 'lil_matrix Time', t1-t0
    mcsr = m.tocsr()
    t0 = time.clock()
    csr_sums = mcsr.sum(axis=1)
    t1 = time.clock()
    print 'csr_matrix Time', t1-t0
    assert np.allclose(array_sums, sums)
    assert np.allclose(array_sums, np.asarray(lil_sums).flatten())
    assert np.allclose(array_sums, np.asarray(csr_sums).flatten())
Timings in seconds - only about 2 times slower than the super-optimized NumPy :D, much faster than the lil_matrix.sum() method because it converts to csr_matrix() before, as clarified by @hpaulj and confirmed by the results below. Note that the csr_matrix.sum() over the columns is almost one order of magnitude faster than the dense sum.
Cython lil_matrix Time 0.183935034665
Numpy ndarray Time 0.106583238273
lil_matrix Time 2.47158218631
csr_matrix Time 0.0140050888745
Things that will slow down the code:
for i in range(len(m.data)): with data_row = m.data[i]
np.ndarray[object, ndim=1] data with data=m.data
Things that did not affect:
boundscheck or wraparound
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With