I need to perform a set of operations on a scipy sparse matrix in a Cython
method.
To apply these efficiently I need access to the lil_matrix
representation.
The lil (linked-list sparse matrix) format in Python stores the data as lists of lists
with different lengths.
How can I efficiently pass a list of lists of different lengths to Cython (without copying)? Is there any other way to access lil matrices in Cython?
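For reference, the list-of-lists storage I want to access is exposed through the .rows and .data attributes of the lil_matrix; both are object arrays whose i-th element is a plain Python list for row i. A small plain-Python sketch, just to show the layout:

from scipy.sparse import lil_matrix

m = lil_matrix((3, 4))
m[0, 1] = 5.0
m[2, 0] = 7.0
m[2, 3] = 1.0

print(m.rows)  # object array of lists of column indices, e.g. [[1], [], [0, 3]]
print(m.data)  # object array of lists of values, e.g. [[5.0], [], [7.0, 1.0]]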
The example below iterates over a lil_matrix
and calculates the sum for each row.
Note that I am doing almost no declarations, and even so it is extremely fast, because Cython is already optimized for built-in types such as lists. The timings are also shown below.
import time
import numpy as np
cimport numpy as np
from scipy.sparse import lil_matrix

cdef iter_over_lil_matrix(m):
    cdef list sums, data_row
    sums = []
    # m.data is an object array whose entries are the Python lists
    # of nonzero values, one list per row
    for data_row in m.data:
        s = 0
        for value in data_row:
            s += value
        sums.append(s)
    return sums

def main():
    a = np.random.random(10000 * 10000)
    a[a > 0.1] = 0
    a = a.reshape(10000, 10000)
    m = lil_matrix(a)

    t0 = time.perf_counter()
    sums = iter_over_lil_matrix(m)
    t1 = time.perf_counter()
    print('Cython lil_matrix Time', t1 - t0)

    t0 = time.perf_counter()
    array_sums = a.sum(axis=1)
    t1 = time.perf_counter()
    print('Numpy ndarray Time', t1 - t0)

    t0 = time.perf_counter()
    lil_sums = m.sum(axis=1)
    t1 = time.perf_counter()
    print('lil_matrix Time', t1 - t0)

    mcsr = m.tocsr()
    t0 = time.perf_counter()
    csr_sums = mcsr.sum(axis=1)
    t1 = time.perf_counter()
    print('csr_matrix Time', t1 - t0)

    assert np.allclose(array_sums, sums)
    assert np.allclose(array_sums, np.asarray(lil_sums).flatten())
    assert np.allclose(array_sums, np.asarray(csr_sums).flatten())
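To build and run this, one option is a minimal setup.py along the following lines (the file name iter_lil.pyx is just an assumed name for the Cython module above):

# setup.py - minimal build sketch, assuming the Cython code above is saved as iter_lil.pyx
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("iter_lil.pyx", language_level=3),
    include_dirs=[np.get_include()],
)

After python setup.py build_ext --inplace, the module can be imported and main() called from a normal Python session.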
Timings in seconds: only about 2 times slower than the super-optimized NumPy :D, and much faster than the lil_matrix.sum()
method, because that method converts to csr_matrix
first, as clarified by @hpaulj and confirmed by the results below. Note that csr_matrix.sum()
over the columns is almost one order of magnitude faster than the dense sum.
Cython lil_matrix Time 0.183935034665
Numpy ndarray Time 0.106583238273
lil_matrix Time 2.47158218631
csr_matrix Time 0.0140050888745
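The reason csr_matrix.sum(axis=1) is so fast is that the CSR format stores all nonzeros in flat data, indices and indptr arrays, so each row sum reduces to a contiguous slice. A minimal plain-NumPy sketch of the same row sums computed directly from those arrays (the helper name csr_row_sums is just for illustration):

import numpy as np

def csr_row_sums(mcsr):
    # For a csr_matrix, mcsr.data[mcsr.indptr[i]:mcsr.indptr[i+1]] holds
    # the nonzero values of row i as one contiguous slice.
    n_rows = mcsr.shape[0]
    sums = np.empty(n_rows)
    for i in range(n_rows):
        sums[i] = mcsr.data[mcsr.indptr[i]:mcsr.indptr[i + 1]].sum()
    return sums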
Things that will slow down the code (see the sketch after this list):
- indexing instead of iterating, i.e. for i in range(len(m.data)): with data_row = m.data[i]
- typing the data attribute as np.ndarray[object, ndim=1] data with data = m.data

Things that did not affect the timings:
- the boundscheck or wraparound directives
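For completeness, here is a minimal sketch of the slower indexing variant mentioned in the list above (the name iter_over_lil_matrix_indexed is only illustrative); the commented-out directives are the ones that made no difference:

cimport cython

# @cython.boundscheck(False)   # these directives did not change the timings here
# @cython.wraparound(False)
cdef iter_over_lil_matrix_indexed(m):
    cdef list sums
    sums = []
    # Indexing m.data on every iteration instead of iterating over it
    # adds attribute lookups and object indexing, which slows things down.
    for i in range(len(m.data)):
        data_row = m.data[i]
        s = 0
        for value in data_row:
            s += value
        sums.append(s)
    return sums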