
Iterating through a scipy.sparse vector (or matrix)

I'm wondering what the best way is to iterate over the nonzero entries of a sparse matrix with scipy.sparse. For example, if I do the following:

    from scipy.sparse import lil_matrix

    x = lil_matrix((20, 1))
    x[13, 0] = 1
    x[15, 0] = 2

    c = 0
    for i in x:
        print c, i
        c = c + 1

the output is

    0
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13   (0, 0) 1.0
    14
    15   (0, 0) 2.0
    16
    17
    18
    19

so it appears the iterator is touching every element, not just the nonzero entries. I've had a look at the API

http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html

and searched around a bit, but I can't seem to find a solution that works.

RandomGuy asked Nov 30 '10 21:11




1 Answer

Edit: bbtrb's method (using coo_matrix) is much faster than my original suggestion, using nonzero. Sven Marnach's suggestion to use itertools.izip also improves the speed. Current fastest is using_tocoo_izip:

    import scipy.sparse
    import random
    import itertools

    def using_nonzero(x):
        rows, cols = x.nonzero()
        for row, col in zip(rows, cols):
            ((row, col), x[row, col])

    def using_coo(x):
        cx = scipy.sparse.coo_matrix(x)
        for i, j, v in zip(cx.row, cx.col, cx.data):
            (i, j, v)

    def using_tocoo(x):
        cx = x.tocoo()
        for i, j, v in zip(cx.row, cx.col, cx.data):
            (i, j, v)

    def using_tocoo_izip(x):
        cx = x.tocoo()
        for i, j, v in itertools.izip(cx.row, cx.col, cx.data):
            (i, j, v)

    N = 200
    x = scipy.sparse.lil_matrix((N, N))
    for _ in xrange(N):
        x[random.randint(0, N-1), random.randint(0, N-1)] = random.randint(1, 100)

yields these timeit results:

    % python -mtimeit -s'import test' 'test.using_tocoo_izip(test.x)'
    1000 loops, best of 3: 670 usec per loop
    % python -mtimeit -s'import test' 'test.using_tocoo(test.x)'
    1000 loops, best of 3: 706 usec per loop
    % python -mtimeit -s'import test' 'test.using_coo(test.x)'
    1000 loops, best of 3: 802 usec per loop
    % python -mtimeit -s'import test' 'test.using_nonzero(test.x)'
    100 loops, best of 3: 5.25 msec per loop
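The code above targets Python 2: itertools.izip and xrange no longer exist in Python 3, where the built-in zip is already lazy. For readers on Python 3, the tocoo approach might look roughly like the sketch below, applied to the 20x1 matrix from the question. The helper name iter_nonzero is made up for illustration, and no timings were measured for this version.

```python
import scipy.sparse


def iter_nonzero(x):
    """Yield (row, col, value) for each stored entry of a sparse matrix.

    iter_nonzero is a hypothetical helper name, not part of scipy.
    """
    cx = x.tocoo()  # COO format exposes parallel row/col/data arrays
    # Python 3's built-in zip is lazy, so itertools.izip is not needed.
    for i, j, v in zip(cx.row, cx.col, cx.data):
        yield int(i), int(j), v


# The matrix from the question: 20 rows, 1 column, two nonzero entries.
x = scipy.sparse.lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

entries = list(iter_nonzero(x))
for i, j, v in entries:
    print(i, j, v)
```

Only the two stored entries are visited, rather than all 20 rows.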
unutbu answered Sep 21 '22 14:09
