Most efficient way of accessing non-zero values in row/column in scipy.sparse matrix

What is the fastest or, failing that, least wordy way of accessing all non-zero values in a row with index row, or a column with index col, of a scipy.sparse matrix A in CSR format?

Would doing it in another format (say, COO) be more efficient?

Right now, I use the following:

A[row, A[row, :].nonzero()[1]]

or

A[A[:, col].nonzero()[0], col]

For a problem like this it pays to understand the underlying data structures for the different formats:

In [672]: A=sparse.csr_matrix(np.arange(24).reshape(4,6))
In [673]: A.data
Out[673]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [674]: A.indices
Out[674]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)
In [675]: A.indptr
Out[675]: array([ 0,  5, 11, 17, 23], dtype=int32)

The data values for a row are a slice within A.data, but identifying that slice requires knowing A.indptr (see below).
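A small sketch of what indptr encodes, using the same 4x6 example matrix: row i's stored values sit in A.data[A.indptr[i]:A.indptr[i+1]], so the consecutive differences of indptr are the number of stored values per row.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# Consecutive differences of indptr give the per-row counts of stored values.
counts = np.diff(A.indptr)
print(counts)  # [5 6 6 6]

# The last entry of indptr is the total number of stored values.
print(A.indptr[-1], A.nnz)  # 23 23
```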

For the coo format:

In [676]: Ac=A.tocoo()
In [677]: Ac.data
Out[677]: 
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23], dtype=int32)
In [678]: Ac.row
Out[678]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3], dtype=int32)
In [679]: Ac.col
Out[679]: array([1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5], dtype=int32)

Note that A.nonzero() converts to coo and returns the row and col attributes (more or less: it also filters out explicitly stored zeros; look at its code).
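The "more or less" matters when a matrix holds explicitly stored zeros: nonzero() drops them, while the raw coo attributes keep them. A minimal sketch that builds such a matrix by hand from (data, indices, indptr); the matrix B is just an illustrative example:

```python
import numpy as np
from scipy import sparse

# One stored entry (row 0, column 1) is an explicit zero.
data = np.array([1, 0, 2])
indices = np.array([0, 1, 2])
indptr = np.array([0, 2, 3])
B = sparse.csr_matrix((data, indices, indptr), shape=(2, 3))

print(B.nnz)            # 3: the explicit zero counts as stored
rows, cols = B.nonzero()
print(rows, cols)       # [0 1] [0 2]: nonzero() filters the stored 0
Bc = B.tocoo()
print(Bc.row, Bc.col)   # [0 0 1] [0 1 2]: coo keeps it
```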

For the lil format, data is stored by row in lists:

In [680]: Al=A.tolil()
In [681]: Al.data
Out[681]: 
array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]], dtype=object)
In [682]: Al.rows
Out[682]: 
array([[1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]], dtype=object)
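A sketch of reading one row from the lil form: Al.rows[i] and Al.data[i] are plain Python lists holding the column indices and the matching values, so no new sparse matrix is created.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
Al = A.tolil()

i = 0
# Column indices and stored values of row i, as Python lists.
print(Al.rows[i])  # [1, 2, 3, 4, 5]
print(Al.data[i])  # [1, 2, 3, 4, 5]

# Pair them up when both the column and the value are needed.
pairs = list(zip(Al.rows[i], Al.data[i]))
```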

===============

Indexing a row of A works, though in my experience it tends to be a bit slow, partly because it has to create a new csr matrix. Your expression also seems wordier than needed.

Looking at my first row, which has a 0 element (the other rows are fully dense):

In [691]: A[0, A[0,:].nonzero()[1]].toarray()
Out[691]: array([[1, 2, 3, 4, 5]], dtype=int32)

The whole row, expressed as a dense array, is:

In [692]: A[0,:].toarray()
Out[692]: array([[0, 1, 2, 3, 4, 5]], dtype=int32)

but the data attribute of that row is the same as your selection:

In [693]: A[0,:].data
Out[693]: array([1, 2, 3, 4, 5], dtype=int32)

and with the lil format:

In [694]: Al.data[0]
Out[694]: [1, 2, 3, 4, 5]

A[0,:].tocoo() doesn't add anything.
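The .data shortcut can be wrapped in small functions and checked against the expressions from the question. The names row_values and col_values are hypothetical. One caveat: .data returns whatever is stored, so if the matrix holds explicitly stored zeros they appear here but are dropped by the nonzero()-based expressions; calling A.eliminate_zeros() first removes that difference.

```python
import numpy as np
from scipy import sparse

def row_values(A, row):
    # A[row, :] on a csr_matrix is a 1 x n csr matrix; .data holds its stored values.
    return A[row, :].data

def col_values(A, col):
    # Same idea for a column: A[:, col] is an n x 1 sparse matrix.
    return A[:, col].data

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
row, col = 0, 0

# The expressions from the question, flattened to 1-D for comparison.
q_row = A[row, A[row, :].nonzero()[1]].toarray().ravel()
q_col = A[A[:, col].nonzero()[0], col].toarray().ravel()

print(row_values(A, row), q_row)  # [1 2 3 4 5] [1 2 3 4 5]
print(col_values(A, col), q_col)  # [ 6 12 18] [ 6 12 18]
```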

Direct access to the attributes of a csr or lil matrix doesn't help much when picking columns. For that, csc is better, or lil of the transpose.
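A sketch of both column options. csc stores columns the way csr stores rows, so indptr bounds each column's slice of data and indices holds row numbers; a lil copy of the transpose stores each original column as a "row" list. Both require a full conversion first, so this is presumably only worthwhile when many columns will be read.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
j = 2

# csc: column j's row numbers and values are slices bounded by indptr.
Acsc = A.tocsc()
start, stop = Acsc.indptr[j], Acsc.indptr[j + 1]
print(Acsc.indices[start:stop], Acsc.data[start:stop])  # [0 1 2 3] [ 2  8 14 20]

# lil of the transpose: each original column becomes a list-based "row".
AlT = A.T.tolil()
print(AlT.rows[j], AlT.data[j])  # [0, 1, 2, 3] [2, 8, 14, 20]
```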

Direct access to the csr data, with the aid of indptr, would be:

In [697]: i=0; A.data[A.indptr[i]:A.indptr[i+1]]
Out[697]: array([1, 2, 3, 4, 5], dtype=int32)
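The same slice also applied to A.indices gives the column numbers of those values. A minimal helper, with the hypothetical name row_nonzeros; note that the returned arrays are views into A's own arrays (writing to them changes A), and like .data they include any explicitly stored zeros.

```python
import numpy as np
from scipy import sparse

def row_nonzeros(A, i):
    """Return (column indices, values) stored in row i of a csr matrix.

    Only slices A.indices and A.data, so no new sparse matrix is built.
    """
    start, stop = A.indptr[i], A.indptr[i + 1]
    return A.indices[start:stop], A.data[start:stop]

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
cols, vals = row_nonzeros(A, 0)
print(cols, vals)  # [1 2 3 4 5] [1 2 3 4 5]
```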

Calculations on csr matrices routinely iterate through indptr like this to get the values of each row, but they do so in compiled code.
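To get every row's values at once from Python, one option (a sketch, not what scipy does internally) is to cut A.data and A.indices at the interior indptr boundaries with np.split; a row with no stored values comes out as an empty array.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# indptr[1:-1] are the boundaries between consecutive rows.
row_data = np.split(A.data, A.indptr[1:-1])
row_cols = np.split(A.indices, A.indptr[1:-1])

for i, (c, v) in enumerate(zip(row_cols, row_data)):
    print(i, c, v)
```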

A recent related question, seeking the product of the nonzero elements in each row: Multiplying column elements of sparse Matrix

There I found that a ufunc reduceat (np.multiply.reduceat for the product) over A.data, with the row offsets taken from indptr, was quite fast.
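A sketch of the reduceat approach for per-row products of the stored values. The function name row_products and the empty_value parameter are hypothetical, and the empty-row handling is an addition of this sketch: reduceat returns a[idx[k]] for an empty segment instead of an identity, and raises for an offset equal to len(data), so only the start offsets of non-empty rows are passed.

```python
import numpy as np
from scipy import sparse

def row_products(A, empty_value=1):
    """Product of the stored values in each row of a csr matrix.

    Rows with no stored values get empty_value (1 is the empty product).
    """
    A = sparse.csr_matrix(A)
    counts = np.diff(A.indptr)
    out = np.full(A.shape[0], empty_value, dtype=A.dtype)
    nonempty = counts > 0
    if nonempty.any():
        # Starts of the non-empty rows are strictly increasing, so each
        # reduceat segment is exactly one row's slice of data.
        starts = A.indptr[:-1][nonempty]
        out[nonempty] = np.multiply.reduceat(A.data[:A.indptr[-1]], starts)
    return out

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
prods = row_products(A)  # row 0: 1*2*3*4*5 = 120, row 1: 6*7*...*11 = 332640

B = sparse.csr_matrix(np.array([[0, 0], [2, 3]]))
prods_b = row_products(B)  # empty first row gets 1, second row 2*3 = 6
```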

Another tool when dealing with sparse matrices is multiplication; for example, a 1x4 sparse row vector with a single 1 in position 0 selects row 0 of A:

In [708]: (sparse.csr_matrix(np.array([1,0,0,0])[None,:])*A)
Out[708]: 
<1x6 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>
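The same trick extends to picking several rows at once: a selector matrix with a single 1 in each of its rows picks the corresponding rows of A. This is an illustrative sketch; the selector construction is not necessarily how scipy does it internally.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))
rows = [0, 2]

# k x m selector: row r has a 1 in column rows[r], so (S @ A)[r] == A[rows[r]].
k, m = len(rows), A.shape[0]
S = sparse.csr_matrix((np.ones(k, dtype=A.dtype), (np.arange(k), rows)), shape=(k, m))
picked = S @ A

print(picked.toarray())
print(np.array_equal(picked.toarray(), A[rows, :].toarray()))  # True
```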

csr actually computes sum with this kind of multiplication, and, if my memory is correct, it performs row selection like A[0,:] the same way. See:

Sparse matrix slicing using list of int
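To check the sum-by-multiplication equivalence, a sketch: multiplying by a vector of ones gives the same row and column sums as A.sum(). This only shows that the results agree; it does not show which code path scipy takes internally.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(24).reshape(4, 6))

# Row sums: A times a column of ones. Column sums: A.T times a column of ones.
row_sums = A @ np.ones(A.shape[1], dtype=A.dtype)
col_sums = A.T @ np.ones(A.shape[0], dtype=A.dtype)

print(row_sums)  # [ 15  51  87 123]
print(np.array_equal(row_sums, np.asarray(A.sum(axis=1)).ravel()))  # True
```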

Tags: python, scipy, sparse-matrix

musically_ut

hpaulj
