
How to convert a scipy csr_matrix back into lists of row, col and data?

I have a scipy csr_matrix that was created this way as specified in the documentation:

import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mtr = csr_matrix((data, (row, col)))
mtr.toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

How do I efficiently convert such a matrix mtr back into the initial three lists row, col and data?

Sergey Zakharov asked Nov 02 '17 23:11





2 Answers

As you noted in a comment, the values are available directly from the data attribute. To get the row and column indices as well, convert the matrix to COO format and read its data, row and col attributes.

Here's your matrix mtr:

In [11]: mtr
Out[11]: 
<3x3 sparse matrix of type '<class 'numpy.int64'>'
    with 6 stored elements in Compressed Sparse Row format>

In [12]: mtr.toarray()
Out[12]: 
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]], dtype=int64)

Convert to COO format, and access the data, row and col attributes.

In [13]: c = mtr.tocoo()

In [14]: c.data
Out[14]: array([1, 2, 3, 4, 5, 6], dtype=int64)

In [15]: c.row
Out[15]: array([0, 0, 1, 2, 2, 2], dtype=int32)

In [16]: c.col
Out[16]: array([0, 2, 2, 0, 1, 2], dtype=int32)
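
The session above can be condensed into a small round trip. This is only a minimal sketch: the helper name csr_to_triplets is hypothetical (it is not part of SciPy), and it assumes plain Python lists are wanted rather than NumPy arrays.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_to_triplets(m):
    """Hypothetical helper: return (row, col, data) as plain Python lists."""
    coo = m.tocoo()  # COO keeps one (row, col, value) triplet per stored element
    return coo.row.tolist(), coo.col.tolist(), coo.data.tolist()


row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mtr = csr_matrix((data, (row, col)))

r, c, d = csr_to_triplets(mtr)

# Feeding the triplets back into the constructor should reproduce the matrix.
rebuilt = csr_matrix((d, (r, c)), shape=mtr.shape)
print((rebuilt.toarray() == mtr.toarray()).all())
```

Passing shape explicitly when rebuilding is a precaution: without it, the shape is inferred from the largest indices, so trailing empty rows or columns would be lost.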
Warren Weckesser answered Sep 22 '22 07:09



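Call nonzero() on the matrix to get the row and column indices of its non-zero entries, then index the matrix with those indices to get the values. Note that the indexing step returns a 1 x N matrix rather than a flat array, as the last line of the output shows.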

Code:

import numpy as np
from scipy.sparse import csr_matrix
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mtr = csr_matrix((data, (row, col)))

print(mtr.todense())

rows, cols = mtr.nonzero()
data = mtr[rows, cols]

print(rows, cols, data)

Output:

[[1 0 2]
 [0 0 3]
 [4 5 6]]
[0 0 1 2 2 2] [0 2 2 0 1 2] [[1 2 3 4 5 6]]
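
If a flat array of values is preferred over the 1 x N matrix printed above, the indexed result can be flattened. The sketch below also illustrates one difference from the COO route that may matter: nonzero() skips entries that are stored but equal to zero, whereas tocoo() keeps them. The names values and z are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
mtr = csr_matrix((data, (row, col)))

rows, cols = mtr.nonzero()

# Indexing with two index arrays gives a 1 x N matrix; flatten it to 1-D.
values = np.asarray(mtr[rows, cols]).ravel()
print(rows, cols, values)

# A matrix with one explicitly stored zero at (0, 0) and a 7 at (1, 1).
z = csr_matrix((np.array([0, 7]), (np.array([0, 1]), np.array([0, 1]))), shape=(2, 2))
print(len(z.nonzero()[0]))  # counts only entries whose value is non-zero
print(len(z.tocoo().row))   # counts every stored entry, including the zero
```

For the matrix in the question both approaches return the same triplets, since none of its stored values is zero.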
sascha answered Sep 19 '22 07:09
