
Save / load scipy sparse csr_matrix in portable data format

How do you save/load a scipy sparse csr_matrix in a portable format? The scipy sparse matrix is created on Python 3 (Windows 64-bit) and needs to run on Python 2 (Linux 64-bit). Initially, I used pickle (with protocol=2 and fix_imports=True), but this didn't work going from Python 3.2.2 (Windows 64-bit) to Python 2.7.2 (Windows 32-bit); loading failed with the error:

TypeError: ('data type not understood', <built-in function _reconstruct>, (<type 'numpy.ndarray'>, (0,), '[98]')). 

Next, I tried numpy.save and numpy.load, as well as scipy.io.mmwrite() and scipy.io.mmread(), but none of these methods worked either.

asked Jan 21 '12 18:01 by Henry Thornton



1 Answer

Edit: as of scipy 0.19, there are scipy.sparse.save_npz and scipy.sparse.load_npz.

    from scipy import sparse

    sparse.save_npz("yourmatrix.npz", your_matrix)
    your_matrix_back = sparse.load_npz("yourmatrix.npz")

For both functions, the file argument may also be a file-like object (e.g. the result of open) instead of a filename.
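As a minimal sketch of the file-like variant, the round trip below goes through an in-memory io.BytesIO buffer instead of a file on disk. The matrix values and the variable names (matrix, buffer, restored) are made up for illustration, and scipy 0.19 or later is assumed.

```python
import io

import numpy as np
from scipy import sparse

# Small example matrix (hypothetical data, for illustration only).
matrix = sparse.csr_matrix(np.array([[0, 0, 3], [4, 0, 0], [0, 5, 0]]))

# Write to an in-memory binary buffer instead of a named file.
buffer = io.BytesIO()
sparse.save_npz(buffer, matrix)

# Rewind to the start of the buffer before reading the archive back.
buffer.seek(0)
restored = sparse.load_npz(buffer)

# Compare the dense forms to confirm the round trip preserved the entries.
same = np.array_equal(matrix.toarray(), restored.toarray())
print(restored.format, same)
```

If a real file is used instead of a buffer, it should be opened in binary mode ("wb" for saving, "rb" for loading), since the .npz archive is binary data.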


Got an answer from the Scipy user group:

A csr_matrix has 3 data attributes that matter: .data, .indices, and .indptr. All are simple ndarrays, so numpy.save will work on them. Save the three arrays with numpy.save or numpy.savez, load them back with numpy.load, and then recreate the sparse matrix object with:

new_csr = csr_matrix((data, indices, indptr), shape=(M, N)) 

So for example:

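    import numpy as np
    from scipy.sparse import csr_matrix

    def save_sparse_csr(filename, array):
        # Store the three CSR arrays plus the shape in one .npz archive.
        np.savez(filename, data=array.data, indices=array.indices,
                 indptr=array.indptr, shape=array.shape)

    def load_sparse_csr(filename):
        # np.savez appends ".npz" to the name if it is missing.
        loader = np.load(filename)
        return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                          shape=loader['shape'])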
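A short round-trip check of this approach might look like the sketch below. It redefines the two helpers so that it runs on its own. The example matrix, the temporary directory, and the file name example_matrix.npz are assumptions made for illustration. The file name is given with the .npz extension explicitly, because np.savez appends ".npz" when it is missing and np.load would then not find the bare name.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix


def save_sparse_csr(filename, array):
    # Store the three CSR arrays plus the shape in one .npz archive.
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)


def load_sparse_csr(filename):
    # Close the archive once the arrays have been read out of it.
    with np.load(filename) as loader:
        return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                          shape=tuple(loader['shape']))


# Hypothetical 2x3 matrix with two nonzero entries.
original = csr_matrix(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.5]]))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_matrix.npz")
    save_sparse_csr(path, original)
    restored = load_sparse_csr(path)

# The restored matrix should match the original entry for entry.
print(restored.shape, np.array_equal(original.toarray(), restored.toarray()))
```

Because only plain numeric arrays are written, nothing in the file depends on pickle, which is what made the original Python 3 to Python 2 transfer fail.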
answered Sep 17 '22 14:09 by Henry Thornton