Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using pytables, which is more efficient: scipy.sparse or numpy dense matrix?

When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.

def store_sparse_matrix(self):
    grp1 = self.getFileHandle().createGroup(self.getGroup(), 'M')
    self.getFileHandle().createArray(grp1, 'data', M.tocsr().data)
    self.getFileHandle().createArray(grp1, 'indptr', M.tocsr().indptr)
    self.getFileHandle().createArray(grp1, 'indices', M.tocsr().indices)

def get_sparse_matrix(self):
    return sparse.csr_matrix((self.getGroup().M.data, self.getGroup().M.indices, self.getGroup().M.indptr))

The trouble is that the get_sparse function takes some time (reading from disk), and if I understand it correctly also requires the data to fit into memory.

The only other option seems to convert the matrix to dense format (numpy array) and then use pytables normally. However this seems to be rather inefficient, although I suppose perhaps pytables will deal with the compression itself?

like image 493
tdc Avatar asked Jan 17 '12 13:01

tdc


People also ask

What is Scipy sparse matrix?

Sparse matrices are memory efficient data structures that enable us store large matrices with very few non-zero elements aka sparse matrices. In addition to efficient storage, sparse matrix data structure also allows us to perform complex matrix computations.

What is PyTables package?

PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. You can download PyTables and use it for free. You can access documentation, some examples of use and presentations here.

How do you multiply sparse matrices in Python?

We use the multiply() method provided in both csc_matrix and csr_matrix classes to multiply two sparse matrices. We can multiply two matrices of same format( both matrices are csc or csr format) and also of different formats ( one matrix is csc and other is csr format).


1 Answers

Borrowing from Storing numpy sparse matrix in HDF5 (PyTables), you can marshal a scipy.sparse array into a pytables format using its data, indicies, and indptr attributes, which are three regular numpy.ndarray objects.

like image 173
IanSR Avatar answered Oct 05 '22 01:10

IanSR