
Is there an efficient way of concatenating scipy.sparse matrices?

I'm working with some rather large sparse matrices (from 5000x5000 to 20000x20000) and need an efficient, flexible way to concatenate them in order to construct a stochastic matrix from separate parts.

Right now I'm using the following way to concatenate four matrices, but it's horribly inefficient. Is there any better way to do this that doesn't involve converting to a dense matrix?

    rmat[0:m1.shape[0], 0:m1.shape[1]] = m1
    rmat[m1.shape[0]:rmat.shape[0], m1.shape[1]:rmat.shape[1]] = m2
    rmat[0:m1.shape[0], m1.shape[1]:rmat.shape[1]] = bridge
    rmat[m1.shape[0]:rmat.shape[0], 0:m1.shape[1]] = bridge.transpose()
asked Jul 27 '11 at 13:07 by jones





2 Answers

The scipy.sparse module now has hstack and vstack for concatenating matrices horizontally and vertically, respectively.
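As a rough sketch of how this might apply to the four-block layout in the question, the snippet below builds [[m1, bridge], [bridge.T, m2]] with hstack and vstack. The helper name `assemble_blocks` and the small random inputs are made up for illustration. `scipy.sparse.bmat` is not mentioned in this answer, but it should build the same matrix in a single call.

```python
import numpy as np
from scipy import sparse


def assemble_blocks(m1, m2, bridge):
    """Build [[m1, bridge], [bridge.T, m2]] without converting to dense.

    Hypothetical helper; the block layout follows the question's slices.
    """
    top = sparse.hstack([m1, bridge])        # m1 and bridge share a row count
    bottom = sparse.hstack([bridge.T, m2])   # bridge.T and m2 share a row count
    return sparse.vstack([top, bottom], format="csr")


# Small made-up inputs standing in for the real matrices.
m1 = sparse.random(4, 4, density=0.5, format="csr", random_state=0)
m2 = sparse.random(3, 3, density=0.5, format="csr", random_state=1)
bridge = sparse.random(4, 3, density=0.5, format="csr", random_state=2)

rmat = assemble_blocks(m1, m2, bridge)

# Same layout in one call (bmat is an addition, not part of the answer above).
rmat_bmat = sparse.bmat([[m1, bridge], [bridge.T, m2]], format="csr")
```

Passing `format="csr"` avoids getting the result back in COO format, which is the default output of these functions.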

answered Sep 28 '22 at 16:09 by Erik



Using hstack, vstack, or concatenate is dramatically slower than concatenating the inner data arrays themselves. The reason is that hstack/vstack converts each sparse matrix to COO format, which can be very slow when the matrix is very large and not already in COO format. Here is the code for concatenating CSC matrices; a similar method can be used for CSR matrices:

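    import numpy as np
    from scipy.sparse import csc_matrix

    def concatenate_csc_matrices_by_columns(matrix1, matrix2):
        # Both inputs must be CSC matrices with the same number of rows.
        new_data = np.concatenate((matrix1.data, matrix2.data))
        new_indices = np.concatenate((matrix1.indices, matrix2.indices))
        # Offset matrix2's column pointers by matrix1's entry count, dropping the leading 0.
        new_ind_ptr = matrix2.indptr + len(matrix1.data)
        new_ind_ptr = new_ind_ptr[1:]
        new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr))
        # Pass the shape explicitly so trailing all-zero rows are not dropped.
        shape = (matrix1.shape[0], matrix1.shape[1] + matrix2.shape[1])
        return csc_matrix((new_data, new_indices, new_ind_ptr), shape=shape)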
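The CSR counterpart mentioned above might look like the following sketch, which stacks two CSR matrices by rows using the same trick on the row pointers. The function name `concatenate_csr_matrices_by_rows` and the random test inputs are assumptions for illustration. The result is compared against `scipy.sparse.vstack` as a sanity check.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack
from scipy.sparse import random as sparse_random


def concatenate_csr_matrices_by_rows(matrix1, matrix2):
    """Stack two CSR matrices vertically by joining their raw arrays.

    Hypothetical helper; assumes both inputs are csr_matrix instances
    with the same number of columns.
    """
    if matrix1.shape[1] != matrix2.shape[1]:
        raise ValueError("matrices must have the same number of columns")
    new_data = np.concatenate((matrix1.data, matrix2.data))
    new_indices = np.concatenate((matrix1.indices, matrix2.indices))
    # Shift matrix2's row pointers past matrix1's entries and drop its leading 0.
    shifted = matrix2.indptr[1:] + matrix1.indptr[-1]
    new_indptr = np.concatenate((matrix1.indptr, shifted))
    shape = (matrix1.shape[0] + matrix2.shape[0], matrix1.shape[1])
    return csr_matrix((new_data, new_indices, new_indptr), shape=shape)


# Small made-up inputs for a sanity check against scipy's own vstack.
a = sparse_random(5, 4, density=0.4, format="csr", random_state=0)
b = sparse_random(3, 4, density=0.4, format="csr", random_state=1)
stacked = concatenate_csr_matrices_by_rows(a, b)
reference = vstack([a, b], format="csr")
```

Whether this is actually faster than vstack depends on the SciPy version and matrix sizes, so it is worth timing both on real data before committing to the manual approach.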
answered Sep 28 '22 at 15:09 by Amos
