I'm working with some rather large sparse matrices (from 5000x5000 to 20000x20000) and need to find an efficient way to concatenate matrices in a flexible way in order to construct a stochastic matrix from separate parts.
Right now I'm using the following way to concatenate four matrices, but it's horribly inefficient. Is there any better way to do this that doesn't involve converting to a dense matrix?
rmat[0:m1.shape[0],0:m1.shape[1]] = m1 rmat[m1.shape[0]:rmat.shape[0],m1.shape[1]:rmat.shape[1]] = m2 rmat[0:m1.shape[0],m1.shape[1]:rmat.shape[1]] = bridge rmat[m1.shape[0]:rmat.shape[0],0:m1.shape[1]] = bridge.transpose()
Similarly, you can use scipy. sparse. vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).
We use the multiply() method provided in both csc_matrix and csr_matrix classes to multiply two sparse matrices. We can multiply two matrices of same format( both matrices are csc or csr format) and also of different formats ( one matrix is csc and other is csr format).
The dimensionality of the sparse matrix can be reduced by first representing the dense matrix as a Compressed sparse row representation in which the sparse matrix is represented using three one-dimensional arrays for the non-zero values, the extents of the rows, and the column indexes.
The function csr_matrix() is used to create a sparse matrix of compressed sparse row format whereas csc_matrix() is used to create a sparse matrix of compressed sparse column format.
The sparse library now has hstack
and vstack
for respectively concatenating matrices horizontally and vertically.
Using hstack, vstack, or concatenate, is dramatically slower than concatenating the inner data objects themselves. The reason is that hstack/vstack converts the sparse matrix to coo format which can be very slow when the matrix is very large not and not in coo format. Here is the code for concatenating csc matrices, similar method can be used for csr matrices:
def concatenate_csc_matrices_by_columns(matrix1, matrix2): new_data = np.concatenate((matrix1.data, matrix2.data)) new_indices = np.concatenate((matrix1.indices, matrix2.indices)) new_ind_ptr = matrix2.indptr + len(matrix1.data) new_ind_ptr = new_ind_ptr[1:] new_ind_ptr = np.concatenate((matrix1.indptr, new_ind_ptr)) return csc_matrix((new_data, new_indices, new_ind_ptr))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With