What would be the most efficient way to concatenate sparse matrices in Python using SciPy/NumPy?
Here I used the following:
>>> np.hstack((X, X2))
array([ <49998x70000 sparse matrix of type '<class 'numpy.float64'>'
        with 1135520 stored elements in Compressed Sparse Row format>,
        <49998x70000 sparse matrix of type '<class 'numpy.int64'>'
        with 1135520 stored elements in Compressed Sparse Row format>], dtype=object)
I would like to use both predictors in a regression, but the current format is obviously not what I'm looking for. Would it be possible to get the following:
<49998x140000 sparse matrix of type '<class 'numpy.float64'>' with 2271040 stored elements in Compressed Sparse Row format>
It is too large to be converted to a dense format.
Sparse Matrices in Python
A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function.
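As a minimal sketch of that conversion, the snippet below builds a small dense array (the values are made up for illustration) and wraps it with csr_matrix(). It only shows the round trip between dense and CSR storage, not anything specific to the question's data.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3x3 dense matrix with only two non-zero entries.
dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 0, 0]])

# Convert to Compressed Sparse Row storage.
sparse = csr_matrix(dense)

# Number of explicitly stored (non-zero) values.
stored = sparse.nnz

# Convert back to a dense NumPy array (only sensible for small matrices).
round_trip = sparse.toarray()
```

For matrices the size of the one in the question, the toarray() step should be avoided, since the dense result may not fit in memory.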
Python's SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. Printing a sparse matrix shows the (row, column) index of each non-zero entry along with its value.
SciPy has a module, scipy.sparse, that provides functions to deal with sparse data. There are primarily two types of sparse matrices that we use: CSC (Compressed Sparse Column) and CSR (Compressed Sparse Row).
You can use scipy.sparse.hstack to concatenate sparse matrices with the same number of rows (horizontal concatenation):
from scipy.sparse import hstack
hstack((X, X2))
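Here is a small self-contained sketch of the same call. The two 2x2 matrices stand in for X and X2 from the question (one float64, one int64) and are made up for illustration. Passing format="csr" is an optional choice made here so the result comes back in CSR form; without it the result may be returned in another sparse format.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Toy stand-ins for X (float64) and X2 (int64); same number of rows.
X = csr_matrix(np.array([[1.0, 0.0],
                         [0.0, 2.0]]))
X2 = csr_matrix(np.array([[0, 3],
                          [4, 0]], dtype=np.int64))

# Horizontal concatenation: columns of X2 are appended after those of X.
combined = hstack((X, X2), format="csr")

# The result stays sparse; shape is (rows, cols_X + cols_X2).
shape = combined.shape
stored = combined.nnz
```

The mixed dtypes are upcast to a common type, so the combined matrix here is float64, and the stored-element count is the sum of the two inputs.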
Similarly, you can use scipy.sparse.vstack to concatenate sparse matrices with the same number of columns (vertical concatenation).
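A matching sketch for the vertical case, again with made-up matrices (A and B are hypothetical names, not from the question):

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Two toy matrices with the same number of columns (3).
A = csr_matrix(np.array([[1, 0, 2]]))
B = csr_matrix(np.array([[0, 3, 0],
                         [4, 0, 0]]))

# Vertical concatenation: rows of B are appended below those of A.
stacked = vstack((A, B), format="csr")

rows, cols = stacked.shape
```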
Using numpy.hstack or numpy.vstack will instead create a NumPy object array holding the two sparse matrix objects, rather than a single combined sparse matrix.