 

How to hstack several sparse matrices (feature matrices)?

I have 3 sparse matrices:

In [39]: mat1
Out[39]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 878048 stored elements in Compressed Sparse Row format>

In [37]: mat2
Out[37]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 744315 stored elements in Compressed Sparse Row format>

In [35]: mat3
Out[35]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 788618 stored elements in Compressed Sparse Row format>

From the documentation, I read that it is possible to hstack, vstack, and concatenate this type of matrix. So I tried to hstack them:

import numpy as np

matrix1 = np.hstack([[address_feature, dayweek_feature]]).T
matrix2 = np.vstack([[matrix1, pddis_feature]]).T


X = matrix2

However, the dimensions do not match:

In [41]: X_combined_features.shape
Out[41]: (2, 1)

Note that I am stacking these matrices because I would like to use them with a scikit-learn classification algorithm. So how should I hstack a number of different sparse matrices?

asked by john doe on Jun 09 '16 04:06





1 Answer

Use the sparse versions of hstack and vstack (scipy.sparse.hstack and scipy.sparse.vstack). As a general rule, you need to use the sparse functions and methods, not the numpy ones with similar names. Sparse matrices are not subclasses of numpy's ndarray.
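A minimal sketch of the sparse stacking functions on two small, made-up 1-row CSR matrices (the values are illustrative only). The optional format="csr" argument asks for a CSR result; without it SciPy picks a format itself.

```python
import numpy as np
from scipy import sparse

# Two small illustrative 1 x 4 sparse rows.
a = sparse.csr_matrix(np.array([[1, 0, 2, 0]]))
b = sparse.csr_matrix(np.array([[0, 3, 0, 4]]))

# The scipy.sparse stacking functions keep the result sparse.
h = sparse.hstack([a, b], format="csr")  # side by side: 1 x 8
v = sparse.vstack([a, b], format="csr")  # one on top of the other: 2 x 4

print(sparse.issparse(h), h.shape)  # True (1, 8)
print(sparse.issparse(v), v.shape)  # True (2, 4)

# A sparse matrix is not an ndarray, which is why np.hstack/np.vstack
# do not treat it like one.
print(isinstance(a, np.ndarray))  # False
```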

But your three matrices do not look sparse. They are 1x878049, and one has 878048 nonzero elements, which means it has just one 0 element.
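One way to check this is to compare the number of stored elements (nnz) with the total size of the matrix. The density helper below is a hypothetical name used only for illustration; the counts in the loop are the ones reported in the question.

```python
import numpy as np
from scipy import sparse

def density(m):
    """Fraction of a sparse matrix's entries that are explicitly stored."""
    n_rows, n_cols = m.shape
    return m.nnz / (n_rows * n_cols)

# Small example: 3 of the 6 entries are nonzero, so only 3 are stored.
m = sparse.csr_matrix(np.array([[0, 1, 2, 0, 0, 1]]))
print(m.nnz)       # 3
print(density(m))  # 0.5

# The stored-element counts from the question, for 1 x 878049 matrices.
# Every ratio is above 0.8, so most entries are stored.
for name, stored in [("mat1", 878048), ("mat2", 744315), ("mat3", 788618)]:
    print(name, stored / 878049)
```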

So you could just as well turn them into dense arrays (with .toarray() or .A) and use np.hstack or np.vstack.

np.hstack([address_feature.A, dayweek_feature.A])
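A self-contained version of that line, with small made-up values standing in for the question's address_feature and dayweek_feature. It uses .toarray(), since the .A shorthand has been deprecated in recent SciPy releases.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's feature matrices.
address_feature = sparse.csr_matrix(np.array([[1, 0, 2]]))
dayweek_feature = sparse.csr_matrix(np.array([[0, 3, 0]]))

# Convert each to a dense ndarray first, then use the numpy function.
dense = np.hstack([address_feature.toarray(), dayweek_feature.toarray()])
print(dense)        # [[1 0 2 0 3 0]]
print(type(dense))  # <class 'numpy.ndarray'>
```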

And don't use the double brackets. All the concatenate functions take a simple list or tuple of arrays, and that list can hold more than 2 arrays.
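A small dense sketch of what the extra brackets do, using made-up 1 x 3 arrays. With a plain list, np.hstack joins the arrays; with [[a, b]], the inner list is first converted into a single 3-D array, so there is nothing to join it with.

```python
import numpy as np

a = np.array([[1, 2, 3]])  # shape (1, 3)
b = np.array([[4, 5, 6]])
c = np.array([[7, 8, 9]])

# Correct: one plain list, which may hold more than two arrays.
good = np.hstack([a, b, c])
print(good.shape)  # (1, 9)

# Double brackets: [a, b] becomes one array of shape (2, 1, 3),
# and hstack just returns it unchanged in shape.
bad = np.hstack([[a, b]])
print(bad.shape)  # (2, 1, 3)
```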

In [296]: A=sparse.csr_matrix([0,1,2,0,0,1])

In [297]: B=sparse.csr_matrix([0,0,0,1,0,1])

In [298]: C=sparse.csr_matrix([1,0,0,0,1,0])

In [299]: sparse.vstack([A,B,C])
Out[299]: 
<3x6 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in Compressed Sparse Row format>

In [300]: sparse.vstack([A,B,C]).A
Out[300]: 
array([[0, 1, 2, 0, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]], dtype=int32)

In [301]: sparse.hstack([A,B,C]).A
Out[301]: array([[0, 1, 2, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0]], dtype=int32)

In [302]: np.vstack([A.A,B.A,C.A])
Out[302]: 
array([[0, 1, 2, 0, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]], dtype=int32)
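For scikit-learn, which expects X with shape (n_samples, n_features), a sketch follows under one assumption: that each 1 x 878049 matrix in the question holds one feature for 878049 samples. The variable names match the question, but the tiny values (n = 4) are hypothetical. Both options below produce the same n x 3 CSR matrix, and many scikit-learn estimators accept SciPy sparse input directly.

```python
import numpy as np
from scipy import sparse

# Hypothetical 1 x n feature rows (n = 4 samples), named as in the question.
address_feature = sparse.csr_matrix(np.array([[3, 0, 1, 2]]))
dayweek_feature = sparse.csr_matrix(np.array([[0, 5, 0, 6]]))
pddis_feature = sparse.csr_matrix(np.array([[7, 0, 0, 8]]))
features = [address_feature, dayweek_feature, pddis_feature]

# Option 1: stack the rows (3 x n), then transpose to n x 3.
X1 = sparse.vstack(features).T.tocsr()

# Option 2: turn each row into a column (n x 1), then join them side by side.
X2 = sparse.hstack([f.T for f in features], format="csr")

print(X1.shape)                                   # (4, 3)
print(np.array_equal(X1.toarray(), X2.toarray()))  # True
```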
answered by hpaulj on Sep 21 '22 12:09
