How to hstack several sparse matrices (feature matrices)?

Tags: python, machine-learning, numpy, scipy, scikit-learn

I have 3 sparse matrices:

In [39]: mat1
Out[39]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 878048 stored elements in Compressed Sparse Row format>

In [37]: mat2
Out[37]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 744315 stored elements in Compressed Sparse Row format>

In [35]: mat3
Out[35]:
(1, 878049)
<1x878049 sparse matrix of type '<type 'numpy.int64'>'
    with 788618 stored elements in Compressed Sparse Row format>

From the documentation, I read that it is possible to hstack, vstack, and concatenate matrices of this type. So I tried to hstack them:

import numpy as np

matrix1 = np.hstack([[address_feature, dayweek_feature]]).T
matrix2 = np.vstack([[matrix1, pddis_feature]]).T


X = matrix2

However, the dimensions do not match:

In [41]: X_combined_features.shape
Out[41]: (2, 1)

Note that I am stacking these matrices because I would like to use them with a scikit-learn classification algorithm. How should I hstack several sparse matrices?


asked Jun 09 '16 04:06

john doe

1 Answer

Use the sparse versions of hstack and vstack (scipy.sparse.hstack and scipy.sparse.vstack). As a general rule, you need to use the sparse functions and methods, not the numpy ones with similar names: sparse matrices are not subclasses of numpy ndarray.
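A minimal sketch of that point, using small made-up matrices rather than the question's data: a CSR matrix is not an ndarray, and scipy.sparse.hstack keeps the result sparse. The format="csr" argument is an optional choice made here, not something the question requires.

```python
import numpy as np
from scipy import sparse

# Small illustrative 1 x 6 matrices (not the question's real features).
A = sparse.csr_matrix(np.array([[0, 1, 2, 0, 0, 1]]))
B = sparse.csr_matrix(np.array([[0, 0, 0, 1, 0, 1]]))

# A sparse matrix is not an ndarray, which is why the numpy stackers
# do not treat it as a 2-D array.
print(isinstance(A, np.ndarray))
print(sparse.issparse(A))

# The sparse stacker takes a plain list and returns a sparse result.
stacked = sparse.hstack([A, B], format="csr")
print(sparse.issparse(stacked), stacked.shape)
```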

But your three matrices do not look sparse. Each is 1x878049, and one has 878048 nonzero elements, which means just one 0 element.

So you could just as well turn them into dense arrays (with .toarray() or .A) and use np.hstack or np.vstack:

np.hstack([address_feature.A, dayweek_feature.A])
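One way to decide between the two routes is to check how full each matrix actually is. The sketch below uses a hypothetical helper, density, and tiny made-up inputs; it is only an illustration of the idea, not a rule for where the cutoff should be.

```python
import numpy as np
from scipy import sparse

def density(m):
    """Fraction of entries that are explicitly stored (hypothetical helper)."""
    rows, cols = m.shape
    return m.nnz / (rows * cols)

# Made-up 1 x 4 matrices that are mostly nonzero, like those in the question.
a = sparse.csr_matrix(np.array([[1, 2, 0, 4]]))
b = sparse.csr_matrix(np.array([[5, 0, 7, 8]]))

print(density(a), density(b))

# With so few zeros, the sparse format saves little, so convert to dense
# arrays and use the ordinary numpy function.
dense = np.hstack([a.toarray(), b.toarray()])
print(dense.shape)
```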

And don't use the double brackets. All of the concatenate functions take a simple list or tuple of arrays, and that list can contain more than 2 arrays:

In [296]: A=sparse.csr_matrix([0,1,2,0,0,1])

In [297]: B=sparse.csr_matrix([0,0,0,1,0,1])

In [298]: C=sparse.csr_matrix([1,0,0,0,1,0])

In [299]: sparse.vstack([A,B,C])
Out[299]: 
<3x6 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in Compressed Sparse Row format>

In [300]: sparse.vstack([A,B,C]).A
Out[300]: 
array([[0, 1, 2, 0, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]], dtype=int32)

In [301]: sparse.hstack([A,B,C]).A
Out[301]: array([[0, 1, 2, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0]], dtype=int32)

In [302]: np.vstack([A.A,B.A,C.A])
Out[302]: 
array([[0, 1, 2, 0, 0, 1],
       [0, 0, 0, 1, 0, 1],
       [1, 0, 0, 0, 1, 0]], dtype=int32)
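For the scikit-learn use case mentioned in the question, here is a sketch under one assumption that the question does not confirm: each 1 x N matrix holds a single feature for N samples. In that case the estimator would expect an N x 3 matrix, which can be built by stacking the rows and transposing. The data below is made up, with N = 5.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the question's three features, 5 samples each.
address_feature = sparse.csr_matrix(np.array([[1, 0, 2, 0, 3]]))
dayweek_feature = sparse.csr_matrix(np.array([[0, 6, 0, 1, 2]]))
pddis_feature = sparse.csr_matrix(np.array([[4, 0, 0, 5, 0]]))

# Stack the three rows (3 x 5), then transpose so that rows are samples
# and columns are features (5 x 3). tocsr() gives row-oriented storage.
X = sparse.vstack([address_feature, dayweek_feature, pddis_feature]).T.tocsr()
print(X.shape)
```

If each matrix instead describes a single sample with N features, the transpose is not needed and sparse.vstack alone gives one row per sample.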


answered Sep 21 '22 12:09

hpaulj
