scipy.sparse.hstack((1, [2])) and scipy.sparse.hstack((1, 2)) work well, but scipy.sparse.hstack(([1], [2])) does not. Why is this the case?
Here is a trace of what's happening on my system:
C:\Anaconda>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
>>> import scipy.sparse
>>> scipy.sparse.hstack((1, [2]))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack((1, 2))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack(([1], [2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 456, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 539, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
>>> scipy.version.full_version
'0.16.0'
>>>
In the first case, scipy.sparse.hstack((1, [2])), the number 1 is interpreted as a scalar and [2] as a dense matrix. When the two are combined, they appear to be coerced so that each one is treated as a single block, and scipy.sparse.hstack can combine them normally.
Here are some more tests to show that this holds with multiple values:
In [31]: scipy.sparse.hstack((1,2,[3],[4]))
Out[31]:
<1x4 sparse matrix of type '<type 'numpy.int64'>'
with 4 stored elements in COOrdinate format>
In [32]: scipy.sparse.hstack((1,2,[3],[4],5,6))
Out[32]:
<1x6 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
In [33]: scipy.sparse.hstack((1,[2],[3],[4],5,[6],7))
Out[33]:
<1x7 sparse matrix of type '<type 'numpy.int64'>'
As you can see, if at least one scalar is present in the hstack call, this seems to work.
However, in the second case, scipy.sparse.hstack(([1],[2])), there is no scalar in the mix: both inputs are dense matrices, and you can't use scipy.sparse.hstack with purely dense matrices.
To reproduce:
In [34]: scipy.sparse.hstack(([1],[2]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-45-cd79952b2e14> in <module>()
----> 1 scipy.sparse.hstack(([1],[2]))
/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
451
452 """
--> 453 return bmat([blocks], format=format, dtype=dtype)
454
455
/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
531
532 if blocks.ndim != 2:
--> 533 raise ValueError('blocks must be 2-D')
534
535 M,N = blocks.shape
ValueError: blocks must be 2-D
See this post for more insight: Scipy error with sparse hstack
Therefore, if you want to use this successfully with two matrices, you must make them sparse first, then combine them:
In [36]: A = scipy.sparse.coo_matrix([1])
In [37]: B = scipy.sparse.coo_matrix([2])
In [38]: C = scipy.sparse.hstack([A, B])
In [39]: C
Out[39]:
<1x2 sparse matrix of type '<type 'numpy.int64'>'
with 2 stored elements in COOrdinate format>
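The conversion step above can be wrapped in a small helper. This is only a sketch: the name hstack_as_sparse is made up for illustration and is not part of SciPy, and it assumes every non-sparse block is a scalar, a list, or an array that np.atleast_2d can turn into a 2-D array.

```python
import numpy as np
import scipy.sparse


def hstack_as_sparse(blocks):
    """Hypothetical helper: make every block sparse, then stack horizontally."""
    sparse_blocks = []
    for b in blocks:
        if scipy.sparse.issparse(b):
            # Already sparse, use it unchanged.
            sparse_blocks.append(b)
        else:
            # np.atleast_2d maps 1 -> [[1]] and [2] -> [[2]], so coo_matrix
            # always receives a 2-D dense input.
            sparse_blocks.append(scipy.sparse.coo_matrix(np.atleast_2d(b)))
    return scipy.sparse.hstack(sparse_blocks)


C = hstack_as_sparse(([1], [2]))
print(C.shape)
print(C.toarray())
```

Because every block is sparse by the time scipy.sparse.hstack sees it, the call no longer depends on how scalars and lists happen to be mixed.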
Interestingly enough, if you tried doing what you did with the dense version of hstack, numpy.hstack, then it's perfectly acceptable:
In [48]: import numpy as np
In [49]: np.hstack((1, [2]))
Out[49]: array([1, 2])
... but things get mucked up for sparse matrix representations ¯\_(ツ)_/¯.
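Since numpy.hstack accepts these inputs, another option is to stack densely first and convert once at the end. This is a minimal sketch that assumes the data is small enough to hold as a dense array before conversion:

```python
import numpy as np
import scipy.sparse

# Stack the dense pieces with NumPy, which accepts lists directly.
dense = np.hstack(([1], [2]))

# Convert the stacked result to a sparse matrix in a single step.
# A 1-D input is treated as a single row.
S = scipy.sparse.coo_matrix(dense)
print(S.shape)
print(S.toarray())
```

This gives up the memory savings of sparse storage during stacking, so it only makes sense when the inputs are dense to begin with.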
The coding details are:
def hstack(blocks, ...):
    return bmat([blocks], ...)

def bmat(blocks, ...):
    blocks = np.asarray(blocks, dtype='object')
    if blocks.ndim != 2:
        raise ValueError('blocks must be 2-D')
    (continue)
So trying your alternatives (remembering the extra []):
In [392]: np.asarray([(1,2)],dtype=object)
Out[392]: array([[1, 2]], dtype=object)
In [393]: np.asarray([(1,[2])],dtype=object)
Out[393]: array([[1, [2]]], dtype=object)
In [394]: np.asarray([([1],[2])],dtype=object)
Out[394]:
array([[[1],
        [2]]], dtype=object)
In [395]: _.shape
Out[395]: (1, 2, 1)
This last case (your problem case) failed because the result was 3d.
With 2 sparse matrices (expected input):
In [402]: np.asarray([[a,a]], dtype=object)
Out[402]:
array([[ <1x1 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in COOrdinate format>,
<1x1 sparse matrix of type '<class 'numpy.int32'>'
with 1 stored elements in COOrdinate format>]], dtype=object)
In [403]: _.shape
Out[403]: (1, 2)
hstack takes advantage of bmat by turning a list of matrices into a nested (2d) list of matrices. bmat is meant to be a way of combining a 2d array of sparse matrices into one larger one. Skipping the step of first making these sparse matrices may or may not work; the code and the documentation don't make any promises.
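The dimensionality check can be reproduced with NumPy alone. The sketch below only mimics the first step of bmat as quoted above; block_grid_ndim is a made-up name for illustration, not a SciPy function.

```python
import numpy as np


def block_grid_ndim(blocks):
    """Hypothetical helper: wrap blocks the way hstack does and report ndim."""
    # hstack passes [blocks] to bmat, which converts it to an object array.
    return np.asarray([blocks], dtype=object).ndim


for blocks in [(1, 2), (1, [2]), ([1], [2])]:
    ndim = block_grid_ndim(blocks)
    verdict = "accepted" if ndim == 2 else "rejected: blocks must be 2-D"
    print(blocks, "-> ndim", ndim, verdict)
```

Only the all-list case produces a 3-D array, because NumPy unpacks the equally sized inner lists as an extra axis instead of keeping them as objects.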