scipy.sparse.hstack(([1], [2])) -> "ValueError: blocks must be 2-D". Why?

scipy.sparse.hstack((1, [2])) and scipy.sparse.hstack((1, 2)) work well, but scipy.sparse.hstack(([1], [2])) does not. Why is this the case?

Here is a trace of what's happening on my system:


C:\Anaconda>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.
1500 64 bit (AMD64)] on win32
>>> import scipy.sparse
>>> scipy.sparse.hstack((1, [2]))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
        with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack((1, 2))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
        with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack(([1], [2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 456, in h
stack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 539, in b
mat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
>>> scipy.version.full_version
'0.16.0'
>>>
Franck Dernoncourt asked Aug 09 '15

2 Answers

In the first case, scipy.sparse.hstack((1, [2])), the value 1 is interpreted as a scalar and [2] is interpreted as a dense matrix. When the two are combined, the data types are coerced so that both behave as scalars, and scipy.sparse.hstack can stack them normally.

Here are some more tests to show that this is true with multiple values:

In [31]: scipy.sparse.hstack((1,2,[3],[4]))
Out[31]: 
<1x4 sparse matrix of type '<type 'numpy.int64'>'
    with 4 stored elements in COOrdinate format>

In [32]: scipy.sparse.hstack((1,2,[3],[4],5,6))
Out[32]: 
<1x6 sparse matrix of type '<type 'numpy.int64'>'
    with 6 stored elements in COOrdinate format>

In [33]: scipy.sparse.hstack((1,[2],[3],[4],5,[6],7))
Out[33]: 
<1x7 sparse matrix of type '<type 'numpy.int64'>'
    with 7 stored elements in COOrdinate format>

As you can see, if you have at least one scalar present in hstack, this seems to work.

However, in the second case, scipy.sparse.hstack(([1], [2])), the inputs are no longer scalars: both are dense matrices, and you can't use scipy.sparse.hstack on purely dense matrices.

To reproduce:

In [34]: scipy.sparse.hstack(([1],[2]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-cd79952b2e14> in <module>()
----> 1 scipy.sparse.hstack(([1],[2]))

/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
    451 
    452     """
--> 453     return bmat([blocks], format=format, dtype=dtype)
    454 
    455 

/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
    531 
    532     if blocks.ndim != 2:
--> 533         raise ValueError('blocks must be 2-D')
    534 
    535     M,N = blocks.shape

ValueError: blocks must be 2-D

See this post for more insight: Scipy error with sparse hstack

Therefore, if you want to use this successfully with two matrices, you must make them sparse first, then combine them:

In [36]: A = scipy.sparse.coo_matrix([1])

In [37]: B = scipy.sparse.coo_matrix([2])

In [38]: C = scipy.sparse.hstack([A, B])

In [39]: C
Out[39]: 
<1x2 sparse matrix of type '<type 'numpy.int64'>'
    with 2 stored elements in COOrdinate format>
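
As a quick sanity check (this assumes the A, B, and C defined above), converting the result back to a dense array shows the expected values:

C.toarray()   # -> array([[1, 2]]), the dense view of the stacked result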

Interestingly enough, if you try the same thing with the dense version of hstack, numpy.hstack, it is perfectly acceptable:

In [48]: import numpy as np

In [49]: np.hstack((1, [2]))
Out[49]: array([1, 2])

.... things muck up for sparse matrix representations ¯\_(ツ)_/¯.
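
If you want something that accepts plain Python lists the way numpy.hstack does, one option is to coerce every block to a 2-D sparse matrix up front. Here is a minimal sketch; the helper name sparse_hstack is hypothetical and not part of scipy:

import numpy as np
import scipy.sparse

def sparse_hstack(blocks):
    # Hypothetical helper: make each block 2-D and sparse before stacking,
    # so scipy.sparse.hstack never sees a purely dense 1-D input.
    return scipy.sparse.hstack(
        [scipy.sparse.coo_matrix(np.atleast_2d(b)) for b in blocks])

sparse_hstack(([1], [2]))  # returns a 1x2 sparse matrix instead of raising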

rayryeng answered Oct 03 '22

The coding details are:

def hstack(blocks, ...):
    return bmat([blocks], ...)

def bmat(blocks, ...):
    blocks = np.asarray(blocks, dtype='object')
    if blocks.ndim != 2:
        raise ValueError('blocks must be 2-D')
    # ... (continues)

So trying your alternatives (remembering the extra []):

In [392]: np.asarray([(1,2)],dtype=object)
Out[392]: array([[1, 2]], dtype=object)

In [393]: np.asarray([(1,[2])],dtype=object)
Out[393]: array([[1, [2]]], dtype=object)

In [394]: np.asarray([([1],[2])],dtype=object)
Out[394]: 
array([[[1],
        [2]]], dtype=object)

In [395]: _.shape
Out[395]: (1, 2, 1)

This last case (your problem case) failed because the result was 3-D, not the 2-D array that bmat requires.

With 2 sparse matrices (the expected input; here a is a 1x1 sparse matrix such as scipy.sparse.coo_matrix([1])):

In [402]: np.asarray([[a,a]], dtype=object) 
Out[402]: 
array([[ <1x1 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in COOrdinate format>,
        <1x1 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in COOrdinate format>]], dtype=object)

In [403]: _.shape
Out[403]: (1, 2)

hstack takes advantage of the bmat format by turning a list of matrices into a nested (2-D) list of matrices. bmat is meant to combine a 2-D array of sparse matrices into one larger sparse matrix. Skipping the step of first making these sparse matrices may or may not work; the code and the documentation don't make any promises.
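
For illustration, here is a small sketch of using bmat directly with an explicit 2-D grid of blocks (the values are arbitrary; None marks an empty block):

import scipy.sparse

A = scipy.sparse.coo_matrix([[1]])
B = scipy.sparse.coo_matrix([[2]])

# bmat expects a 2-D (nested) list of sparse blocks; None leaves a block empty.
scipy.sparse.bmat([[A, B],
                   [None, A]])   # -> 2x2 sparse matrix in COOrdinate format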

hpaulj answered Oct 03 '22