 

SciPy: Do sparse matrices support advanced indexing?

No problem:

>>> import numpy as np
>>> import scipy.sparse
>>> t = np.array([[1,1,1,1,1],[2,2,2,2,2],[3,3,3,3,3],[4,4,4,4,4],[5,5,5,5,5]])
>>> x = np.arange(5).reshape((-1,1)); y = np.arange(5)
>>> print (t[[x]],t[[y]])

Big problem:

>>> s = scipy.sparse.csr_matrix(t)
>>> print (s[[x]].toarray(),s[[y]].toarray())
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
...
ValueError: data, indices, and indptr should be rank 1

s.toarray()[[x]] works, but it defeats the whole purpose of using sparse matrices, because my arrays are too big to convert to dense. I've checked the attributes and methods of several sparse matrix classes for anything mentioning advanced indexing, but no dice. Any ideas?

asked Jan 23 '13 at 23:01 by Noob Saibot
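One way to pull rows out of a CSR matrix without ever calling `toarray()` is to multiply it by a sparse "selector" matrix that holds a single 1 per row. This is a sketch, not a SciPy API: `select_rows` is a hypothetical helper, and it assumes non-negative row indices. Since `x` has shape `(5, 1)`, indexing rows with it in NumPy yields a 3-D `(5, 1, 5)` array, which a 2-D sparse matrix cannot hold, so the helper flattens the index and returns the 2-D row selection instead.

```python
import numpy as np
import scipy.sparse as sp

t = np.array([[1, 1, 1, 1, 1],
              [2, 2, 2, 2, 2],
              [3, 3, 3, 3, 3],
              [4, 4, 4, 4, 4],
              [5, 5, 5, 5, 5]])
s = sp.csr_matrix(t)

def select_rows(m, rows):
    """Return the rows of sparse matrix `m` listed in `rows` (hypothetical helper).

    Builds a len(rows) x m.shape[0] selector with one 1 per row, at column
    rows[i], and multiplies it into `m`, so nothing is ever densified.
    Assumes all row indices are non-negative.
    """
    rows = np.asarray(rows, dtype=np.intp).ravel()  # (5, 1) -> (5,)
    n = rows.size
    selector = sp.csr_matrix(
        (np.ones(n, dtype=m.dtype), (np.arange(n), rows)),
        shape=(n, m.shape[0]),
    )
    return selector @ m

picked = select_rows(s, [4, 0, 2])
print(picked.toarray())
```

Duplicated or reordered indices work too, since each selector row simply points at one source row.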


1 Answer

Sparse matrices have very limited indexing support, and what is available depends on the format of the matrix.
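Because support varies by format and by SciPy release, it can help to probe an installation directly. A small sketch: `try_slice` is a hypothetical helper, and which formats succeed will depend on the SciPy version installed, so no output is shown here.

```python
import scipy.sparse as sp

def try_slice(fmt):
    """Slice a random 100x100 matrix stored in format `fmt` (hypothetical helper).

    Returns the shape of m[2:5, 6:8] on success, or the name of the
    exception raised, so formats can be compared side by side.
    """
    m = sp.random(100, 100, density=0.01, format=fmt)
    try:
        return m[2:5, 6:8].shape
    except Exception as exc:  # which formats fail depends on the SciPy version
        return type(exc).__name__

for fmt in ("coo", "csr", "csc", "dok", "lil"):
    print(fmt, try_slice(fmt))
```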

For example:

>>> a = scipy.sparse.rand(100,100,format='coo')
>>> a[2:5, 6:8]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'coo_matrix' object has no attribute '__getitem__'

but

>>> a = scipy.sparse.rand(100,100,format='csc')
>>> a[2:5, 6:8]
<3x2 sparse matrix of type '<type 'numpy.float64'>'
    with 0 stored elements in Compressed Sparse Column format>
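Since COO offers no `__getitem__` in the release shown above, the usual workaround is to convert once and slice the converted copy. A minimal sketch using only `tocsc()` and ordinary unit-step slicing:

```python
import scipy.sparse as sp

a = sp.random(100, 100, density=0.05, format="coo")

# Convert the COO matrix once; CSC (or CSR) supports unit-step slicing.
a_csc = a.tocsc()
block = a_csc[2:5, 6:8]
print(block.shape)
```

If many slices are needed, keeping the converted copy around avoids paying the conversion cost repeatedly.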

although

>>> a[2:5:2, 6:8:3]
Traceback (most recent call last):
...
ValueError: slicing with step != 1 not supported
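When a stepped slice is rejected, one workaround that needs nothing beyond format conversion is to filter the stored `(row, col, value)` triplets directly. This is a sketch under stated assumptions: `stepped_slice` and `_axis_map` are hypothetical helpers, not SciPy functions. They use Python's `slice.indices` to resolve start, stop and step, keep only the entries that fall on the slice, and renumber them, in time proportional to the number of stored entries.

```python
import numpy as np
import scipy.sparse as sp

def _axis_map(idx, sl, n):
    """For coordinates `idx` on an axis of length `n`, return (mask, new_pos, length).

    mask marks coordinates hit by slice `sl`; new_pos is their position along
    the sliced axis (meaningful only where mask is True); length is the new size.
    """
    start, stop, step = sl.indices(n)
    length = len(range(start, stop, step))
    offset = idx - start
    if step > 0:
        mask = (idx >= start) & (idx < stop)
    else:
        mask = (idx <= start) & (idx > stop)
    mask &= (offset % step == 0)  # only every step-th coordinate survives
    return mask, offset // step, length

def stepped_slice(m, rows, cols):
    """Emulate m[rows, cols] for two slice objects, including steps != 1."""
    coo = m.tocoo()
    rmask, rpos, nrows = _axis_map(coo.row, rows, m.shape[0])
    cmask, cpos, ncols = _axis_map(coo.col, cols, m.shape[1])
    keep = rmask & cmask
    return sp.coo_matrix(
        (coo.data[keep], (rpos[keep], cpos[keep])), shape=(nrows, ncols)
    ).tocsc()

a = sp.random(100, 100, density=0.05, format="csc")
sub = stepped_slice(a, slice(2, 5, 2), slice(6, 8, 3))
print(sub.shape)
```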

There is also the DOK (dictionary of keys) format:

>>> a = scipy.sparse.rand(100,100,format='dok')
>>> a[2:5:2, 6:8:3]
Traceback (most recent call last):
...
NotImplementedError: fancy indexing supported over one axis only
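Where fancy indexing is accepted on only one axis at a time, the outer block can still be obtained by indexing twice: rows first, then columns. A sketch, assuming a SciPy release whose CSR format accepts an integer array on a single axis (current releases do). Note that this gives NumPy's outer selection, `t[np.ix_(rows, cols)]`, not the element-wise pairing that `t[rows, cols]` performs for equal-length arrays.

```python
import numpy as np
import scipy.sparse as sp

a = sp.random(100, 100, density=0.05, format="csr")
rows = np.array([2, 4, 9])
cols = np.array([6, 1])

# Fancy-index one axis at a time: pick rows first, then columns of the result.
sub = a[rows, :][:, cols]
print(sub.shape)
```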
>>> a[2:5:2,1]
<3x1 sparse matrix of type '<type 'numpy.float64'>'
    with 0 stored elements in Dictionary Of Keys format>

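And even the LIL (list of lists) format, which handles stepped slices on both axes, though it warns that this is slow: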

>>> a = scipy.sparse.rand(100,100,format='lil')
>>> a[2:5:2,1]
<2x1 sparse matrix of type '<type 'numpy.int32'>'
    with 0 stored elements in LInked List format>
C:\Python27\lib\site-packages\scipy\sparse\lil.py:230: SparseEfficiencyWarning: Indexing into a lil_matrix with multiple indices is slow. Pre-converting to CSC or CSR beforehand is more efficient.
  SparseEfficiencyWarning)
>>> a[2:5:2, 6:8:3]
<2x1 sparse matrix of type '<type 'numpy.int32'>'
    with 0 stored elements in LInked List format>
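The warning points at a common pattern: build the matrix in LIL, which is convenient for entry-by-entry assignment, then convert once with `tocsr()` before doing many indexing operations. A minimal sketch with made-up fill values:

```python
import scipy.sparse as sp

# LIL is convenient for filling a matrix one entry at a time...
m = sp.lil_matrix((100, 100))
for i in range(0, 100, 7):
    m[i, (3 * i) % 100] = i + 1.0

# ...then convert once, so repeated row selections run on CSR instead of LIL.
m_csr = m.tocsr()
sub = m_csr[[0, 7, 14], :]
print(sub.shape, sub.nnz)
```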
answered Sep 28 '22 at 16:09 by Jaime