What is row slicing vs What is column slicing?

Question

Yes, I've read this and this answer, but I cannot still grasp my mind around it... it's a basic question.

In:

M[:, index]
M[index, :]

Which one is row slicing, and which one is column slicing?

And to my problem, if I want to do advanced indexing for the columns like:

M[:, indexes]  # indexes is an array like [0, 4, 9]

Which sparse matrix type is most efficient for doing M[:, indexes], CSR or CSC ?

ogrisel · Accepted Answer

Actually neither is row/column slicing: those are examples of row/column indexing instead.

M[index, :] is row indexing
M[:, index] is column indexing
M[start:stop, :] is row slicing
M[:, start:stop] is column slicing

CSC is more efficient at retrieving entire columns: the non-zero values of a specific column and the matching row indices are internally stored as contiguous arrays in memory.

The dual is true for CSR and the retrieval of an entire row.

hpaulj · Answer

While row selection is faster for csr than for col selection, the difference is not big:

In [288]: Mbig=sparse.rand(1000,1000,.1, 'csr')
In [289]: Mbig[:1000:50,:]
Out[289]: 
<20x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 2066 stored elements in Compressed Sparse Row format>
In [290]: timeit Mbig[:1000:50,:]
1000 loops, best of 3: 1.53 ms per loop
In [291]: timeit Mbig[:,:1000:50]
100 loops, best of 3: 2.04 ms per loop

In [292]: Mbig=sparse.rand(1000,1000,.1, 'csc')
In [293]: timeit Mbig[:1000:50,:]
100 loops, best of 3: 2.16 ms per loop
In [294]: timeit Mbig[:,:1000:50]
1000 loops, best of 3: 1.65 ms per loop

It isn't worth it to switch format

In [295]: timeit Mbig.tocsr()[:1000:50,:]
...
100 loops, best of 3: 2.46 ms per loop

Contrast this with the same slice for the dense version:

In [297]: A=Mbig.A
In [298]: timeit A[:,:1000:50]
...
1000000 loops, best of 3: 557 ns per loop
In [301]: timeit A[:,:1000:50].copy()
...
10000 loops, best of 3: 52.5 µs per loop

To complicate the comparison, indexing with an array (numpy advanced) is actually faster than with the 'slice':

In [308]: idx=np.r_[0:1000:50]    # expand slice into array
In [309]: timeit Mbig[idx,:]
1000 loops, best of 3: 1.49 ms per loop
In [310]: timeit Mbig[:,idx]
1000 loops, best of 3: 513 µs per loop

Here the column indexing of a csc has a greater speed improvement.

And single row or column, csr and csc have getrow/col methods:

In [314]: timeit Mbig.getrow(500)
1000 loops, best of 3: 434 µs per loop
In [315]: timeit Mbig.getcol(500)        # 1 column from csc is fastest
10000 loops, best of 3: 78.7 µs per loop
In [316]: timeit Mbig[500,:]
1000 loops, best of 3: 505 µs per loop
In [317]: timeit Mbig[:,500]
1000 loops, best of 3: 264 µs per loop

In https://stackoverflow.com/a/39500986/901925 I recreated the extractor code that sparse uses to fetch rows or columns. It constructs a new sparse 'vector' of 1s and 0s, and uses matrix multiplication to 'select' rows or columns.

What is row slicing vs What is column slicing?

Tags:

python

machine-learning

scipy

scikit-learn

sparse-matrix

juanmirocks

2 Answers

ogrisel

hpaulj

Recent Activity

Donate For Us

What is row slicing vs What is column slicing?

Tags:

python

machine-learning

scipy

scikit-learn

sparse-matrix

juanmirocks

2 Answers

ogrisel

hpaulj

Related questions

Recent Activity

Donate For Us