Argmax of each row or column in scipy sparse matrix

Tags: python, scipy, sparse-matrix

scipy.sparse.coo_matrix.max returns the maximum value of each row or column, given an axis. I want the index of the maximum value of each row or column rather than the value itself. I haven't found an efficient way to do this yet, so I'll gladly accept any help.

574

asked Jun 09 '15 20:06

Jimmy C

1 Answer

As others mention, there is now a built-in argmax() for scipy.sparse matrices. However, I found it quite slow for large matrices, so I had a look at the source code. The logic is smart, but it contains a Python loop that slows things down. Taking the source code, reducing it to (for example) argmax per row while sacrificing all generality, shape checking, etc. for simplicity, and decorating it with numba can give some nice speed improvements.
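For orientation, here is a minimal sketch of the built-in call on a tiny matrix, together with the three CSR arrays (indptr, indices, data) that the faster version works on. The matrix and the names `dense`, `m_small` and `builtin` are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small illustrative matrix (not from the original answer)
dense = np.array([[0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [4.0, 0.0, 1.0]])
m_small = csr_matrix(dense)

# Built-in row-wise argmax, flattened to a 1-D array for convenience
builtin = np.asarray(m_small.argmax(axis=1)).ravel()
print(builtin)          # [1 0 0]

# CSR layout: row i owns the slice indptr[i]:indptr[i + 1]
# of both `indices` (column numbers) and `data` (stored values)
print(m_small.indptr)   # [0 1 1 3]
print(m_small.indices)  # [1 0 2]
print(m_small.data)     # [2. 4. 1.]
```

Row 1 has no stored entries, so its slice is empty (indptr[1] == indptr[2]) and its argmax is reported as column 0.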

Here's the function:

import numpy as np
from numba import jit


def argmax_row_numba(X):
    # X must be in CSR format; only its raw arrays are passed to the jitted kernel
    return _argmax_row_numba(X.shape[0], X.indptr, X.data, X.indices)

@jit(nopython=True)
def _argmax_row_numba(n_rows, indptr, data, indices):
    # prep an integer array to hold the column indices (rows without stored data stay 0)
    ret = np.zeros(n_rows, dtype=np.int64)
    # figure out which rows actually contain stored entries
    nz_lines, = np.diff(indptr).nonzero()
    # loop through those rows
    for i in nz_lines:
        p, q = indptr[i: i + 2]
        line_data = data[p: q]
        line_indices = indices[p: q]
        # only stored values are compared; implicit zeros are ignored
        am = np.argmax(line_data)
        ret[i] = line_indices[am]

    return ret
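One consequence of dropping the generality: the function above compares only stored entries, so a row whose stored values are all negative would get the column of the largest negative value instead of the column of an implicit zero. The following is a rough, numba-free sketch of how that case could be handled. The helper name `argmax_row_with_implicit_zeros` is hypothetical, and the plain Python loop is meant to show the logic, not to be fast.

```python
import numpy as np
from scipy.sparse import csr_matrix

def argmax_row_with_implicit_zeros(X):
    """Row-wise argmax of a sparse matrix, counting implicit zeros (illustrative)."""
    X = csr_matrix(X, copy=True)   # work on a copy so the caller's matrix is untouched
    X.sort_indices()               # column indices ascending within each row
    n_rows, n_cols = X.shape
    out = np.zeros(n_rows, dtype=np.int64)
    for i in range(n_rows):
        p, q = X.indptr[i], X.indptr[i + 1]
        if p == q:
            continue               # empty row: every entry is 0, keep column 0
        cols, vals = X.indices[p:q], X.data[p:q]
        k = np.argmax(vals)
        if vals[k] > 0 or q - p == n_cols:
            # a positive stored value wins, or the row has no implicit zeros
            out[i] = cols[k]
            continue
        # otherwise an implicit zero is a maximum: find the first column not stored
        gaps = np.flatnonzero(cols != np.arange(q - p))
        first_missing = gaps[0] if gaps.size else q - p
        # a stored explicit zero in an earlier column ties and comes first
        out[i] = first_missing if vals[k] < 0 else min(cols[k], first_missing)
    return out

dense = np.array([[0, -1, 0],
                  [-2, -3, -1],
                  [0, 0, 5],
                  [0, 0, 0],
                  [-1, 0, -2]], dtype=float)
result = argmax_row_with_implicit_zeros(csr_matrix(dense))
print(result)  # [0 2 2 0 1]
```

For matrices with strictly positive stored values, such as the test matrix below, this extra handling makes no difference.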

Generating a matrix for testing:


from scipy.sparse import random
size = 10000
m = random(m=size, n=size, density=0.0001, format="csr")
n_vals = m.data.shape[0]
m.data = np.random.random(size=n_vals).astype("float")


# the original scipy implementation reformatted to return a np.array
maxima1 = np.squeeze(np.array(m.argmax(axis=1)))
# calling the numba version
maxima2 = argmax_row_numba(m)

# Check that the results are the same
print(np.allclose(maxima1, maxima2))
# True

Timing results:

%timeit m.argmax(axis=1)
# 30.1 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit argmax_row_numba(m)
# 211 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
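The question also asks about columns. A minimal sketch of the same idea per column, assuming the matrix is converted to CSC first, where indptr delimits columns and indices holds row numbers. The helper name `argmax_col_stored` is hypothetical, it is plain NumPy and not jitted, and like the row version it looks only at stored entries.

```python
import numpy as np
from scipy.sparse import csc_matrix

def argmax_col_stored(X):
    """Per-column argmax over stored entries only (illustrative helper)."""
    X = csc_matrix(X)              # accepts dense arrays or other sparse formats
    n_cols = X.shape[1]
    out = np.zeros(n_cols, dtype=np.int64)
    for j in range(n_cols):
        p, q = X.indptr[j], X.indptr[j + 1]
        if q > p:                  # columns without stored entries stay 0
            out[j] = X.indices[p + np.argmax(X.data[p:q])]
    return out

dense = np.array([[0.0, 2.0, 0.0],
                  [3.0, 0.0, 0.0],
                  [4.0, 5.0, 0.0]])
cols = argmax_col_stored(dense)
print(cols)  # [2 2 0]
```

Since the jitted kernel only sees indptr, data and indices, it could presumably be reused for columns by passing it the arrays of m.tocsc() and m.shape[1], though I have not timed that variant.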

184

answered Sep 20 '22 05:09

L_W
