Reverse sort and argsort in Python

I'm trying to write a function in Python (still a noob!) which returns indices and scores of documents ordered by the inner products of their tfidf scores. The procedure is:

Compute vector of inner products between doc idx and all other documents
Sort in descending order
Return the "scores" and indices from the second one to the end (i.e. not itself)

The code I have at the moment is:

import h5py
import numpy as np

def get_related(tfidf, idx) :
    ''' return the top documents '''

    # calculate inner product   
    v = np.inner(tfidf, tfidf[idx].transpose())

    # sort
    vs = np.sort(v.toarray(), axis=0)[::-1]
    scores = vs[1:,]

    # sort indices
    vi = np.argsort(v.toarray(), axis=0)[::-1]
    idxs = vi[1:,] 

    return (scores, idxs)

where tfidf is a SciPy sparse matrix whose elements are of type numpy.float64.

This seems inefficient, as the sort is performed twice (sort() then argsort()), and the results have to then be reversed.

Can this be done more efficiently?
Can this be done without converting the sparse matrix using toarray()?

asked Dec 09 '11 12:12

tdc

1 Answer

I don't think there's any real need to avoid the toarray() call. The v array will be only n_docs long, which is dwarfed by the size of the n_docs × n_terms tf-idf matrix in practical situations. Also, it will be quite dense, since any term shared by two documents gives them a non-zero similarity. Sparse matrix representations only pay off when the matrix you're storing is very sparse (I've seen figures of >80% zeros quoted for Matlab and assume that SciPy will be similar, though I don't have an exact figure).

The double sort can be skipped by doing

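v = v.toarray().ravel()     # flatten the (n_docs, 1) column to a 1-D array
vi = np.argsort(v)[::-1]    # indices ordered from highest to lowest score
vs = v[vi]                  # scores in the same order, without a second sort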
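As a small self-contained illustration of the single-sort idea, the sketch below uses a made-up 1-D array of scores (the values are hypothetical, not taken from the question). It calls argsort once, reverses the result, and reuses the indices to pull out the sorted scores.

```python
import numpy as np

# Hypothetical similarity scores for five documents (made-up values).
v = np.array([0.2, 1.0, 0.0, 0.7, 0.4])

# One argsort, reversed, gives the indices from highest to lowest score.
order = np.argsort(v)[::-1]

# Indexing with those indices yields the scores in descending order,
# so a separate call to np.sort is not needed.
sorted_scores = v[order]

print(order)
print(sorted_scores)
```

Reversing with [::-1] returns a view, so the reversal itself does not copy the index array.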

Btw., your use of np.inner on sparse matrices is not going to work with the latest versions of NumPy; the safe way of taking an inner product of two sparse matrices is

# (n_docs, n_terms) times (n_terms, 1) gives an (n_docs, 1) column of scores
v = tfidf * tfidf[idx, :].transpose()
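Putting the pieces of this answer together, a version of get_related might look like the sketch below. It assumes tfidf is a scipy.sparse CSR matrix with one row per document, and the tiny 4 × 3 matrix is made up purely for illustration. It uses the .dot method for the sparse product, and it drops the query document by index rather than by slicing off the first element; that is a deliberate deviation from the question's code, since with unnormalized tf-idf vectors the query is not guaranteed to rank first.

```python
import numpy as np
from scipy import sparse


def get_related(tfidf, idx):
    """Return (scores, indices) of all other documents, most similar first."""
    # Sparse product: (n_docs, n_terms) x (n_terms, 1) -> (n_docs, 1).
    v = tfidf.dot(tfidf[idx, :].T)

    # The result has only n_docs entries, so densifying it is cheap.
    v = v.toarray().ravel()

    # Sort once; reuse the indices to fetch the scores.
    order = np.argsort(v)[::-1]

    # Remove the query document itself (assumption: filter by index
    # instead of assuming it always has the highest score).
    order = order[order != idx]
    return v[order], order


# Made-up tf-idf matrix: 4 documents, 3 terms.
tfidf = sparse.csr_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
]))

scores, idxs = get_related(tfidf, 0)
print(scores, idxs)
```

Only the small result vector is converted to a dense array; the tf-idf matrix itself stays sparse throughout.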

answered Sep 22 '22 00:09

Fred Foo

Tags: python, numpy, scipy, information-retrieval, sparse-matrix