I have a large corpus of text that I have converted to a sparse term-document matrix (stored as a scipy.sparse.csr_matrix). For every document, I want to find its top-n nearest-neighbour matches. I was hoping the NearestNeighbors routine in the Python scikit-learn library (sklearn.neighbors.NearestNeighbors, to be precise) would solve my problem, but the efficient algorithms that use space-partitioning data structures, such as KD-trees or ball trees, do not work with sparse matrices. Only the brute-force algorithm works with sparse input, which is infeasible in my case because the corpus is large.

Is there an efficient implementation of nearest-neighbour search for sparse matrices (in Python or any other language)?

Thanks.
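To make the limitation concrete, here is a minimal sketch (the tiny term-document matrix is made up for illustration): with sparse input, scikit-learn's NearestNeighbors can only run the brute-force search, which compares every pair of documents.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy term-document matrix: 4 documents, 4 terms (counts are arbitrary).
X = csr_matrix(np.array([[1, 0, 2, 0],
                         [0, 1, 0, 3],
                         [1, 0, 2, 1],
                         [0, 2, 0, 3]], dtype=float))

# With sparse input only the brute-force algorithm is usable (tree-based
# choices fall back to it), so every document is compared against every
# other -- O(n^2) in the corpus size.
nn = NearestNeighbors(n_neighbors=2, algorithm='brute', metric='cosine')
nn.fit(X)
dist, idx = nn.kneighbors(X)  # each row's first neighbour is itself
print(idx)
```

On this toy matrix, document 0's nearest non-self neighbour is document 2, since their term vectors point in almost the same direction.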
Late answer: have a look at Locality-Sensitive Hashing (LSH). Support in scikit-learn has been proposed here and here.
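For cosine similarity, the random-hyperplane variant of LSH is easy to sketch by hand (all names and sizes below are illustrative, not from any particular library): each hyperplane contributes one bit of a signature, and documents whose signatures collide land in the same bucket, so a query only scans its own bucket instead of the whole corpus.

```python
from collections import defaultdict

import numpy as np
from scipy.sparse import random as sparse_random

rng = np.random.default_rng(42)
n_docs, n_terms, n_bits = 100, 50, 8

# Toy sparse term-document matrix.
X = sparse_random(n_docs, n_terms, density=0.1, random_state=0, format='csr')

# Random hyperplanes define the hash: the sign of each projection is one bit.
planes = rng.standard_normal((n_terms, n_bits))
bits = (X @ planes) > 0                        # (n_docs, n_bits) booleans
signatures = bits @ (1 << np.arange(n_bits))   # pack bits into an integer key

# Bucket documents by signature; a query scans only its own bucket.
buckets = defaultdict(list)
for doc, sig in enumerate(signatures):
    buckets[int(sig)].append(doc)

candidates = buckets[int(signatures[0])]  # candidate neighbours of document 0
print(len(candidates))
```

In practice you would build several such tables with independent hyperplanes and union the buckets, trading memory for recall; this sketch shows only a single table.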
You can also try transforming your high-dimensional sparse data into low-dimensional dense data using TruncatedSVD, and then building a ball tree.
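That pipeline might look like the following sketch (the matrix shape, density, and component count are arbitrary placeholders): TruncatedSVD accepts sparse input directly and produces a dense low-dimensional embedding, on which a ball tree works fine.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestNeighbors

# Toy high-dimensional sparse data: 200 documents, 1000 terms.
X = sparse_random(200, 1000, density=0.05, random_state=0, format='csr')

# Reduce to a dense 50-dimensional space (works on sparse input directly).
X_dense = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)

# Now tree-based search is available.
nn = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(X_dense)
dist, idx = nn.kneighbors(X_dense)
print(idx.shape)  # (200, 5)
```

One caveat: distances in the reduced space only approximate distances in the original space, so the neighbours returned are approximate, with quality depending on how much variance the kept components capture.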