Locality Sensitive Hashing of sparse numpy arrays

1 Answer

If your sparse dataset is too large to hold in memory in dense form, I'd try this LSH implementation, which is built around SciPy's CSR sparse matrices:

https://github.com/brandonrobertz/SparseLSH

It also supports disk-based key-value stores like LevelDB if you can't fit the hash tables in memory. From the docs:

from sparselsh import LSH
from scipy.sparse import csr_matrix

X = csr_matrix( [
    [ 3, 0, 0, 0, 0, 0, -1],
    [ 0, 1, 0, 0, 0, 0,  1],
    [ 1, 1, 1, 1, 1, 1,  1] ])

# One class number for each input point
y = [ 0, 3, 10]

X_sim = csr_matrix( [ [ 1, 1, 1, 1, 1, 1, 0]])

lsh = LSH( 4,
           X.shape[1],
           num_hashtables=1,
           storage_config={"dict":None})

# Index one row at a time (range replaces Python 2's xrange)
for ix in range(X.shape[0]):
    x = X.getrow(ix)
    c = y[ix]
    lsh.index( x, extra_data=c)

# find points similar to X_sim
lsh.query(X_sim, num_results=1)
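
As I understand it, the first argument to LSH above (4) is the hash size in bits, and the library buckets rows by random projections. The sketch below is a generic random-hyperplane hash in plain Python, not SparseLSH's own code. The names make_planes and hash_sparse_row are hypothetical, and each sparse row is represented as a dict of column index to nonzero value so that only nonzero entries are touched:

```python
import random

def make_planes(hash_size, dim, seed=0):
    """Draw hash_size random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hash_size)]

def hash_sparse_row(row, planes):
    """row maps column index -> nonzero value; returns a bit string key.

    Each bit records which side of one hyperplane the row falls on,
    so rows pointing in similar directions tend to share a key.
    """
    bits = []
    for plane in planes:
        dot = sum(value * plane[col] for col, value in row.items())
        bits.append("1" if dot > 0 else "0")
    return "".join(bits)

# The three rows of X from the example above, stored sparsely.
rows = [{0: 3, 6: -1}, {1: 1, 6: 1}, {c: 1 for c in range(7)}]

planes = make_planes(hash_size=4, dim=7, seed=1)

# A single hash table: bucket key -> list of row indices.
table = {}
for ix, row in enumerate(rows):
    table.setdefault(hash_sparse_row(row, planes), []).append(ix)
```

A query is hashed with the same planes and compared only against rows in the matching bucket. Using several tables with independent planes raises the chance that a true neighbour shares at least one bucket with the query.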

If you specifically want MinHash, you could try https://github.com/go2starr/lshhdc, but I haven't personally tested it for compatibility with sparse matrices.
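
To get a feel for what MinHash does with sparse rows, here is a minimal standard-library sketch. It is not taken from lshhdc or SparseLSH, and all function names in it are hypothetical. It assumes each row is reduced to the set of its nonzero column indices (for a SciPy CSR matrix X, row i's indices would be X.indices[X.indptr[i]:X.indptr[i+1]]), so the actual values are ignored and similarity means Jaccard similarity of the sparsity patterns:

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1; assumed larger than any column index

def make_hash_params(num_hashes, seed=0):
    """Random (a, b) pairs for hash functions h(c) = (a*c + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]

def minhash_signature(nonzero_cols, params):
    """Signature entry k is the minimum of hash k over the nonzero columns."""
    cols = list(nonzero_cols)
    if not cols:
        raise ValueError("cannot MinHash an empty row")
    return [min((a * c + b) % PRIME for c in cols) for a, b in params]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature entries estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

# Nonzero column indices of the rows of X and of X_sim from the example above.
rows = [[0, 6], [1, 6], [0, 1, 2, 3, 4, 5, 6]]
query = [0, 1, 2, 3, 4, 5]

params = make_hash_params(num_hashes=256, seed=42)
query_sig = minhash_signature(query, params)
estimates = [estimate_jaccard(minhash_signature(r, params), query_sig)
             for r in rows]

# Index of the row whose sparsity pattern is most similar to the query.
best = max(range(len(rows)), key=lambda i: estimates[i])
```

The exact Jaccard similarities here are 1/7, 1/7 and 6/7, so the third row should come out as the closest match. If the magnitudes or signs of the values matter for your data, MinHash on index sets alone will not capture them.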

109

answered Oct 11 '22 01:10

COOLZXxX

Tags:

python

numpy

scipy

locality-sensitive-hash

utdiscant
