I have read a lot of tutorials, documents, and pieces of code implementing LSH (locality-sensitive hashing) with min-hash.
LSH tries to find the Jaccard coefficient of two sets by hashing random subsets and aggregating over those. I have looked at implementations in code.google.com but was not able to understand their method as well. I understand the paper Google news personalization: scalable online collaborative filtering, but I fail to understand any of the implementations out there.
Can someone please explain me in simple words how to implement LSH with MinHash?
You want to implement the min-hash algorithm but not LSH per se. Min-hashing is an LSH technique. Thus, LSH, in general, does not approximate the Jaccard coefficient, the particular method of min-hashing does.
An introduction is given in Mining of Massive Datasets, Chapter 3 by Anand Rajaraman and Jeff Ullman.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With