Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementation of locality-sensitive hashing with min-hash

I have read a lot of tutorials, documents, and pieces of code implementing LSH (locality-sensitive hashing) with min-hash.

LSH tries to find the Jaccard coefficient of two sets by hashing random subsets and aggregating over those. I have looked at implementations in code.google.com but was not able to understand their method as well. I understand the paper Google news personalization: scalable online collaborative filtering, but I fail to understand any of the implementations out there.

Can someone please explain me in simple words how to implement LSH with MinHash?

like image 247
Majic Johnson Avatar asked Jan 07 '13 21:01

Majic Johnson


1 Answers

You want to implement the min-hash algorithm but not LSH per se. Min-hashing is an LSH technique. Thus, LSH, in general, does not approximate the Jaccard coefficient, the particular method of min-hashing does.

An introduction is given in Mining of Massive Datasets, Chapter 3 by Anand Rajaraman and Jeff Ullman.

like image 98
Stefan Avatar answered Oct 26 '22 09:10

Stefan