Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Locality Sensitive Hashing - finding probabilities and values for R

Thanks to those who've answered my previous questions and gotten me this far.

I have a table of about 25,000 vectors, each with 48 dimensions, with values ranging from 0-255.

I am attempting to develop a Locality Sensitive Hash (http://en.wikipedia.org/wiki/Locality-sensitive_hashing) algorithm for finding a near neighbor or closest neighbor points.

My current LSH function is this:

def lsh(vector, r = 1.0, a = None, b = None):
    if not a:
        a = [normalvariate(10, 4) for i in range(48)]
    if not b:
        b = uniform(0, r)
    hashVal = floor((sum([a[i]*vector[i] for i in range(48)]) + b)/r)
    return int(hashVal)

My questions at this point are:

A: Are there optimal values for "normalvariate(10, 4)" portion of my code. This is pythons built in random.normalvariate (http://docs.python.org/library/random.html#random.normalvariate) function which I am using to produce the "d dimensional vector with entries chosen independently from a stable distribution". From my experimenting, the values don't seem to matter too much.

B: At the top of the wikipedia article it states:

if d(p,q) <= R, then h(p) = h(q) with probability at least P1

if d(p,q) >= cR, then h(p) = h(q) with probability at most P2

Is the R value mentioned here also the R value mentioned under the Stable Distributions section. (http://en.wikipedia.org/wiki/Locality-sensitive_hashing#Stable_distributions)

C: Related to my previous question(B). I've discovered that using higher values of R in my hasing function map my vectors into a smaller range of hash values. Is there a way to optimize my R value.

D: Approximately how many tables might one use?

like image 431
Piper Merriam Avatar asked Oct 14 '22 02:10

Piper Merriam


1 Answers

You might want to check out "MetaOptimize" -- like Stack Overflow for machine learning.
http://metaoptimize.com/qa

Your question isn't really a python or programing question, and that community might be able to help a bit more.

like image 170
Jon Avatar answered Oct 18 '22 13:10

Jon