I have an implementation of Pearson's similarity score for comparing two dictionaries of values. More time is spent in this method than anywhere else (potentially many millions of calls), so this is clearly the critical method to optimise.
Even the slightest optimisation could have a big impact on my code, so I'm keen to explore even the smallest improvements.
Here's what I have so far:
from math import sqrt

def simple_pearson(v1, v2):
    # Keys common to both dictionaries
    si = [val for val in v1 if val in v2]
    n = len(si)
    if n == 0:
        return 0.0

    sum1 = 0.0
    sum2 = 0.0
    sum1_sq = 0.0
    sum2_sq = 0.0
    p_sum = 0.0

    for v in si:
        val_1 = v1[v]
        val_2 = v2[v]
        sum1 += val_1
        sum2 += val_2
        sum1_sq += pow(val_1, 2)
        sum2_sq += pow(val_2, 2)
        p_sum += val_1 * val_2

    # Calculate Pearson score
    num = p_sum - (sum1 * sum2 / n)
    temp = (sum1_sq - pow(sum1, 2) / n) * (sum2_sq - pow(sum2, 2) / n)
    if temp < 0.0:
        temp = -temp
    den = sqrt(temp)
    if den == 0:
        return 1.0
    r = num / den
    return r
The real speed increase would come from moving to numpy or scipy. Short of that, there are micro-optimizations: e.g. x*x is faster than pow(x,2); and you could extract the values at the same time as the keys, so instead of:
si = [val for val in v1 if val in v2]
something like
vs = [(v1[val], v2[val]) for val in v1 if val in v2]
and then
sum1 = sum(x for x, y in vs)
and so on; whether each of these brings a time advantage needs micro-benchmarking. Depending on how you're using these coefficients, returning the square would save you a sqrt (that's a similar idea to using squares of distances between points in geometry, rather than the distances themselves, and for the same reason -- it saves you a sqrt; which makes sense because the coefficient IS a distance, kinda...;-). A sketch combining these changes follows below.
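Here is a minimal sketch of your function with those micro-optimizations folded in (paired extraction of the values, x*x instead of pow(x,2)); the name simple_pearson_opt is just for illustration, and whether it actually beats your original on your data still needs benchmarking:

from math import sqrt

def simple_pearson_opt(v1, v2):
    # Pull out the paired values in one pass, so the loop below
    # does no dictionary lookups at all.
    vs = [(v1[k], v2[k]) for k in v1 if k in v2]
    n = len(vs)
    if n == 0:
        return 0.0
    sum1 = sum2 = sum1_sq = sum2_sq = p_sum = 0.0
    for x, y in vs:
        sum1 += x
        sum2 += y
        sum1_sq += x * x      # x*x rather than pow(x, 2)
        sum2_sq += y * y
        p_sum += x * y
    num = p_sum - (sum1 * sum2 / n)
    temp = (sum1_sq - sum1 * sum1 / n) * (sum2_sq - sum2 * sum2 / n)
    if temp < 0.0:
        temp = -temp
    den = sqrt(temp)
    if den == 0:
        return 1.0
    return num / den

And if you do move to numpy, something along these lines should work; np.corrcoef is standard numpy, while the NaN guard is an assumption that you want to mirror the original's den == 0 behaviour:

import numpy as np

def numpy_pearson(v1, v2):
    common = [k for k in v1 if k in v2]
    if not common:
        return 0.0
    a = np.array([v1[k] for k in common], dtype=float)
    b = np.array([v2[k] for k in common], dtype=float)
    # The off-diagonal entry of the 2x2 correlation matrix is Pearson's r.
    r = np.corrcoef(a, b)[0, 1]
    # A zero-variance input gives NaN here; the original returns 1.0 when
    # its denominator is zero, so mirror that.
    return 1.0 if np.isnan(r) else float(r)

Note that the numpy version pays array-construction overhead per call, so for very small dictionaries the pure-Python loop may still win; only profiling on your actual workload will tell.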