Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Euclidian Distance Python Implementation

I am playing with the following code from programming collective intelligence, this is a function from the book that calculated eclidian distance between two movie critics.

This function sums the difference of the rankings in the dictionary, but euclidean distance in n dimensions also includes the square root of that sum.

AFAIK since we use the same function to rank everyone it does not matter we square root or not, but i was wondering is there a particular reason for that?


from math import sqrt 
# Returns a distance-based similarity score for person1 and person2 
def sim_distance(prefs,person1,person2): 
  # Get the list of shared_items 
  si={} 
  for item in prefs[person1]: 
    if item in prefs[person2]: 
       si[item]=1 
  # if they have no ratings in common, return 0 
  if len(si)==0: return 0 
  # Add up the squares of all the differences 
  sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) 
                      for item in prefs[person1] if item in prefs[person2]]) 
  return 1/(1+sum_of_squares) 
like image 566
Hamza Yerlikaya Avatar asked Nov 10 '09 17:11

Hamza Yerlikaya


2 Answers

The reason the square root is not used is because it is computationally expensive; it is monotonic (i.e., it preserves order) with the square function, so if all you're interested in is the order of the distances, the square root is unnecessary (and, as mentioned, very expensive computationally).

like image 86
Paul Sonier Avatar answered Oct 18 '22 17:10

Paul Sonier


That's correct. While the square root is necessary for a quantitatively correct result, if all you care about is distance relative to others for sorting, then taking the square root is superfluous.

like image 45
Reverend Gonzo Avatar answered Oct 18 '22 18:10

Reverend Gonzo