Euclidian Distance Python Implementation

Question

I am playing with the following code from programming collective intelligence, this is a function from the book that calculated eclidian distance between two movie critics.

This function sums the difference of the rankings in the dictionary, but euclidean distance in n dimensions also includes the square root of that sum.

AFAIK since we use the same function to rank everyone it does not matter we square root or not, but i was wondering is there a particular reason for that?


from math import sqrt 
# Returns a distance-based similarity score for person1 and person2 
def sim_distance(prefs,person1,person2): 
  # Get the list of shared_items 
  si={} 
  for item in prefs[person1]: 
    if item in prefs[person2]: 
       si[item]=1 
  # if they have no ratings in common, return 0 
  if len(si)==0: return 0 
  # Add up the squares of all the differences 
  sum_of_squares=sum([pow(prefs[person1][item]-prefs[person2][item],2) 
                      for item in prefs[person1] if item in prefs[person2]]) 
  return 1/(1+sum_of_squares)

Paul Sonier · Accepted Answer

The reason the square root is not used is because it is computationally expensive; it is monotonic (i.e., it preserves order) with the square function, so if all you're interested in is the order of the distances, the square root is unnecessary (and, as mentioned, very expensive computationally).

Reverend Gonzo · Answer

That's correct. While the square root is necessary for a quantitatively correct result, if all you care about is distance relative to others for sorting, then taking the square root is superfluous.

Euclidian Distance Python Implementation

Tags:

python

euclidean-distance

Hamza Yerlikaya

2 Answers

Paul Sonier

Reverend Gonzo

Recent Activity

Donate For Us

Euclidian Distance Python Implementation

Tags:

python

euclidean-distance

Hamza Yerlikaya

2 Answers

Paul Sonier

Reverend Gonzo

Related questions

Recent Activity

Donate For Us