
How do I convert between a measure of similarity and a measure of difference (distance)?

Is there a general way to convert between a measure of similarity and a measure of distance?

Consider a similarity measure like the number of 2-grams that two strings have in common.

2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
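One way to reproduce the counts above, as a minimal sketch: treat each string as the set of its distinct two-character substrings and count the overlap. The function names `bigrams` and `shared_bigrams` are made up for illustration, and counting distinct 2-grams (rather than repeated ones) is an assumption that happens to match both examples.

```python
def bigrams(text):
    """Return the set of distinct 2-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def shared_bigrams(a, b):
    """Count the distinct 2-grams that occur in both strings."""
    return len(bigrams(a) & bigrams(b))


print(shared_bigrams("beta", "delta"))     # 1  (only 'ta' is shared)
print(shared_bigrams("apple", "dappled"))  # 4  ('ap', 'pp', 'pl', 'le')
```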

What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?

This is just an example; I'm looking for a general solution, if one exists. For instance, how would I go the other way, from Levenshtein distance to a measure of similarity?

I appreciate any guidance you may offer.

135498 asked Oct 31 '10 19:10



1 Answer

If your similarity measure (s) is between 0 and 1, you can turn it into a distance with any of these:

1-s
sqrt(1-s)
-log(s)
(1/s)-1
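The four formulas can be sketched in Python as below. The names `to_distance` and `to_similarity` and the `method` labels are hypothetical, chosen only for this example. Two caveats, both assumptions about how you would apply this: a raw count such as shared 2-grams has to be scaled into [0, 1] first (for example by dividing by the number of distinct 2-grams in either string), and the last two formulas grow without bound as s approaches 0, so the sketch returns infinity there. The inverse mappings are included because they go the other way, from a distance to a similarity; `exp(-d)` and `1/(1+d)` accept any non-negative distance, so they could be applied to something like Levenshtein distance.

```python
import math

METHODS = ("linear", "sqrt", "log", "reciprocal")


def to_distance(s, method="linear"):
    """Map a similarity s in [0, 1] to a distance; s = 1 maps to 0."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must be in [0, 1]")
    if method == "linear":
        return 1.0 - s
    if method == "sqrt":
        return math.sqrt(1.0 - s)
    if s == 0.0:
        return math.inf  # -log(s) and (1/s)-1 are unbounded at s = 0
    if method == "log":
        return -math.log(s)
    return 1.0 / s - 1.0  # "reciprocal"


def to_similarity(d, method="reciprocal"):
    """Invert to_distance: map a non-negative distance back to a similarity."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if d < 0:
        raise ValueError("distance must be non-negative")
    if method == "linear":
        return 1.0 - d        # only meaningful for d in [0, 1]
    if method == "sqrt":
        return 1.0 - d * d    # only meaningful for d in [0, 1]
    if method == "log":
        return math.exp(-d)   # any d >= 0
    return 1.0 / (1.0 + d)    # "reciprocal", any d >= 0


print(to_distance(0.5, "reciprocal"))  # 1.0
print(to_similarity(3, "reciprocal"))  # 0.25
```

All four mappings send s = 1 to a distance of 0 and shrink as similarity grows; whether the result also satisfies the triangle inequality depends on the underlying similarity measure, not on the formula alone.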
nimcap answered May 13 '23 19:05