Is there a general way to convert between a measure of similarity and a measure of distance?
Consider a similarity measure like the number of 2-grams that two strings have in common.
2-grams('beta', 'delta') = 1
2-grams('apple', 'dappled') = 4
What if I need to feed this to an optimization algorithm that expects a measure of difference, like Levenshtein distance?
This is just an example...I'm looking for a general solution, if one exists. Like how to go from Levenshtein distance to a measure of similarity?
I appreciate any guidance you may offer.
When you are measuring by distance, the most closely related points will have the lowest distance, but when you are measuring by similarity, the most closely related points will have the highest similarity.
In data science, the similarity measure is a way of measuring how data samples are related or closed to each other. On the other hand, the dissimilarity measure is to tell how much the data objects are distinct. Moreover, these terms are often used in clustering when similar data samples are grouped into one cluster.
Similarity/Dissimilarity for Simple Attributesd(p, q) = d(q,p) for all p and q, d(p, r) ≤ d(p, q) + d(q, r) for all p, q, and r, where d(p, q) is the distance (dissimilarity) between points (data objects), p and q.
To calculate the similarity between two examples, you need to combine all the feature data for those two examples into a single numeric value. For instance, consider a shoe data set with only one feature: shoe size. You can quantify how similar two shoes are by calculating the difference between their sizes.
If your similarity measure (s) is between 0 and 1, you can use one of these:
1-s
sqrt(1-s)
-log(s)
(1/s)-1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With