Want to map Euclidean distance to the range [0, 1], somewhat like the cosine similarity of vectors.
For instance
input    output
0        1.0
1        ~0.9 (approximate)
2        somewhere between 0.8 and 0.9
inf      0.0
I tried the formula 1/(1+d), but it falls away from 1.0 too quickly.
To convert a distance measure into a similarity measure, first normalize d to [0, 1] using d_norm = d / max(d). The similarity is then given by s = 1 - d_norm.
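A minimal sketch of that idea in Python (here max(d) is taken over a hypothetical batch of distances, since the approach assumes a known maximum):

    import numpy as np

    def distances_to_similarities(distances):
        # Normalize by the batch maximum, then flip to get a similarity in [0, 1].
        d = np.asarray(distances, dtype=float)
        d_norm = d / d.max()   # assumes at least one nonzero distance
        return 1.0 - d_norm

    print(distances_to_similarities([0.0, 1.0, 2.0, 4.0]))  # -> [1.0, 0.75, 0.5, 0.0]

Note that this only works when a maximum distance exists (or is chosen), so it does not map inf to 0.0 the way the question asks.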
Cosine similarity can be seen as a method of normalizing document length during comparison. In the case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies cannot be negative.
The Euclidean distance corresponds to the L2-norm of a difference between vectors. The cosine similarity is proportional to the dot product of two vectors and inversely proportional to the product of their magnitudes.
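As a small illustration of those two quantities (a sketch using NumPy; the vectors are made up for the example):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 4.0, 5.0])

    euclidean = np.linalg.norm(a - b)                            # L2-norm of the difference
    cosine = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))  # dot product / product of magnitudes

    print(euclidean, cosine)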
It seems that you want the fraction's denominator to grow more slowly (the denominator is the bottom part, which you have as (1 + d) so far). There are various ways to handle this. For instance, try a lower power of d, such as
1 / (1 + d**(0.25))
... or an exponential decay in the denominator, such as
1 / (1.1 ** d)
... or using a trig function to temper your mapping, such as
1 - tanh(d)
Would something in one of these families work for you?
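A quick way to compare how fast each candidate falls off (a sketch; the sample distances are arbitrary):

    import math

    def inv_root(d):  return 1.0 / (1.0 + d ** 0.25)
    def exp_decay(d): return 1.0 / (1.1 ** d)
    def tanh_map(d):  return 1.0 - math.tanh(d)

    for d in [0, 1, 2, 5, 10, 100]:
        print(f"d={d:>3}  1/(1+d^0.25)={inv_root(d):.3f}  "
              f"1/1.1^d={exp_decay(d):.3f}  1-tanh(d)={tanh_map(d):.3f}")

Of the three, 1 / (1.1 ** d) comes closest to the example table in the question (it gives about 0.909 at d = 1 and 0.826 at d = 2), and the base can be tuned to control the decay rate.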