I want to compute the similarity (distance) between two vectors:
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)
Is there a package in R for calculating different well-known similarity (distance) measures? For example, "Resnik", "Lin", "Rel", "Jiang", ...
Implementing these methods is not hard, but I expect they are already defined in some R package.
After some googling I found the package "GOSemSim", which contains most of these measures, but it is specific to biomedical applications, so I can't use it to compute the similarity between two vectors.
Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.
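For the two vectors in the question, the definition above can be written out directly in base R. This is only a minimal sketch; `cosine_sim` is a made-up helper name, not a function from any package.

```r
# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

# Hypothetical helper: cosine of the angle between two numeric vectors,
# i.e. the dot product divided by the product of the Euclidean norms.
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cos_v <- cosine_sim(v1, v2)
print(cos_v)
```

A value close to 1 means the vectors point in nearly the same direction; 0 means they are orthogonal.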
Similarity/Dissimilarity for Simple Attributes
d(p, q) = d(q, p) for all p and q (symmetry), and d(p, r) ≤ d(p, q) + d(q, r) for all p, q, and r (triangle inequality), where d(p, q) is the distance (dissimilarity) between points (data objects) p and q.
Pearson correlation. Pearson correlation is widely used in clustering gene expression data [33,36,40]. This similarity measure calculates the similarity between the shapes of two gene expression patterns.
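In R, Pearson correlation is available through the base `cor` function. The sketch below applies it to the vectors from the question and, as a sanity check, recomputes it by hand as the cosine of the mean-centered vectors; the variable names are chosen only for illustration.

```r
# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

# Built-in Pearson correlation
r <- cor(v1, v2, method = "pearson")

# Manual version: center each vector, then take the cosine of the angle
c1 <- v1 - mean(v1)
c2 <- v2 - mean(v2)
r_manual <- sum(c1 * c2) / sqrt(sum(c1^2) * sum(c2^2))

print(c(builtin = r, manual = r_manual))
```

Because the vectors are centered first, this measure compares the shape of the two patterns rather than their absolute levels.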
When you are measuring by distance, the most closely related points will have the lowest distance, but when you are measuring by similarity, the most closely related points will have the highest similarity.
"proxy" is a general-purpose package for distance and similarity measures. The following methods are supported:
"Jaccard" "Kulczynski1" "Kulczynski2" "Mountford" "Fager" "Russel" "simple matching" "Hamman" "Faith"
"Tanimoto" "Dice" "Phi" "Stiles" "Michael" "Mozley" "Yule" "Yule2" "Ochiai"
"Simpson" "Braun-Blanquet" "cosine" "eJaccard" "fJaccard" "correlation" "Chi-squared" "Phi-squared" "Tschuprow"
"Cramer" "Pearson" "Gower" "Euclidean" "Mahalanobis" "Bhjattacharyya" "Manhattan" "supremum" "Minkowski"
"Canberra" "Wave" "divergence" "Kullback" "Bray" "Soergel" "Levenshtein" "Podani" "Chord"
"Geodesic" "Whittaker" "Hellinger"
Check the following example:
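library(proxy)  # provides simil() and dist()
x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)
l <- list(x, y)
# pairwise cosine similarity between the list elements
simil(l, method = "cosine")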
The output is a similarity matrix between the elements of "l":
1
2 0.978232
The only problem I have is that some methods (such as "Jaccard") raise the following error:
simil(l, method="Jaccard")
Error in n - d : 'n' is missing
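The cause of this error is not established here. Jaccard is usually defined on sets or binary vectors, so one possible way around it is to compute it by hand on binarized data. The sketch below assumes that a nonzero entry means "present", and `jaccard_sim` is a hypothetical helper, not part of proxy.

```r
# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

# Hypothetical helper: Jaccard similarity of two vectors after binarizing.
# Assumption: any nonzero entry counts as "present".
jaccard_sim <- function(a, b) {
  a <- a != 0
  b <- b != 0
  # positions present in both, divided by positions present in either
  sum(a & b) / sum(a | b)
}

jac_v <- jaccard_sim(v1, v2)
print(jac_v)
```

If both vectors are all zeros the union is empty and the result is NaN, so that case would need separate handling.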
The dist function supports, via its method argument: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See ?dist
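As a small illustration with the vectors from the question: `dist` works on the rows of a matrix, so the two vectors are stacked with `rbind` first. The call is written as `stats::dist` because loading proxy masks `dist` with its own version.

```r
# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

# dist() computes distances between the rows of a matrix
m <- rbind(v1, v2)

d_euc <- stats::dist(m, method = "euclidean")
d_man <- stats::dist(m, method = "manhattan")
d_max <- stats::dist(m, method = "maximum")

# Each result is a "dist" object holding the single pairwise distance
print(c(euclidean = as.numeric(d_euc),
        manhattan = as.numeric(d_man),
        maximum   = as.numeric(d_max)))
```

These are distances, so smaller values mean the vectors are more alike.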