How to calculate different well-known similarity or distance measures between two vectors in R?

Tags:

r

measure

distance

similarity

I want to compute the similarity (distance) between two vectors:

v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

I would like to know whether a package is available for calculating different well-known similarity (distance) measures in R, for example "Resnik", "Lin", "Rel", "Jiang", and so on.

These methods are not hard to implement, but I expect they are already available in some R package.

After some googling I found the package "GOSemSim", which contains most of these measures, but it is specific to biomedical applications, so I can't use it to compute the similarity between two plain vectors.


asked Jan 05 '14 02:01

2 Answers

The "proxy" package is a general library for distance and similarity measures. It supports the following methods:

"Jaccard" "Kulczynski1" "Kulczynski2" "Mountford" "Fager" "Russel" "simple matching" "Hamman" "Faith"
"Tanimoto" "Dice" "Phi" "Stiles" "Michael" "Mozley" "Yule" "Yule2" "Ochiai"
"Simpson" "Braun-Blanquet" "cosine" "eJaccard" "fJaccard" "correlation" "Chi-squared" "Phi-squared" "Tschuprow"
"Cramer" "Pearson" "Gower" "Euclidean" "Mahalanobis" "Bhjattacharyya" "Manhattan" "supremum" "Minkowski"
"Canberra" "Wave" "divergence" "Kullback" "Bray" "Soergel" "Levenshtein" "Podani" "Chord"
"Geodesic" "Whittaker" "Hellinger"

Check the following example:

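library(proxy)  # install.packages("proxy") if it is not installed yet

x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)
l <- list(x, y)            # the objects to compare
simil(l, method = "cosine")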

The output is a similarity matrix between the elements of "l":

      1
2     0.978232
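As a cross-check that does not need proxy, cosine similarity can be computed directly in base R as the dot product divided by the product of the two Euclidean norms. This is a minimal sketch; the helper name `cosine_sim` is hypothetical and not part of proxy:

```r
# Hypothetical helper (not part of proxy):
# cosine similarity = dot product / (norm(a) * norm(b))
cosine_sim <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)
cosine_sim(x, y)  # 100 / sqrt(55 * 190), about 0.978232
```

The result agrees with the value reported by `simil()` above.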

The only problem I have is that some methods (such as "Jaccard") raise the following error:

simil(l, method="Jaccard")
Error in n - d : 'n' is missing
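If all that is needed is a Jaccard value, it can be computed without proxy. The sketch below makes an explicit assumption for the binary version: any non-zero entry counts as "present". It also shows the extended (real-valued) Jaccard, or Tanimoto, coefficient, which should correspond to what proxy lists as "eJaccard". Both helper names are hypothetical:

```r
# Hypothetical helpers, not part of proxy.

# Binary Jaccard, ASSUMING non-zero entries mean "present":
# J = |present in both| / |present in at least one|
jaccard_binary <- function(a, b) {
  stopifnot(length(a) == length(b))
  pa <- a != 0
  pb <- b != 0
  n_union <- sum(pa | pb)
  if (n_union == 0) return(NA_real_)  # undefined if both vectors are all zero
  sum(pa & pb) / n_union
}

# Extended Jaccard / Tanimoto for real-valued vectors:
# a.b / (|a|^2 + |b|^2 - a.b)
jaccard_extended <- function(a, b) {
  stopifnot(length(a) == length(b))
  ab <- sum(a * b)
  ab / (sum(a^2) + sum(b^2) - ab)
}

v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)
jaccard_binary(v1, v2)    # 3 shared non-zero positions out of 4 -> 0.75
jaccard_extended(v1, v2)  # 1.21 / (1.26 + 1.54 - 1.21)
```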


answered Oct 16 '22 17:10

Amir H. Jadidinejad

The dist function (in the stats package) supports these measures via its method argument: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See ?dist
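Applied to the vectors from the question, dist() expects them as rows of a matrix, for example via rbind(). It returns a "dist" object, which as.numeric() turns into a plain number when there are only two rows. A minimal sketch using only base R:

```r
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)
m <- rbind(v1, v2)  # each row is one vector

measures <- c("euclidean", "maximum", "manhattan", "canberra", "binary")
# One distance per method, returned as a named numeric vector
d <- sapply(measures, function(meth) as.numeric(dist(m, method = meth)))
d

# "minkowski" also takes the exponent p (p = 2 gives the Euclidean distance)
as.numeric(dist(m, method = "minkowski", p = 3))
```

Note that these are distances, not similarities: identical vectors give 0.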

answered Oct 16 '22 17:10

G. Grothendieck

Related questions

Call R functions from sqldf queries
Collecting an unknown number of results in a loop
getting predictor names in R regression
Saving and loading history automatically
counting occurrences in column and create variable in R
avoid checking examples for R package building using devtools
Error when exporting dataframe to text file in R
String split in R with complex divisions
Subset all levels of a single factor
Aggregation by time period in lubridate
Class of a sequence of numbers
How to aggregate (using "by") a data.table with customized column name without ":="?
read.xlsx and colClasses
Find most frequent combination of values in a data.frame
Custom package using parallel or doParallel for multiple OS as a CRAN package
`j` doesn't evaluate to the same number of columns for each group
Applying a function to every combination of elements in a vector
Export R object for 3D printing
How do I add a column to each data frame in a list
Dynamic Variable naming in r
