I have a distance matrix:
> mat
hydrogen helium lithium beryllium boron
hydrogen 0.000000 2.065564 3.940308 2.647510 2.671674
helium 2.065564 0.000000 2.365661 1.697749 1.319400
lithium 3.940308 2.365661 0.000000 3.188148 2.411567
beryllium 2.647510 1.697749 3.188148 0.000000 2.499369
boron 2.671674 1.319400 2.411567 2.499369 0.000000
And a data frame:
> results
El1 El2 Score
Helium Hydrogen 92
Boron Helium 61
Boron Lithium 88
I want to calculate all the pairwise distances between the words in results$El1
and results$El2
to get the following:
> results
El1 El2 Score Dist
Helium Hydrogen 92 2.065564
Boron Helium 61 1.319400
Boron Lithium 88 2.411567
I did this with a for loop but it seems really clunky. Is there a more elegant way to search and extract distances with fewer lines of code?
Here is my current code:
names = row.names(mat)
num.results <- dim(results)[1]
El1 = match(results$El1, names)
El2 = match(results$El2, names)
el.dist <- matrix(0, num.results, 1)
for (i1 in c(1:num.results)) {
el.dist[i1, 1] <- mat[El1[i1], El2[i1]]
}
results$Dist = el.dist[,1]
cols <- match(tolower(results$El1), colnames(mat))
rows <- match(tolower(results$El2), colnames(mat))
results$Dist <- mat[cbind(rows, cols)]
results
El1 El2 Score Dist
1 Helium Hydrogen 92 2.065564
2 Boron Helium 61 1.319400
3 Boron Lithium 88 2.411567
You'll recognize most of the code. The one to focus on is mat[cbind(rows, cols)]
. With matrices, we are allowed to subset by another matrix with the same number of columns as dimensions. From the ?`[`
help:
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With