I have a data frame d with 3 columns, s, n and id, and I need to calculate the correlation between "s" and "n" within each "id". For example, given this data frame:
"s" "n" "id"
1.6 0.5 2
2.5 0.8 2
4.8 0.7 3
2.6 0.4 3
3.5 0.66 3
1.2 0.1 4
2.5 0.45 4
So I want to calculate the correlation for id 2, id 3 and id 4 and return the results as a vector like:
cor
0.18 0.45 0.65
My problem is how to select the rows for each id, calculate the correlation, and return the results as a vector.
Thank you
Here's a dplyr approach:
library(dplyr)
# d is the data frame from the question (columns s, n and id)
d %>% group_by(id) %>% summarise(corel = cor(s, n)) %>% pull(corel)
#[1] 1.000000 0.875128 1.000000
tab_split <- split(mydf, mydf$id) # get a list where each element is the subset of your data.frame with one id
unlist(lapply(tab_split, function(tab) cor(tab[, 1], tab[, 2]))) # get a named vector of correlation coefficients
With the sample data you gave:
mydf<-structure(list(s = c(1.6, 2.5, 4.8, 2.6, 3.5, 1.2, 2.5),
n = c(0.5, 0.8, 0.7, 0.4, 0.66, 0.1, 0.45),
id = c(2L, 2L, 3L, 3L, 3L, 4L, 4L)),
.Names = c("s", "n", "id"),
class = "data.frame",
row.names = c(NA, -7L))
> unlist(lapply(tab_split,function(tab) cor(tab[,1],tab[,2])))
2 3 4
1.000000 0.875128 1.000000
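The 1.000000 values for ids 2 and 4 are not a bug: each of those groups has only two rows, and the Pearson correlation of two points with distinct values is always exactly +1 or -1. If such groups should not be reported, here is a minimal sketch that returns NA instead. It assumes a minimum of 3 rows per group is what you want; the helper name `group_cor` and the `min_n` threshold are hypothetical, not part of the answer above.

```r
# Hypothetical helper: correlation of s and n within each id,
# returning NA for groups with fewer than min_n rows
group_cor <- function(data, min_n = 3) {
  groups <- split(data, data$id)  # one data frame per id, named by id
  vapply(groups, function(tab) {
    if (nrow(tab) < min_n) NA_real_ else cor(tab$s, tab$n)
  }, numeric(1))                  # guarantees a numeric vector, one value per id
}

# Same sample data as above, built with data.frame()
mydf <- data.frame(
  s  = c(1.6, 2.5, 4.8, 2.6, 3.5, 1.2, 2.5),
  n  = c(0.5, 0.8, 0.7, 0.4, 0.66, 0.1, 0.45),
  id = c(2L, 2L, 3L, 3L, 3L, 4L, 4L)
)

group_cor(mydf)
```

`vapply` is used instead of `lapply` + `unlist` so that the result is always a numeric vector with one element per id, keeping the ids as names.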
NB: if your columns are always named "s" and "n", you can refer to them by name instead of by position:
unlist(lapply(tab_split,function(tab) cor(tab$s,tab$n)))
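Because `lapply` returns a list, the answer wraps it in `unlist`. A sketch of the same split-apply idea with `sapply`, which simplifies the list of scalars into a named numeric vector on its own (assuming, as in the question, that the columns are named `s`, `n` and `id`):

```r
# Sample data from the question
mydf <- data.frame(
  s  = c(1.6, 2.5, 4.8, 2.6, 3.5, 1.2, 2.5),
  n  = c(0.5, 0.8, 0.7, 0.4, 0.66, 0.1, 0.45),
  id = c(2L, 2L, 3L, 3L, 3L, 4L, 4L)
)

# split() gives one data frame per id; sapply() applies cor() to each
# and simplifies the results into a numeric vector named by id
cors <- sapply(split(mydf, mydf$id), function(tab) cor(tab$s, tab$n))
cors
```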