I have a data frame made by row-binding many data frames, each identified with a unique key. I wish to calculate the correlation coefficients for columns in each subset (using the unique key) of the big data frame. For example, using the mtcars data I might want to calculate the correlation between columns hp and wt for each unique value in column cyl. I could do it in a loop:
data("mtcars")
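for (i in c(4, 6, 8)) {
  # keep only the rows with this cylinder count
  temp <- subset(mtcars, cyl == i)
  # print() is needed: values are not auto-printed inside a for loop
  print(cor(temp$hp, temp$wt))
}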
I think aggregate would be better, but this code doesn't work:
data("mtcars")
aggregate(mtcars,by=mycars$cyl,cor)
You could use
data("mtcars")
library(plyr)
ddply(mtcars, "cyl", function(x) cor(x$hp, x$wt))
This splits the data in mtcars by cyl, applies the function cor(x$hp, x$wt) to each subset x, and then combines the results for all of the subsets into a data.frame.
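For comparison, the same split, apply, combine steps can be sketched in base R without any extra package. This is only an illustration of the idea, not what ddply does internally; the names groups, cors, result, and cor_hp_wt are made up for this example.

```r
data("mtcars")

# Split: one data frame per unique value of cyl (a named list).
groups <- split(mtcars, mtcars$cyl)

# Apply: correlation between hp and wt within each subset.
# sapply() returns a numeric vector named by the cyl values.
cors <- sapply(groups, function(x) cor(x$hp, x$wt))

# Combine: one row per group, similar in shape to the ddply output.
result <- data.frame(cyl = as.numeric(names(cors)),
                     cor_hp_wt = unname(cors))
print(result)
```

If the subsets are defined by a different key column, replacing mtcars$cyl in the split() call with that column should be the only change needed.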
I can highly recommend the plyr package. It's one of the packages I use most in R.
Edit: As requested, here is a dplyr version. I have to say that I am not a big dplyr user, but the code should be OK.
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(V1=cor(hp, wt))