I have a grouped data frame (using dplyr
) with 50 numeric columns, which are split into groups using one of the columns. I want to calculate a matrix of correlation between all non grouping columns and one particular column.
An example with the mtcars
dataset:
data(mtcars)
cor(mtcars[,2:11], mtcars[,2])
returns a list of correlations between miles per galleon and the other variables.
Let's say however, that I wish to calculate this same correlation for each group of cylinders, e.g.:
library(dplyr)
mtcars <-
mtcars %>%
group_by(cyl)
How would I do this? I am thinking something like
mtcars %>%
group_by(cyl) %>%
summarise_each(funs(cor(...))
But I do not know what to put in the ...
as I don't know how to specify a column in the dplyr
chain.
Related:
Linear model and dplyr - a better solution? has an answer which is very similar to @akrun's answer. Also, over on cross validated: https://stats.stackexchange.com/questions/4040/r-compute-correlation-by-group has other solutions using packages which are not dplyr
.
We could use do
.
library(dplyr)
mtcars %>%
group_by(cyl) %>%
do(data.frame(Cor=t(cor(.[,3:11], .[,3]))))
# A tibble: 3 x 10
# Groups: cyl [3]
# cyl Cor.disp Cor.hp Cor.drat Cor.wt Cor.qsec Cor.vs Cor.am Cor.gear Cor.carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 4 1.00 0.435 -0.500 0.857 0.328 -0.187 -0.734 -0.0679 0.490
#2 6 1.00 -0.514 -0.831 0.473 0.789 0.637 -0.637 -0.899 -0.942
#3 8 1 0.118 -0.0922 0.755 0.195 NA -0.169 -0.169 0.0615
NOTE: t
part is contributed by @Alex
Or use group_modify
mtcars %>%
select(-mpg) %>%
group_by(cyl) %>%
group_modify(.f = ~ as.data.frame(t(cor(select(.x, everything()),
.x[['disp']]))))
# A tibble: 3 x 10
# Groups: cyl [3]
# cyl disp hp drat wt qsec vs am gear carb
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 4 1.00 0.435 -0.500 0.857 0.328 -0.187 -0.734 -0.0679 0.490
#2 6 1.00 -0.514 -0.831 0.473 0.789 0.637 -0.637 -0.899 -0.942
#3 8 1 0.118 -0.0922 0.755 0.195 NA -0.169 -0.169 0.0615
Or another option is summarise
with across
. Created a new column 'disp1' as 'disp' then grouped by 'cyl', get the cor
of columns 'disp' to 'carb' with 'disp1'
mtcars %>%
mutate(disp1 = disp) %>%
group_by(cyl) %>%
summarise(across(disp:carb, ~ cor(., disp1)))
# A tibble: 3 x 10
# cyl disp hp drat wt qsec vs am gear carb
#* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 4 1.00 0.435 -0.500 0.857 0.328 -0.187 -0.734 -0.0679 0.490
#2 6 1.00 -0.514 -0.831 0.473 0.789 0.637 -0.637 -0.899 -0.942
#3 8 1 0.118 -0.0922 0.755 0.195 NA -0.169 -0.169 0.0615
Or
library(data.table)
d1 <- copy(mtcars)
setnames(setDT(d1)[, as.list(cor(.SD, .SD[[1]])) , cyl,
.SDcols=3:11], names(d1)[2:11])[]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With