 

Correlation between multiple variables of a data frame

Tags: r, correlation

I have a data.frame of 10 variables in R. Let's call them var1, var2, ..., var10.

I want to find the correlation of var1 with each of var2, var3, ..., var10.

How can we do that?

The cor function finds the correlation between two variables at a time, so using it I would have to write a separate cor call for each pair.

asked Jul 24 '16 at 05:07 by Milind Kumar


2 Answers

My package corrr, which helps to explore correlations, has a simple solution for this. I'll use the mtcars data set as an example, and say we want to focus on the correlation of mpg with all other variables.

install.packages("corrr")  # though keep an eye out for a new version coming soon
library(corrr)
mtcars %>% correlate() %>% focus(mpg)


#>    rowname        mpg
#>      <chr>      <dbl>
#> 1      cyl -0.8521620
#> 2     disp -0.8475514
#> 3       hp -0.7761684
#> 4     drat  0.6811719
#> 5       wt -0.8676594
#> 6     qsec  0.4186840
#> 7       vs  0.6640389
#> 8       am  0.5998324
#> 9     gear  0.4802848
#> 10    carb -0.5509251

Here, correlate() produces a correlation data frame, and focus() lets you focus on the correlations of certain variables with all others.
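For comparison, the same column of values can be obtained in base R without corrr. This is a minimal sketch, not corrr's implementation: it assumes Pearson correlation (the default method of cor()) and no missing values, builds the full correlation matrix, and keeps the mpg column minus its diagonal entry.

```r
# Base-R sketch: correlation of mpg with every other mtcars column.
# Assumes Pearson correlation (cor()'s default) and no missing values.
cor_matrix <- cor(mtcars)

# Take the mpg column and drop the mpg-vs-mpg diagonal entry (always 1).
mpg_cor <- cor_matrix[rownames(cor_matrix) != "mpg", "mpg"]

print(round(mpg_cor, 7))
```

The result is a named numeric vector rather than a data frame, so it lacks the rowname column that focus() adds.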

FYI, focus() works similarly to select() from the dplyr package, except that it alters rows as well as columns. So if you're familiar with select(), you should find it easy to use focus(). E.g.:

mtcars %>% correlate() %>% focus(mpg:drat)

#>   rowname        mpg        cyl       disp         hp        drat
#>     <chr>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#> 1      wt -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065
#> 2    qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476
#> 3      vs  0.6640389 -0.8108118 -0.7104159 -0.7230967  0.44027846
#> 4      am  0.5998324 -0.5226070 -0.5912270 -0.2432043  0.71271113
#> 5    gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013
#> 6    carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980
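The row/column layout of focus(mpg:drat) can also be reproduced with plain matrix indexing. A minimal base-R sketch, assuming Pearson correlation: the selected variables (mpg through drat) become the columns, and the remaining variables become the rows.

```r
# Base-R sketch of the focus(mpg:drat) layout: rows are the variables
# outside the selection, columns are the selected variables.
cor_matrix <- cor(mtcars)

# Equivalent of the mpg:drat range: every column from mpg through drat.
col_names <- names(mtcars)
selected <- col_names[match("mpg", col_names):match("drat", col_names)]
others <- setdiff(col_names, selected)

focused <- cor_matrix[others, selected]
print(round(focused, 7))
```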

answered Sep 19 '22 at 13:09 by Simon Jackson


Better still, I think, you could compute the correlations of all variables with all others, not just one variable against the rest. You can do that with a single line of code. Using the built-in mtcars dataset:

library(dplyr)

# Correlation matrix of every selected variable against every other
cor(select(mtcars, mpg, wt, disp, drat, qsec, hp))
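The one-variable-versus-the-rest view from the question is just one row of that matrix. A minimal sketch, assuming the same six mtcars columns and using base-R column indexing in place of dplyr's select(). It also shows that cor() accepts a vector as x and a data frame as y, which gives the same numbers without building the whole matrix.

```r
# Same six columns as the select() call above, indexed in base R.
vars <- c("mpg", "wt", "disp", "drat", "qsec", "hp")
full <- cor(mtcars[, vars])   # 6 x 6 matrix of all pairs

# One variable against the rest: the "mpg" row, minus the diagonal.
mpg_row <- full["mpg", vars != "mpg"]

# Alternative: a vector as x and a data frame as y.
# cor() returns a 1 x 5 matrix of mpg's correlation with each column of y.
mpg_direct <- cor(mtcars$mpg, mtcars[, setdiff(vars, "mpg")])

print(round(mpg_row, 4))
print(round(mpg_direct, 4))
```

Applied to the question's data, the second form would be cor(df$var1, df[, -1]), assuming a hypothetical data frame df whose first column is var1.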
answered Sep 20 '22 at 13:09 by Oluwatoba Oyekanmi