
Find the pair of most correlated variables

Suppose I have a data frame with 20 columns (variables), all of them numeric. I can use the cor function in R to get the correlation coefficients in matrix form, or visualize the correlation matrix (with the coefficients labeled). But suppose I just want to sort the pairs of variables by their correlation coefficient. How can I do this in R?

your_boy_gorja asked Sep 19 '17 19:09


People also ask

How do you find highly correlated variables?

A coefficient of 0 means there is no linear correlation between the two variables (which does not by itself imply that they are independent). If the coefficient is very close to -1 or 1, it indicates a strong linear relationship: the variables are highly correlated, and a change in one is associated with a proportional change in the other.

How do you find the correlation between pairs?

Pearson's correlation coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations. This normalizes the covariance to give an interpretable score between -1 and 1.
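
As a quick check of that definition, here is a minimal sketch in base R. The vectors x and y are made-up illustrative data, not taken from the question.

```r
# Illustrative data (hypothetical): two numeric vectors of equal length
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)

# Pearson's r from the definition: covariance over the product of standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in cor() uses Pearson's method by default
r_builtin <- cor(x, y)

# Compare the two values, allowing for floating-point rounding
all.equal(r_manual, r_builtin)
```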

How do you know if two variables are highly correlated?

Correlation coefficients whose magnitude is between 0.9 and 1.0 indicate variables that can be considered very highly correlated. Correlation coefficients whose magnitude is between 0.7 and 0.9 indicate variables that can be considered highly correlated.

Which variables have the highest correlation?

The variables with correlation coefficient values closer to 1 show a strong positive correlation, the values closer to -1 show a strong negative correlation, and the values closer to 0 show weak or no correlation.


1 Answer

Solution using corrr:

corrr is a package for exploring correlations in R. It focuses on creating and working with data frames of correlations.

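library(corrr)
library(dplyr)  # provides arrange()

# Example data: 5 observations of 20 random variables
matrix(rnorm(100), 5) %>%
    correlate() %>%   # correlation data frame
    stretch() %>%     # long format: one row per pair of variables
    arrange(r)        # sort pairs by correlation, ascending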

Solution using reshape2 & data.table:

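You can reshape the cor result into long format with reshape2::melt and order (sort) the rows by correlation value. Calling reshape2::melt explicitly (reshape2 must be installed) avoids relying on data.table to dispatch melt to reshape2 for matrix input.

library(data.table)
# Example data: correlation matrix of 20 random variables (5 observations each)
corMatrix <- cor(matrix(rnorm(100), 5))
# Melt to long format (Var1, Var2, value), then sort by correlation, ascending
setDT(reshape2::melt(corMatrix))[order(value)]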
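
Both approaches above list every pair twice (once in each order) along with the diagonal. If the goal is the single most correlated pair, a base R sketch along the following lines may help. The data frame df and the names idx and corPairs are illustrative assumptions, and sorting by absolute value is a design choice so that strong negative correlations rank alongside strong positive ones.

```r
# Illustrative data (hypothetical): 30 observations of 20 numeric variables
set.seed(42)
df <- as.data.frame(matrix(rnorm(600), nrow = 30))

corMatrix <- cor(df)

# Keep each unordered pair once by taking the upper triangle (diagonal excluded)
idx <- which(upper.tri(corMatrix), arr.ind = TRUE)
corPairs <- data.frame(
  var1 = rownames(corMatrix)[idx[, "row"]],
  var2 = colnames(corMatrix)[idx[, "col"]],
  r    = corMatrix[idx]
)

# Sort by absolute correlation, strongest first
corPairs <- corPairs[order(abs(corPairs$r), decreasing = TRUE), ]

head(corPairs, 1)  # the most correlated pair
```

Replacing abs(corPairs$r) with corPairs$r would instead rank pairs from most positive to most negative.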

pogibas answered Oct 10 '22 17:10