 

How to display only the 10 most strongly correlated variables with corrplot() in R?

I have more than 100 variables and would like to understand how they are correlated with each other. I would like to do this using the corrplot() function from the corrplot package.

corrplot() can reorder the displayed variables so that strongly correlated variables are placed next to each other. Setting order="hclust" does this with hierarchical clustering, so groups of correlated variables appear as blocks along the diagonal:

library(corrplot)
corrplot(cor(df), order="hclust", type="upper") # df = data.frame object
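To see which order order="hclust" produces without drawing the plot, corrplot also exports corrMatOrder(). The sketch below uses the built-in mtcars data as a stand-in for df (an assumption) and should give the same permutation corrplot() applies with its default hclust.method:

```r
library(corrplot)

# Example data (assumption): built-in mtcars stands in for df
M <- cor(mtcars)

# Index permutation that order = "hclust" applies inside corrplot()
ord <- corrMatOrder(M, order = "hclust")

# Variable names in plotted order; neighbours belong to the same cluster
print(colnames(M)[ord])

# Reordered correlation matrix, e.g. for further inspection
M_ord <- M[ord, ord]
```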

Problem: The corrplot contains all 100+ variables and is therefore unreadable. I am looking for a way to display the 10 most strongly correlated variables in one corrplot, the next 10 (ranks 11-20) in another corrplot, and so on. I am grateful for any tips and advice. Thanks a lot in advance.
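One way to get the requested "top 10, then 11-20, ..." plots with corrplot alone is to rank the variables, split the ranking into groups of 10, and draw one corrplot per group. This is a minimal sketch under stated assumptions: the random data frame is only a placeholder, and ranking each variable by its single strongest absolute correlation with any other variable is one possible criterion (not something corrplot provides); the mean absolute correlation would be another.

```r
library(corrplot)

# Example data (assumption): 200 rows x 30 random numeric columns stand in
# for your own data frame with > 100 variables
set.seed(1)
df <- as.data.frame(matrix(rnorm(200 * 30), ncol = 30))

M <- cor(df)

# Rank variables by their strongest absolute correlation with any OTHER variable
A <- abs(M)
diag(A) <- NA                      # drop self-correlations (always 1)
strength <- apply(A, 1, max, na.rm = TRUE)
ranked <- names(sort(strength, decreasing = TRUE))

# Split the ranking into groups of 10: ranks 1-10, 11-20, ...
chunk_size <- 10
chunks <- split(ranked, ceiling(seq_along(ranked) / chunk_size))

# One corrplot per group; hclust ordering needs at least 2 variables
for (vars in chunks) {
  if (length(vars) < 2) next
  corrplot(M[vars, vars], order = "hclust", type = "upper")
}
```

Note that each plot only shows correlations within its own group: a variable in the first group may have its strongest partner in a later group.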

asked Nov 15 '25 19:11 by jollycat


1 Answer

Although I'm one year late, I'll leave this here in case someone else needs this simple and beautiful solution:

Install lares from GitHub

devtools::install_github("laresbernardo/lares")

Bar chart with the top correlations in the dataset

library(lares)
corr_cross(data_frame,         # dataset
           max_pvalue = 0.05,  # show only correlations significant at this level
           top = 10)           # display the top 10 correlations across all variable pairs
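If installing a package from GitHub is not an option, the ranking part of corr_cross() can be approximated in base R. This sketch uses mtcars as stand-in data (an assumption) and only lists the strongest variable pairs by absolute correlation; unlike corr_cross(), it does no p-value filtering and draws no chart.

```r
# Example data (assumption): built-in mtcars
M <- cor(mtcars)

# Keep each pair once: upper triangle only, excluding the diagonal
idx <- which(upper.tri(M), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(M)[idx[, "row"]],
  var2 = colnames(M)[idx[, "col"]],
  corr = M[idx]
)

# Sort by absolute correlation, strongest first, and keep the top 10 pairs
pairs <- pairs[order(-abs(pairs$corr)), ]
top10 <- head(pairs, 10)
print(top10)
```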

Bar chart with the top correlations of a single variable (here happy)

corr_var(data_frame,  # dataset
         happy,       # name of the variable to focus on
         top = 10)    # display its top 10 correlations
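A comparable base-R sketch for a single variable: here mpg from mtcars stands in for happy (an assumption), and its correlations with all other columns are sorted by absolute value and drawn as a horizontal bar chart. This only mimics the idea of corr_var(); it is not its implementation.

```r
# Example data (assumption): mtcars, with mpg standing in for "happy"
M <- cor(mtcars)
target <- "mpg"

# Correlations of the target with every other variable
others <- M[target, setdiff(colnames(M), target)]

# Strongest 10 by absolute value, strongest first
top <- head(others[order(-abs(others))], 10)

barplot(top, horiz = TRUE, las = 1,
        main = paste("Top correlations with", target))
```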
answered Nov 18 '25 10:11 by AriadnaAgnis