I have more than 100 variables and would like to understand how they are correlated with each other. I would like to do this using the corrplot() function from the corrplot package.
corrplot() can reorder the displayed variables. With order="hclust", the variables are ordered by hierarchical clustering, so strongly correlated variables end up next to each other in the plot:
library(corrplot)
corrplot(cor(df), order="hclust", type="upper") # df = data.frame object
Problem: With more than 100 variables, the resulting corrplot is not readable. Therefore, I am looking for a way to display the 10 most strongly correlated variables in one corrplot, then the next 10 (ranks 11-20) in another corrplot, and so on. I am grateful for your tips and advice. Thanks a lot in advance.
Although I'm one year late, I'll leave this here in case someone else needs this simple and beautiful solution:
Install lares from GitHub
devtools::install_github("laresbernardo/lares")
Barchart with top correlations in the dataset
library(lares)
corr_cross(data_frame, # dataset
  max_pvalue = 0.05, # show only significant correlations at the selected level
  top = 10 # display the top 10 correlations between any pair of variables
)
Barchart with top correlations focused on only one variable (happy)
corr_var(data_frame, # dataset
  happy, # name of the variable to focus on
  top = 10 # display the top 10 correlations
)
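If you would rather stay with corrplot() and get one plot per group of 10 variables, as asked in the question, here is a minimal base R sketch. It is not part of lares: rank_by_max_cor() and chunk_vars() are hypothetical helper names I made up, and the ranking rule (each variable is scored by its strongest absolute correlation with any other variable) is just one possible reading of "most strongly correlated". The data is simulated for illustration only.

```r
# Score each variable by its strongest absolute correlation with any other
# variable and return the variable names, strongest first.
# (Hypothetical helper, not from corrplot or lares.)
rank_by_max_cor <- function(df) {
  cm <- cor(df, use = "pairwise.complete.obs")
  diag(cm) <- NA                               # ignore self-correlations
  strongest <- apply(abs(cm), 1, max, na.rm = TRUE)
  names(sort(strongest, decreasing = TRUE))
}

# Split a vector of variable names into consecutive groups of `size`.
# (Hypothetical helper.)
chunk_vars <- function(vars, size = 10) {
  split(vars, ceiling(seq_along(vars) / size))
}

# Simulated example data: 25 numeric columns, V1 and V2 strongly correlated.
set.seed(1)
df <- as.data.frame(matrix(rnorm(200 * 25), ncol = 25))
df$V2 <- df$V1 + rnorm(200, sd = 0.1)

groups <- chunk_vars(rank_by_max_cor(df), size = 10)

# One corrplot per group; skipped if corrplot is not installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  for (g in groups) {
    if (length(g) > 1) {
      corrplot::corrplot(cor(df[, g]), order = "hclust", type = "upper")
    }
  }
}
```

Keep in mind that two strongly correlated variables can fall on either side of a group boundary, so a pair may be split across two plots. If that matters, corr_cross() above is the better tool, because it ranks pairs rather than single variables.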