
Correlation matrix of a bunch of categorical variables in R

I have about 20 variables describing different cities. Each one is a factor with the levels "Y" or "N"; the variables are things like "has co-op". I want to find correlations among them and possibly use the corrplot package to display the connections between all these variables. But for some reason I cannot coerce the variables into a form that corrplot, or even cor(), accepts, so I cannot get them into a matrix. I tried:

 M <- cor(model.matrix(~.-1,data=mydata[c(25:44)]))

but the results in corrplot came out really weird. Does anyone have a fast way to turn a bunch of Y/N answers into a correlation matrix? Thanks!

Logan McDonald asked Jul 06 '15 05:07


1 Answer

You can use sjp.corr for graphical output or sjt.corr for tabular output; both functions come from the sjPlot package.

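library(sjPlot)

# Example data: five random Y/N variables stored as factors.
# stringsAsFactors = TRUE matters in R >= 4.0, where strings stay character
# by default and as.integer("Y") would return NA.
DF <- data.frame(v1 = sample(c("Y", "N"), 100, TRUE),
                 v2 = sample(c("Y", "N"), 100, TRUE),
                 v3 = sample(c("Y", "N"), 100, TRUE),
                 v4 = sample(c("Y", "N"), 100, TRUE),
                 v5 = sample(c("Y", "N"), 100, TRUE),
                 stringsAsFactors = TRUE)

# Replace each factor with its integer codes (N = 1, Y = 2).
DF[] <- lapply(DF, as.integer)

# Note: newer sjPlot releases may have renamed or removed these functions
# (e.g. tab_corr() in place of sjt.corr()); check your installed version.
sjp.corr(DF)  # correlation plot
sjt.corr(DF)  # correlation table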

The plot:

[Image: correlation plot from sjp.corr]

The table (in RStudio viewer pane):

[Image: correlation table from sjt.corr]

You can use many parameters to modify the appearance of the plot or table; see the sjPlot documentation for examples.
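
If sjPlot is not available, the same idea can be sketched with base R: recode each Y/N column to 0/1 and pass the result to cor(). This is only a minimal sketch under assumptions. The example data, the helper yn(), and the object names num and M are made up for illustration, and the corrplot call runs only if that package is installed. For 0/1 variables, the Pearson correlation that cor() returns is the phi coefficient.

```r
# Hypothetical stand-in for the asker's data: five Y/N factor columns.
set.seed(42)
yn <- function(n) {
  factor(sample(c("Y", "N"), n, replace = TRUE), levels = c("N", "Y"))
}
DF <- data.frame(v1 = yn(100), v2 = yn(100), v3 = yn(100),
                 v4 = yn(100), v5 = yn(100))

# Recode each column explicitly: "Y" -> 1, "N" -> 0.
# Comparing against "Y" works for both factor and character columns.
num <- as.data.frame(lapply(DF, function(x) as.integer(x == "Y")))

# Pearson correlation of the 0/1 columns (the phi coefficient).
M <- cor(num)
print(round(M, 2))

# Optional plot, drawn only when corrplot is installed.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M, method = "circle")
}
```

This yields exactly one row and column per variable. By contrast, model.matrix(~ . - 1, ...) emits both an N and a Y column for the first factor and a single Y column for each remaining one, which may be part of why the corrplot output in the question looked odd.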

Daniel answered Sep 23 '22 07:09