There are actually two questions, one more advanced than the other.
The first: is there something like corrplot() that can also deal with factors? I originally tried to use chisq.test() and then calculate the p-value and Cramér's V as the correlation, but there are too many columns to do this pair by pair.
The second: could anyone tell me if there is a quick way to create a "corrplot" in which each cell contains the value of Cramér's V while the colour is determined by the p-value? Any other similar kind of plot would also be fine.
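Regarding Cramér's V, let's say tbl is a data frame with two factor columns. chisq.test() expects a contingency table (or two factor vectors), so the table is built first:
ct <- table(tbl)                                                    # 2-D contingency table of the two factors
chi2 <- chisq.test(ct, correct = FALSE)                             # no Yates continuity correction
Cramer_V <- sqrt(chi2$statistic / (sum(ct) * (min(dim(ct)) - 1)))  # divide by n * (min(rows, cols) - 1)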
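For reference, Cramér's V for an r × c contingency table with n observations and Pearson chi-squared statistic χ² is defined as below. Dividing χ² by n alone gives the phi coefficient, which coincides with V only when min(r, c) = 2 (for example a 2 × 2 table).

```latex
V = \sqrt{\frac{\chi^2}{n\,\bigl(\min(r, c) - 1\bigr)}}, \qquad 0 \le V \le 1
```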
I prepared a test data frame with factors:
df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
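subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math'),
stringsAsFactors = TRUE  # since R 4.0.0, data.frame() no longer converts strings to factors by default
)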
df$group <- factor(df$group, levels = c('A', 'B', 'C'), ordered = TRUE)  # make group an ordered factor
df$student <- as.integer(df$student)                                      # '01' / '02' become the integers 1 / 2
Yes, it is possible if you also keep the variable type in a column and you pick the appropriate correlation method based on the types.
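A minimal sketch of picking the method by type, assuming three pairings: Pearson's r for two numeric columns, Cramér's V for two categorical columns, and the correlation ratio η (the square root of R² from a one-way `lm()`) for a numeric column against a categorical one. The helpers `is_cat()`, `assoc()` and `assoc_matrix()` are hypothetical, and using η for the mixed case is an assumption rather than the only reasonable choice. The data frame repeats the question's example with `group` as an ordered factor and `student` as an integer.

```r
# Hypothetical helper: treat factors (ordered or not) and strings as categorical
is_cat <- function(v) is.factor(v) || is.character(v)

# Hypothetical helper: association between two columns, chosen by their types
assoc <- function(x, y) {
  if (!is_cat(x) && !is_cat(y)) {
    # numeric vs numeric: Pearson correlation
    return(cor(x, y, use = "pairwise.complete.obs"))
  }
  if (is_cat(x) && is_cat(y)) {
    # categorical vs categorical: Cramér's V from the contingency table
    ct <- table(factor(x), factor(y))  # factor() drops unused levels
    chi2 <- unname(suppressWarnings(chisq.test(ct, correct = FALSE))$statistic)
    return(sqrt(chi2 / (sum(ct) * (min(dim(ct)) - 1))))
  }
  # numeric vs categorical: correlation ratio eta = sqrt(R^2) of numeric ~ categorical
  num <- if (is_cat(x)) y else x
  grp <- factor(if (is_cat(x)) x else y)
  sqrt(summary(lm(num ~ grp))$r.squared)
}

# Hypothetical helper: fill a square matrix with assoc() for every pair of columns
assoc_matrix <- function(d) {
  vars <- names(d)
  m <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
  for (a in vars) for (b in vars) m[a, b] <- assoc(d[[a]], d[[b]])
  m
}

df <- data.frame(
  group     = factor(c('A', 'A', 'A', 'A', 'A', 'B', 'C'), ordered = TRUE),
  student   = c(1L, 1L, 1L, 2L, 2L, 1L, 2L),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject   = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)

am <- assoc_matrix(df)
round(am, 2)
```

All three measures are symmetric, so the resulting matrix is symmetric with ones on the diagonal and can be handed to any correlation-plot function. Note that they are on different scales (η and V are non-negative, r is signed), so they should not be compared too literally.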
There are three big-picture methods to test whether a continuous and a categorical variable are significantly associated: point-biserial correlation, logistic regression, and the Kruskal-Wallis H test. The point-biserial correlation coefficient is a special case of Pearson's correlation coefficient in which one of the two variables is dichotomous.
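A small sketch of the three approaches in base R, using made-up data (`score` and `passed` are hypothetical). The point-biserial correlation is just `cor()` on a 0/1 coding of the binary variable; the extra computation checks that it matches the textbook formula r_pb = (M1 − M0) / s_n · sqrt(n1·n0 / n²), where s_n is the population standard deviation.

```r
# Hypothetical data: a continuous score and a binary pass/fail outcome
score  <- c(55, 62, 70, 48, 81, 90, 75, 60)
passed <- factor(c("N", "N", "Y", "N", "Y", "Y", "N", "Y"))

# 1. Point-biserial correlation: Pearson's r with the binary variable coded 0/1
y01  <- as.numeric(passed == "Y")
r_pb <- cor(score, y01)
cor.test(score, y01)  # p-value for H0: correlation = 0

# Same value from the textbook formula (s_n = population standard deviation)
n   <- length(score)
s_n <- sqrt(mean((score - mean(score))^2))
r_formula <- (mean(score[y01 == 1]) - mean(score[y01 == 0])) / s_n *
  sqrt(sum(y01) * sum(1 - y01) / n^2)

# 2. Logistic regression: does the score predict passing? ("N" is the reference level)
fit <- glm(passed ~ score, family = binomial)
summary(fit)$coefficients

# 3. Kruskal-Wallis H test: do the score distributions differ between the groups?
kw <- kruskal.test(score ~ passed)
kw
```

The point-biserial and logistic approaches need a two-level categorical variable, while Kruskal-Wallis also works for more than two groups.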
Plotting a correlation matrix: first, find the correlation between every pair of variables in the data frame with a correlation method (corr() in pandas, cor() in R), which returns a matrix of pairwise correlation values. Then apply a background colour gradient to that matrix so each cell is shaded by its value.
Correlation matrix of categorical variables only: to generate the correlation matrix for the categorical variables alone, first filter all of them into a separate data frame.
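A minimal sketch of that filtering step in R, which also addresses the Cramér's V plot from the question: keep only the factor/character columns, compute Cramér's V and the chi-squared p-value for every pair, and draw a tile plot whose labels are V and whose fill is the p-value. The plot uses ggplot2's `geom_tile()` and `geom_text()` and is only drawn if ggplot2 is installed; the colour choices are arbitrary. With only seven rows the chi-squared approximation is unreliable, so the warnings are suppressed here but the p-values should not be trusted on data this small.

```r
# The question's data, with strings stored as factors
df <- data.frame(
  group     = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student   = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject   = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math'),
  stringsAsFactors = TRUE
)

# Keep only the categorical columns
is_cat <- vapply(df, function(v) is.factor(v) || is.character(v), logical(1))
cat_df <- df[is_cat]
vars <- names(cat_df)
k <- length(vars)

# Cramér's V (V) and chi-squared p-value (P) for every pair of columns
V <- P <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
for (a in vars) {
  for (b in vars) {
    ct   <- table(cat_df[[a]], cat_df[[b]])
    test <- suppressWarnings(chisq.test(ct, correct = FALSE))
    V[a, b] <- sqrt(unname(test$statistic) / (sum(ct) * (min(dim(ct)) - 1)))
    P[a, b] <- test$p.value
  }
}
round(V, 2)
round(P, 3)

# Long format: one row per cell (as.vector() reads the matrices column by column)
plot_df <- data.frame(
  var1     = rep(vars, times = k),
  var2     = rep(vars, each = k),
  cramer_v = as.vector(V),
  p_value  = as.vector(P)
)

# Label = Cramér's V, fill = p-value (small p-values shown in red)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(plot_df, aes(var1, var2, fill = p_value)) +
    geom_tile(colour = "white") +
    geom_text(aes(label = sprintf("%.2f", cramer_v))) +
    scale_fill_gradient(low = "firebrick", high = "grey90", limits = c(0, 1)) +
    labs(x = NULL, y = NULL, fill = "p-value")
  print(g)
}
```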
If you want a genuine correlation plot for factors or mixed types, you can also use model.matrix to turn every non-numeric variable into 0/1 dummy columns. This is quite different from calculating Cramér's V, as it treats each factor level as a separate variable, as many regression models do.
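One detail worth checking: with `model.matrix(~0+., data=df)` only the first factor is expanded into a column for every level. The remaining factors keep their default contrasts (treatment coding for unordered factors, polynomial coding for ordered ones), so they are not strictly one-hot. A sketch of forcing plain indicator columns for every level of every factor through the `contrasts.arg` argument; the toy data frame `d` is made up.

```r
# Made-up data: two unordered factors and one numeric column
d <- data.frame(
  colour = factor(c("red", "green", "blue", "red", "blue")),
  size   = factor(c("S", "L", "L", "M", "S")),
  weight = c(1.2, 3.4, 2.2, 1.9, 2.8)
)

# Default: colour gets 3 columns, but size drops its reference level "L"
default_mm <- model.matrix(~ 0 + ., data = d)
colnames(default_mm)

# contrasts(f, contrasts = FALSE) is an identity matrix, i.e. one indicator per level
is_fac <- vapply(d, is.factor, logical(1))
one_hot <- model.matrix(
  ~ 0 + ., data = d,
  contrasts.arg = lapply(d[is_fac], contrasts, contrasts = FALSE)
)
colnames(one_hot)
round(cor(one_hot), 2)
```

With full one-hot columns the indicators of the same factor are linearly dependent (they sum to one), which does not matter for a correlation plot but would for a regression model.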
You can then use your favorite correlation-plot library. I personally like ggcorrplot for its ggplot2 compatibility.
Here is an example with your dataset:
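library(ggcorrplot)
library(magrittr)  # provides the %>% pipe used below
model.matrix(~0+., data=df) %>%          # dummy-encode the factors, no intercept column
  cor(use="pairwise.complete.obs") %>%   # pairwise correlation matrix
  ggcorrplot(show.diag = FALSE, type="lower", lab=TRUE, lab_size=2)  # labelled lower triangle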