Plot the equivalent of correlation matrix for factors (categorical data)? And mixed types?

There are actually two questions, one more advanced than the other.

Q1: I am looking for a method that is similar to `corrplot()` but can deal with factors.

I originally tried to use `chisq.test()` and then calculate the p-value and Cramér's V as a correlation measure, but there are too many columns to work through by hand. Could anyone tell me whether there is a quick way to create a "corrplot" in which each cell contains the value of Cramér's V and the colour is determined by the p-value? Any other similar kind of plot would also be welcome.

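Regarding Cramér's V, let's say `tbl` is a two-way contingency table of two factors.

# Chi-squared test without continuity correction
chi2 <- chisq.test(tbl, correct = FALSE)
# V = sqrt(chi-squared / (n * (min(rows, cols) - 1)))
Cramer_V <- sqrt(chi2$statistic / (sum(tbl) * (min(dim(tbl)) - 1)))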

I prepared a test data frame with factors:

df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)
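
As a rough illustration of what Q1 is asking for, the pairwise computation could be sketched in base R as below. The helper name `assoc_pair` and the matrices `v_mat` and `p_mat` are hypothetical names introduced for this sketch and are not part of any package. It assumes every column is treated as categorical and has at least two distinct values.

```r
# Test data frame from the question, repeated so the snippet runs on its own.
df <- data.frame(
  group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)

# Hypothetical helper: Cramér's V and chi-squared p-value for two categorical vectors.
assoc_pair <- function(x, y) {
  tbl <- table(x, y)
  # Tiny samples trigger an approximation warning; it is silenced here for brevity.
  chi2 <- suppressWarnings(chisq.test(tbl, correct = FALSE))
  v <- sqrt(unname(chi2$statistic) / (sum(tbl) * (min(dim(tbl)) - 1)))
  c(v = v, p = unname(chi2$p.value))
}

vars <- names(df)
# Two square matrices, one cell per pair of columns.
v_mat <- p_mat <- matrix(NA_real_, length(vars), length(vars),
                         dimnames = list(vars, vars))
for (a in vars) {
  for (b in vars) {
    res <- assoc_pair(df[[a]], df[[b]])
    v_mat[a, b] <- res[["v"]]
    p_mat[a, b] <- res[["p"]]
  }
}

round(v_mat, 2)
```

The two matrices could then be handed to a plotting function. For instance, `ggcorrplot()` accepts a matrix together with a `p.mat` argument, although whether that yields exactly the colouring described above has not been checked here.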

Q2: Then I would like to compute a correlation/association matrix on a mixed-type data frame, e.g.:

df <- data.frame(
group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
student = c('01', '01', '01', '02', '02', '01', '02'),
exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
) 
df$group <- factor(df$group, levels = c('A', 'B', 'C'), ordered = TRUE)
df$student <- as.integer(df$student)

asked Sep 28 '18 11:09

J.D

1 Answer

If you want a genuine correlation plot for factors or mixed-type data, you can also use `model.matrix` to one-hot encode all non-numeric variables. This is quite different from calculating Cramér's V, because it treats each factor level as a separate variable, as many regression models do.

You can then use your favorite correlation-plot library. I personally like `ggcorrplot` for its `ggplot2` compatibility.

Here is an example with your dataset:

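library(ggcorrplot)
library(magrittr)  # provides the %>% pipe used below

# One-hot encode the factors, correlate the columns, then plot the lower triangle
model.matrix(~0+., data=df) %>% 
  cor(use="pairwise.complete.obs") %>% 
  ggcorrplot(show.diag = FALSE, type="lower", lab=TRUE, lab_size=2)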

[Image: the resulting correlation plot, https://i.stack.imgur.com/AKBte.png]
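
To see what the one-hot encoding step feeds into `cor()`, a base-R sketch along the following lines may help. It uses the mixed-type data frame from Q2, and the names `mm` and `cor_mat` are just illustrative. It only inspects the intermediate matrices and does not draw the plot.

```r
# Mixed-type data frame from Q2, repeated so the snippet runs on its own.
df <- data.frame(
  group = c('A', 'A', 'A', 'A', 'A', 'B', 'C'),
  student = c('01', '01', '01', '02', '02', '01', '02'),
  exam_pass = c('Y', 'N', 'Y', 'N', 'Y', 'Y', 'N'),
  subject = c('Math', 'Science', 'Japanese', 'Math', 'Science', 'Japanese', 'Math')
)
df$group <- factor(df$group, levels = c('A', 'B', 'C'), ordered = TRUE)
df$student <- as.integer(df$student)

# Dummy-encode every non-numeric column; `0 +` drops the intercept column.
mm <- model.matrix(~ 0 + ., data = df)

# The dummy columns listed here become the rows and columns of the plot.
colnames(mm)

# Pearson correlation between all encoded columns.
cor_mat <- cor(mm, use = "pairwise.complete.obs")
dim(cor_mat)
```

Note that, under R's default treatment contrasts, every factor after the first loses its reference level, so a two-level factor such as `exam_pass` contributes a single column.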

answered Sep 22 '22 23:09

Dan Chaltiel
