
Chi-square statistic across multiple columns of a dataframe using dplyr or reshape2

I have a question about using dplyr and reshape2 to calculate chi-square statistics across multiple columns. Below is a small data frame:

Sat <- c("Satisfied","Satisfied","Dissatisfied","Dissatisfied","Neutral")

Gender <- c("Male","Male","Female","Male","Female")

Ethnicity <- c("Asian","White","White","Asian","White")

AgeGroup <- c("18-20","18-20","21-23","18-20","18-28")

Example <- data.frame(Sat,Gender,Ethnicity,AgeGroup)

How would I use summarise_each or melt to test the Sat column against each of the other variables and produce chi-square residuals and p-values? I'm thinking there must be something like:

Example %>% summarise_each(funs(chisq.test(... 

but I'm not sure how to finish it. Also, how would I melt the data frame and use group_by or do() to get the chi-square stats? I'm interested in seeing both methods. If there's a way to incorporate the broom package, that would be great too, or tidyr instead of reshape2.

So to recap, I would like to run chi-square tests, such as

chisq.test(Example$Sat, Example$Gender)

but I would like to produce chi-square stats for the Sat variable against Gender, Ethnicity, and AgeGroup. This is a small example, and I'm hoping the methods above will let me compute chi-square stats across many columns quickly and efficiently. Bonus if I can plot the residuals in a heat map with ggplot2, which is why I'm interested in incorporating the broom package into this example.

Mike asked Oct 18 '22 12:10


1 Answer

If we need the p-values:

library(dplyr)

# Test each column except Sat against Sat and keep the p-value
Example %>%
    summarise_each(funs(chisq.test(., Example$Sat)$p.value), -one_of("Sat"))
#     Gender Ethnicity  AgeGroup
#1 0.2326237 0.6592406 0.1545873
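
In current dplyr releases, summarise_each() and funs() are deprecated, so the same idea may be easier to maintain with across(). The following is a minimal sketch, assuming dplyr 1.0 or later; the object name p_values is only illustrative, and suppressWarnings() is added here to hide the small-sample warning that chisq.test() raises on this toy data.

```r
library(dplyr)

# Same toy data as in the question
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28"),
  stringsAsFactors = FALSE
)

# Test every column except Sat against Sat and keep only the p-value.
# suppressWarnings() hides the "approximation may be incorrect" warning
# that chisq.test() gives for such small counts.
p_values <- Example %>%
  summarise(across(-Sat, ~ suppressWarnings(chisq.test(.x, Example$Sat))$p.value))

p_values
```

Swapping `$p.value` for `$statistic` in the lambda should give the test statistics instead.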

Or to extract the statistic:

Example %>%
    summarise_each(funs(chisq.test(., Example$Sat)$statistic), -one_of("Sat"))
#    Gender Ethnicity AgeGroup
#1 2.916667 0.8333333 6.666667
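
The question also asks about a melt-style approach with tidyr and broom. One possible sketch, assuming tidyr 1.0 or later (for pivot_longer()) and the broom package, reshapes the data to long form and runs one test per variable, so the statistic and the p-value come back together. The names variable, level and chi_stats are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(broom)

# Same toy data as in the question
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28"),
  stringsAsFactors = FALSE
)

chi_stats <- Example %>%
  # Long form: one row per (Sat, variable, level) combination
  pivot_longer(-Sat, names_to = "variable", values_to = "level") %>%
  group_by(variable) %>%
  # One chi-square test per variable; tidy() turns the htest into a one-row tibble
  group_modify(~ tidy(suppressWarnings(chisq.test(.x$level, .x$Sat)))) %>%
  ungroup()

chi_stats
```

The result should have one row per variable, with the statistic and p.value columns that tidy() produces for a test object.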

To get the residuals, it would be easier with base R:

# One matrix of Pearson residuals per column other than Sat
lapply(Example[setdiff(names(Example), "Sat")],
       function(x) chisq.test(x, Example$Sat)$residuals)
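
For the heat map mentioned in the question, the residual matrices can be stacked into one long data frame and passed to ggplot2. This is only a sketch under a few assumptions: the contingency table is built explicitly so its dimensions get the illustrative names level and Sat, and resid_df and heat_map are hypothetical object names.

```r
library(ggplot2)

# Same toy data as in the question
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28"),
  stringsAsFactors = FALSE
)

# Build one long data frame of Pearson residuals: one row per table cell
resid_df <- do.call(rbind, lapply(setdiff(names(Example), "Sat"), function(v) {
  tab <- table(level = Example[[v]], Sat = Example$Sat)
  res <- suppressWarnings(chisq.test(tab))$residuals
  out <- as.data.frame(as.table(res), responseName = "residual")
  out$variable <- v
  out
}))

# One panel per variable, tiles coloured by residual (diverging scale around 0)
heat_map <- ggplot(resid_df, aes(x = Sat, y = level, fill = residual)) +
  geom_tile() +
  facet_wrap(~ variable, scales = "free_y") +
  scale_fill_gradient2()

# print(heat_map) draws the plot
```

With only five rows the residuals are not very informative, but the same code should carry over to a larger data frame with many columns.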
akrun answered Oct 30 '22 04:10