
Chi Square Analysis using for loop in R

Tags: r, chi-squared

I'm trying to do chi square analysis for all combinations of variables in the data and my code is:

Data <- esoph[ , 1:3]
OldStatistic <- NA
for(i in 1:(ncol(Data)-1)){
  for(j in (i+1):ncol(Data)){
    Statistic <- data.frame("Row"=colnames(Data)[i], "Column"=colnames(Data)[j],
                            "Chi.Square"=round(chisq.test(Data[ ,i], Data[ ,j])$statistic, 3),
                            "df"=chisq.test(Data[ ,i], Data[ ,j])$parameter,
                            "p.value"=round(chisq.test(Data[ ,i], Data[ ,j])$p.value, 3),
                            row.names=NULL)
    temp <- rbind(OldStatistic, Statistic)
    OldStatistic <- Statistic
    Statistic <- temp
  }
}

str(Data)
'data.frame':   88 obs. of  3 variables:
 $ agegp: Ord.factor w/ 6 levels "25-34"<"35-44"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ alcgp: Ord.factor w/ 4 levels "0-39g/day"<"40-79"<..: 1 1 1 1 2 2 2 2 3 3 ...
 $ tobgp: Ord.factor w/ 4 levels "0-9g/day"<"10-19"<..: 1 2 3 4 1 2 3 4 1 2 ...


Statistic
    Row Column Chi.Square df p.value
1 agegp  tobgp      2.400 15       1
2 alcgp  tobgp      0.619  9       1

My code gives me the chi-square output for variable 1 vs. variable 3 and for variable 2 vs. variable 3, but the result for variable 1 vs. variable 2 is missing. I tried hard but could not fix the code. Any comments and suggestions will be highly appreciated. I'd like to do cross tabulation for all possible combinations. Thanks in advance.

EDIT

I used to do this kind of analysis in SPSS but now I want to switch to R.

MYaseen208 asked Sep 11 '11 23:09



2 Answers

A sample of your data would be appreciated, but I think this will work for you. First, create a combination of all columns with combn. Then write a function to use with an apply function to iterate through the combos. I like to use plyr since it is easy to specify what you want for a data structure on the back end. Also note you only need to compute the chi square test once for each combination of columns, which should speed things up quite a bit as well.

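library(plyr)

# The question's data (called Data there); swap in your own data frame
Dat <- esoph[, 1:3]

# One column per pair of column indices
combos <- combn(ncol(Dat), 2)

adply(combos, 2, function(x) {
  # Run the test once per pair and reuse the result
  test <- chisq.test(Dat[, x[1]], Dat[, x[2]])

  out <- data.frame("Row" = colnames(Dat)[x[1]]
                    , "Column" = colnames(Dat)[x[2]]
                    , "Chi.Square" = round(test$statistic, 3)
                    , "df" = test$parameter
                    , "p.value" = round(test$p.value, 3)
                    , row.names = NULL
                    )
  return(out)

})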
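
The same idea can also be sketched in base R without plyr. This is only a minimal sketch, assuming Dat holds the first three columns of the built-in esoph data as in the question; the helper name pair_test is hypothetical. chisq.test() may warn that the approximation is unreliable for sparse tables, so the call is wrapped in suppressWarnings() here.

```r
# Assumption: same data as in the question
Dat <- esoph[, 1:3]

# Hypothetical helper: one chi-square test for a pair of column indices
pair_test <- function(idx) {
  test <- suppressWarnings(chisq.test(Dat[[idx[1]]], Dat[[idx[2]]]))
  data.frame(Row        = colnames(Dat)[idx[1]],
             Column     = colnames(Dat)[idx[2]],
             Chi.Square = round(unname(test$statistic), 3),
             df         = unname(test$parameter),
             p.value    = round(test$p.value, 3))
}

# combn() with simplify = FALSE returns a list holding one data frame per pair
results <- do.call(rbind, combn(ncol(Dat), 2, FUN = pair_test, simplify = FALSE))
results
```

Because each pair produces its own one-row data frame and the rows are bound together only once at the end, no pair can be overwritten along the way.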
Chase answered Oct 11 '22 15:10


I wrote my own function. It builds a matrix in which every nominal variable is tested against every other one, and it can also save the results as an Excel file. Only p-values below 5% are shown; larger ones are blanked out.

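library(xlsx)  # assumed source of write.xlsx(); only needed when xlsxpath is set

funMassChi <- function (x, delFirst = 0, xlsxpath = FALSE) {
  options(scipen = 999)  # print p-values without scientific notation

  # Drop the first delFirst columns (e.g. an ID or count index)
  start <- (delFirst + 1)
  ds <- x[, start:ncol(x), drop = FALSE]

  cATeND <- ncol(ds)
  catID  <- 1:cATeND

  # Numeric matrix for the p-values; only the lower triangle is filled.
  # (Slicing ds itself breaks when its columns are factors.)
  resMat <- matrix(NA_real_, nrow = cATeND, ncol = cATeND - 1)
  colnames(resMat) <- names(ds)[-cATeND]

  for (nCc in 1:(length(catID) - 1)) {
    for (nDc in (nCc + 1):length(catID)) {
      tryCatch({
        chiRes <- chisq.test(ds[, catID[nCc]], ds[, catID[nDc]])
        resMat[nDc, nCc] <- chiRes$p.value
      }, error = function(e) {
        cat(paste("ERROR :", "at", nCc, nDc, sep = " "), conditionMessage(e), "\n")
      })
    }
  }

  # Blank out everything that is not significant at the 5% level
  resMat[resMat > 0.05] <- ""
  Ergebnis <- data.frame(CatNames = names(ds), resMat,
                         check.names = FALSE, stringsAsFactors = FALSE)
  # The result is stored in the global variable Ergebnis;
  # the first row is always empty, so it is dropped
  Ergebnis <<- Ergebnis[-1, ]

  if (!identical(xlsxpath, FALSE)) {
    write.xlsx(x = Ergebnis, file = paste(xlsxpath, "ALLChi-", Sys.Date(), ".xlsx", sep = ""),
               sheetName = "Tabelle1", row.names = FALSE)
  }
}

funMassChi(categorialDATA, delFirst = 3, xlsxpath = "C:/folder1/folder2/")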

delFirst drops the first n columns, which is useful if you have a count index or other columns you don't want to test.
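
As a rough illustration of the matrix layout described above, here is a compact sketch that fills only the lower triangle with p-values. It assumes the built-in esoph data from the question rather than categorialDATA, and the names pmat and sig are hypothetical; it is not meant as a replacement for the function.

```r
ds <- esoph[, 1:3]   # assumption: data from the question
k  <- ncol(ds)

# Square matrix of p-values named by variable; the upper triangle stays NA
pmat <- matrix(NA_real_, k, k, dimnames = list(names(ds), names(ds)))
for (i in 1:(k - 1)) {
  for (j in (i + 1):k) {
    # Warnings about sparse tables are suppressed for this illustration
    pmat[j, i] <- suppressWarnings(chisq.test(ds[[i]], ds[[j]]))$p.value
  }
}

# Logical mask of the pairs that are significant at the 5% level
sig <- !is.na(pmat) & pmat < 0.05
pmat
```

Keeping the p-values numeric and using a separate logical mask avoids turning the whole matrix into text when the non-significant cells are hidden.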

I hope this can help anyone else.

Andre Elrico answered Oct 11 '22 13:10