Chi-squared test of independence on all combinations of columns in a dataframe in R

Tags: r, chi-squared

This is my first time posting here, and I hope this is in the right place. I have been using R for basic statistical analysis for some time, but haven't used it for anything computationally challenging, and I'm very much a beginner on the programming/data-manipulation side of R.

I have presence/absence (binary) data on 72 plant species in 323 plots in a single catchment. The data frame has 323 rows, one per plot, and 72 columns, one per species. Here is a sample of the first 4 columns (some row numbers are missing because the 323 plots are a subset of a larger set of preassigned plots, not all of which were surveyed):

> head(plots[,1:4])
 Agrostis.canina Agrostis.capillaris Alchemilla.alpina Anthoxanthum.odoratum
1               1                   0                 0                     0
3               0                   0                 0                     0
4               0                   0                 0                     0
5               0                   0                 0                     0
6               0                   0                 0                     0
8               0                   0                 0                     0

I want to determine whether any of the plant species in this catchment are associated with any others and, if so, whether the association is positive or negative. To do this I want to perform a chi-squared test of independence on each pair of species. I need to create a 2x2 contingency table for each species-by-species comparison, run a chi-squared test on each of those contingency tables, and save the output. Ultimately I would like to end up with a list or matrix of all species-by-species tests that shows whether each pair of species has a positive, negative, or no significant association. I'd also like to incorporate some code that only shows an association as positive if all expected values were greater than 5.
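For a single pair of species, the pieces described above can be sketched in a few lines of R. This is only an illustration on two made-up vectors (`sp_a` and `sp_b` are hypothetical, not columns of `plots`). The expected counts are row total × column total / grand total, which is the same formula `chisq.test` uses for its `$expected` component, and the 5-or-more rule is checked on all four cells.

```r
# Hypothetical presence/absence vectors for two species across 10 plots
sp_a <- c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1)
sp_b <- c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1)

# Fix the levels so the table is always 2x2, even for a species that never occurs
tbl <- table(factor(sp_a, levels = c(0, 1)),
             factor(sp_b, levels = c(0, 1)))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tbl), colSums(tbl)) / sum(tbl)

# +1 if both species occur together (cell [2, 2]) more often than expected, -1 if less
direction <- sign(tbl[2, 2] - expected[2, 2])

# The chi-squared approximation is usually trusted only when every expected count exceeds 5
valid <- all(expected > 5)
```

Here the two species co-occur in 4 plots against 2 expected, so `direction` is positive, but `valid` is `FALSE` because several expected counts are below 5.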

I have made a start by writing the following function:

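CHI <- function(sppx, sppy) {
  tbl  <- table(sppx, sppy)    # contingency table of absence (0) / presence (1)
  test <- chisq.test(tbl)
  # Statistic, p-value, and the sign of (observed - expected) in the
  # "both present" cell: +1 = co-occur more often than expected, -1 = less
  c(test$statistic, test$p.value, sign((tbl - test$expected)[2, 2]))
}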

This returns the following:

> CHI(plots$Agrostis.canina, plots$Agrostis.capillaris)

X-squared                             
1.095869e-27  1.000000e+00 -1.000000e+00 
Warning message:
In chisq.test(chitbl) : Chi-squared approximation may be incorrect
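The warning is raised by `chisq.test` itself when at least one expected count is below 5, which is common with rare species. The minimal sketch below, on a made-up 2x2 table, shows how to check which cells trigger it via `$expected`. It also runs `fisher.test` (Fisher's exact test), which is a swapped-in alternative rather than part of the original approach; it does not rely on the large-sample approximation.

```r
# Small made-up table where some expected counts fall below 5
tbl <- matrix(c(12, 1, 2, 5), nrow = 2,
              dimnames = list(sppx = c("0", "1"), sppy = c("0", "1")))

# chisq.test warns here; $expected shows which cells are small
chi <- suppressWarnings(chisq.test(tbl))
any(chi$expected < 5)  # TRUE

# Fisher's exact test does not depend on the expected-count rule
fis <- fisher.test(tbl)
fis$p.value
fis$estimate  # conditional MLE odds ratio; > 1 suggests a positive association
```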

Now I'm trying to figure out a way to apply this function to every species-by-species combination in the data frame. Essentially, I want R to take each column, apply the CHI function to that column and each of the other columns in turn, and so on through all the columns, dropping each column once it has been done so that the same species pair is not tested twice. I have tried various approaches using for loops and apply functions but have not been able to get this to work. I hope that is clear enough; any help would be much appreciated. I have looked online for existing solutions to this specific problem but haven't found any that really helped, so a link to an existing answer would also be great.
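One way to do this with plain for loops is sketched below, under some assumptions: `assoc_code()` and `species_matrix()` are hypothetical helper names, the 0.05 significance level is arbitrary, and any pair with an expected count of 5 or less is coded 0 rather than positive or negative. Starting the inner loop at `i + 1` plays the role of dropping each column once it has been done.

```r
# Hypothetical helper: 1 = positive, -1 = negative, 0 = not significant
# or at least one expected count is 5 or less
assoc_code <- function(x, y, alpha = 0.05) {
  # Fixed levels keep the table 2x2 even for a species absent from every plot
  tbl  <- table(factor(x, levels = c(0, 1)), factor(y, levels = c(0, 1)))
  # The expected-count rule is checked explicitly, so the warning is silenced
  test <- suppressWarnings(chisq.test(tbl))
  if (!all(test$expected > 5) || test$p.value >= alpha) return(0)
  # Sign of (observed - expected) in the "both present" cell
  sign((tbl - test$expected)[2, 2])
}

# Hypothetical helper: species x species matrix of association codes
species_matrix <- function(df) {
  n   <- ncol(df)
  out <- matrix(NA_real_, n, n, dimnames = list(names(df), names(df)))
  # Only visit j > i, so each unordered pair of species is tested once
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      out[i, j] <- out[j, i] <- assoc_code(df[[i]], df[[j]])
    }
  }
  out
}
```

With the real data, `species_matrix(plots)` would return a 72 x 72 matrix with `NA` on the diagonal; filling both `out[i, j]` and `out[j, i]` keeps it symmetric, so it can be read from either species' row.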


asked May 23 '16 13:05

YJS

2 Answers

You need the combn function to generate all pairs of column indices and then apply your function to each pair, something like this:

apply(combn(1:ncol(plots), 2), 2, function(ind) CHI(plots[, ind[1]], plots[, ind[2]]))
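Here `apply()` returns a matrix with three rows (statistic, p-value, sign) and one column per pair, which for 72 species is `choose(72, 2)` = 2556 columns. The sketch below labels that result with species names. It uses a small made-up stand-in for `plots`, and the `p.adjust()` step is an added suggestion, not part of the answer: with thousands of tests at 0.05, many "significant" pairs would be expected by chance alone.

```r
# CHI as written in the question
CHI <- function(sppx, sppy) {
  test <- chisq.test(table(sppx, sppy))
  c(test$statistic, test$p.value, sign((table(sppx, sppy) - test$expected)[2, 2]))
}

# Small made-up stand-in for `plots`: 40 plots x 3 species
plots <- data.frame(sp1 = rep(c(1, 0), each = 20),
                    sp2 = rep(c(1, 0, 1, 0), c(15, 5, 5, 15)),
                    sp3 = rep(c(0, 1), 20))

pairs <- combn(ncol(plots), 2)
# One column per pair; suppressWarnings hides small-expected-count warnings
res <- suppressWarnings(
  apply(pairs, 2, function(ind) CHI(plots[, ind[1]], plots[, ind[2]]))
)

# Transpose into one row per pair, labelled with the species names
out <- data.frame(species_x = names(plots)[pairs[1, ]],
                  species_y = names(plots)[pairs[2, ]],
                  statistic = res[1, ], p.value = res[2, ], sign = res[3, ])

# Added step (not in the answer): Holm adjustment for the many tests
out$p.adj <- p.adjust(out$p.value, method = "holm")
```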

answered Oct 09 '22 07:10

Psidom

I think you are looking for something like this. I used the iris dataset.

require(datasets)
ind <- combn(NCOL(iris), 2)
lapply(1:NCOL(ind), function(i) CHI(iris[, ind[1, i]], iris[, ind[2, i]]))
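The `lapply()` call returns a list with one length-3 vector per pair, which `do.call(rbind, ...)` can stack into a matrix. Most `iris` columns are continuous, so the tables built from them are not 2x2. The sketch below instead uses three 0/1 columns built from the built-in `mtcars` data, purely to show the bookkeeping. `cyl4` is a made-up indicator for four-cylinder cars.

```r
# CHI as written in the question
CHI <- function(sppx, sppy) {
  test <- chisq.test(table(sppx, sppy))
  c(test$statistic, test$p.value, sign((table(sppx, sppy) - test$expected)[2, 2]))
}

# Three 0/1 columns; cyl4 is a made-up indicator for four-cylinder cars
bin <- data.frame(vs = mtcars$vs, am = mtcars$am,
                  cyl4 = as.integer(mtcars$cyl == 4))

ind <- combn(NCOL(bin), 2)
res_list <- lapply(seq_len(NCOL(ind)), function(i)
  suppressWarnings(CHI(bin[, ind[1, i]], bin[, ind[2, i]])))

# Stack the per-pair vectors into a matrix with readable labels
res <- do.call(rbind, res_list)
colnames(res) <- c("statistic", "p.value", "sign")
rownames(res) <- paste(names(bin)[ind[1, ]], names(bin)[ind[2, ]], sep = " x ")
```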

answered Oct 09 '22 06:10

Jeonifer

