Fisher test error : LDSTP is too small

input

NN <- c(359, 32); JJ <- c(108, 13); NNS <- c(103, 15); VBN <- c(95, 9); RB <- c(63, 11); NNP <- c(56, 0)
VBG <- c(55, 10); IN <- c(38, 16); VB <- c(20, 10); CD <- c(17, 6); CC <- c(11, 6); DT <- c(11, 4)
MD <- c(8, 5); PRP4 <- c(8, 1); PRP <- c(7, 4); FW <- c(5, 1); VBD <- c(5, 3); RBR <- c(4, 0)
VBP <- c(4, 1); VBZ <- c(4, 3); WRB <- c(4, 2); EX <- c(3, 1); NNPS <- c(2, 0); WDT <- c(2, 3)
WP <- c(2, 1); PDT <- c(1, 1); POS <- c(1, 0); RBS <- c(1, 0); TO <- c(1, 1); UH <- c(0, 1)
Finaltable <- cbind(NN, JJ, NNS, VBN, RB, NNP, VBG, IN, VB, CD, CC, DT, MD, PRP4, PRP,
                    FW, VBD, RBR, VBP, VBZ, WRB, EX, NNPS, WDT, WP, PDT, POS, RBS, TO, UH)
rownames(Finaltable) <- c("tag1","tag2")
Finaltable

chisq.test(Finaltable)


fisher.test(Finaltable)

output

fisher.test(Finaltable) : FEXACT error 7.
LDSTP is too small for this problem.
Try increasing the size of the workspace.

How can I solve this problem without modifying the raw data? Is there any non-parametric test for this comparison?

Choijaeyoung asked Jun 11 '13 19:06



1 Answer

You can try increasing the workspace argument from its default value, but I don't know whether you'll be able to make it big enough: I gave up at workspace=2e8, which still fails, and I ran out of memory at workspace=2e9.

You can also try simulated p-values, e.g. fisher.test(Finaltable,simulate.p.value=TRUE,B=1e7). Since the p-value is extremely small, though, you'll need a huge number of simulations (B) if you want to do more than put an upper bound on it, and that will be very slow. For most purposes, knowing that p is <1e-7 is more than enough, but in some bioinformatics contexts people want to use p as an index of signal strength and/or impose massive multiple-comparisons corrections. I don't really like these approaches, but they're out there ...
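As a rough sketch of the simulated p-value approach, the snippet below rebuilds the 2 x 30 table from the question and runs fisher.test with Monte Carlo simulation. The seed and the smaller B = 1e5 (chosen only so it finishes quickly) are my own assumptions, not values from the question. It also prints 1/(B + 1), the smallest p-value a simulation with B replicates can report, which is why a very small p can only be bounded this way.

```r
# Counts copied from the question: one row per tag, one column per POS label.
tag1 <- c(359, 108, 103, 95, 63, 56, 55, 38, 20, 17, 11, 11, 8, 8, 7,
          5, 5, 4, 4, 4, 4, 3, 2, 2, 2, 1, 1, 1, 1, 0)
tag2 <- c(32, 13, 15, 9, 11, 0, 10, 16, 10, 6, 6, 4, 5, 1, 4,
          1, 3, 0, 1, 3, 2, 1, 0, 3, 1, 1, 0, 0, 1, 1)
Finaltable <- rbind(tag1, tag2)

# Assumption: B = 1e5 is used here only to keep the run short;
# the answer above uses B = 1e7.
B <- 1e5
set.seed(1)  # arbitrary seed, only for reproducibility

# Monte Carlo version of the test: no FEXACT workspace is needed.
res <- fisher.test(Finaltable, simulate.p.value = TRUE, B = B)

# The simulated p-value can never be smaller than 1 / (B + 1),
# so a result at this floor only says "p is at most about 1/B".
p_floor <- 1 / (B + 1)
print(res$p.value)
print(p_floor)
```

If the printed p-value sits at the floor, increasing B lowers the bound but does not by itself give the exact value.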

Ben Bolker answered Sep 28 '22 03:09