I'm trying a significance test using wilcox.test
in R. I want to basically test if a value x
is significantly within/outside a distribution d
.
I'm doing the following:
d = c(90,99,60,80,80,90,90,54,65,100,90,90,90,90,90)
wilcox.test(60,d)
Wilcoxon rank sum test with continuity correction
data: 60 and d
W = 4.5, p-value = 0.5347
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(60, d) : cannot compute exact p-value with ties
and basically the p-value is the same for a big range of numbers i test.
I've tried wilcox_test()
from the coin
package, but i can't get it to work testing a value against a distribution.
Is there an alternative to this test that does the same and knows how to deal with ties?
How worried are you about the non-exact results? I would guess that the approximation is reasonable for a data set this size. (I did manage to get coin::wilcox_test
working, and the results are not hugely different ...)
d <- c(90,99,60,80,80,90,90,54,65,100,90,90,90,90,90)
pfun <- function(x) {
suppressWarnings(w <- wilcox.test(x,d)$p.value)
return(w)
}
testvec <- 30:120
p1 <- sapply(testvec,pfun)
library("coin")
pfun2 <- function(x) {
dd <- data.frame(y=c(x,d),f=factor(c(1,rep(2,length(d)))))
return(pvalue(wilcox_test(y~f,data=dd)))
}
p2 <- sapply(testvec,pfun2)
library("exactRankTests")
pfun3 <- function(x) {wilcox.exact(x,d)$p.value}
p3 <- sapply(testvec,pfun3)
Picture:
par(las=1,bty="l")
matplot(testvec,cbind(p1,p2,p3),type="s",
xlab="value",ylab="p value of wilcoxon test",lty=1,
ylim=c(0,1),col=c(1,2,4))
legend("topright",c("stats::wilcox.test","coin::wilcox_test",
"exactRankTests::wilcox.exact"),
lty=1,col=c(1,2,4))
(exactRankTests
added by request, but given that it's not maintained any more and recommends the coin
package, I'm not sure how reliable it is. You're on your own for figuring out what the differences among these procedures are and which would be best to use ...)
The results make sense here -- the problem is just that your power is low. If your value is completely outside the range of the data, for n=15, that will be a probability of something like 2*(1/16)=0.125 [i.e. probability of your sample ending up as the first or the last element in a permutation], which is not quite the same as the minimum value here (wilcox.test
: p=0.105, wilcox_test
: p=0.08), but that might be an approximation issue, or I might have some detail wrong. Nevertheless, it's in the right ballpark.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With