I am currently trying to run the Wilcoxon rank-sum test on multiple data sets that I've combined into one large matrix, A, that is 705x17635 (i.e., I want to run the rank-sum test 17,635 times). The only way I've seen to do this without using for loops is lapply, which I've run as:
lapply(data.frame(A), function(x)
  wilcox.test(x, b, alternative="greater", exact=FALSE, correct=FALSE))
where b is our negative control data, a 20000x1 vector. Running this, however, takes a very long time (I gave up after 30 minutes). Is there a quicker way to run this? I can do the same process in MATLAB (even with a for loop) in about five minutes, but I need to use R for various reasons.
The effect size r is calculated as Z statistic divided by square root of the sample size (N) (Z/√N). The Z value is extracted from either coin::wilcoxsign_test() (case of one- or paired-samples test) or coin::wilcox_test() (case of independent two-samples test).
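As a rough illustration of the Z/√N formula for the independent two-sample case, here is a base-R sketch. It does not use coin; instead it assumes no ties and computes Z from the usual normal approximation without continuity correction, so the sign convention and tie handling may differ from what coin reports. The variable names are hypothetical.

```r
set.seed(42)
x <- rnorm(30, mean = 0.5)   # hypothetical sample 1
y <- rnorm(40)               # hypothetical sample 2
n_x <- length(x)
n_y <- length(y)

# W = number of (x, y) pairs with x > y (valid only when there are no ties)
W <- sum(outer(x, y, ">"))

# Normal approximation: mean and standard deviation of W under the null
z <- (W - n_x * n_y / 2) / sqrt(n_x * n_y * (n_x + n_y + 1) / 12)

# Effect size r = Z / sqrt(N), with N the total sample size
r_effect <- z / sqrt(n_x + n_y)
r_effect
```

Here W should agree with the statistic reported by wilcox.test(x, y, exact=FALSE, correct=FALSE) as long as the data contain no ties.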
The V statistic is the sum of the ranks assigned to the differences with positive signs. That is, a Wilcoxon signed-rank test calculates a sum of negative ranks (W-) and a sum of positive ranks (W+), and V is W+.
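The definition of V can be checked by hand in base R. The sketch below uses made-up paired data and assumes no zero differences and no tied absolute differences, which holds for continuous simulated values.

```r
set.seed(1)
x <- rnorm(25, mean = 0.3)   # hypothetical paired measurements
y <- rnorm(25)

d <- x - y                   # paired differences
r <- rank(abs(d))            # ranks of the absolute differences

W_plus  <- sum(r[d > 0])     # sum of ranks with positive sign
W_minus <- sum(r[d < 0])     # sum of ranks with negative sign

# V as reported by the signed-rank test
V <- unname(wilcox.test(x, y, paired = TRUE)$statistic)
c(W_plus = W_plus, W_minus = W_minus, V = V)
```

W_plus and W_minus always add up to n(n+1)/2, so either one determines the other.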
Some packages try to address this issue, for example matrixTests:
A <- matrix(rnorm(705*17635), nrow=705)
b <- rnorm(20000)
library(matrixTests)
res <- col_wilcoxon_twosample(A, b) # running time: 83 seconds
A few lines from the result:
res[1:2,]
  obs.x obs.y obs.tot statistic    pvalue alternative location.null exact corrected
1   705 20000   20705   6985574 0.6795783   two.sided             0 FALSE      TRUE
2   705 20000   20705   7030340 0.8997009   two.sided             0 FALSE      TRUE
Check that the result is the same as running wilcox.test() column by column:
wilcox.test(A[,1], b)
Wilcoxon rank sum test with continuity correction
data: A[, 1] and b
W = 6985574, p-value = 0.6796
alternative hypothesis: true location shift is not equal to 0
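Note that the call above uses the defaults (two-sided, with continuity correction), while the question asks for alternative="greater" with exact=FALSE and correct=FALSE. To my knowledge col_wilcoxon_twosample() accepts alternative, exact and correct arguments for this, but check the package documentation for your version. If you would rather avoid a dependency, the following is a base-R sketch of the same one-sided test. It assumes there are no ties in the data (true for continuous values such as the simulated ones here), and the function name fast_ranksum_greater is made up for this example. The key idea is to sort b once and count, for every entry of A, how many values of b lie below it.

```r
set.seed(1)
A <- matrix(rnorm(50 * 20), nrow = 50)   # small stand-in for the 705 x 17635 matrix
b <- rnorm(300)                          # small stand-in for the control vector

fast_ranksum_greater <- function(A, b) {
  n_x <- nrow(A)
  n_y <- length(b)
  b_sorted <- sort(b)                    # sort the control data only once

  # For each entry of A, count the values of b that are <= it.
  # Without ties this equals the number of b values strictly below it.
  counts <- matrix(findInterval(A, b_sorted), nrow = n_x)

  # Rank-sum statistic W for every column
  W <- colSums(counts)

  # Normal approximation, no continuity correction, no tie correction
  z <- (W - n_x * n_y / 2) / sqrt(n_x * n_y * (n_x + n_y + 1) / 12)

  data.frame(statistic = W, pvalue = pnorm(z, lower.tail = FALSE))
}

res_fast <- fast_ranksum_greater(A, b)

# Reference result for the first column
ref <- wilcox.test(A[, 1], b, alternative = "greater",
                   exact = FALSE, correct = FALSE)
```

On data with ties the variance needs the usual tie correction, so in that case compare against wilcox.test() on a few columns before trusting the shortcut.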