Speeding up wilcox.test in R

I am trying to run the Wilcoxon rank-sum test on multiple data sets that I've combined into one large matrix, A, that is 705x17635 (i.e., I want to run the rank-sum test 17,635 times, once per column). The only way I've found to do this without for loops is lapply, which I've run as:

lapply(data.frame(A), function(x)
  wilcox.test(x, b, alternative = "greater", exact = FALSE, correct = FALSE))

where b is our negative control data, a 20000x1 vector. This takes very long (I gave up after 30 minutes). Is there a quicker way to run it? I can do the same process in MATLAB (even with a for loop) in about five minutes, but I need to use R for various reasons.

asked Apr 10 '14 20:04 by NaiveHalmos



1 Answer

Some packages address this issue, for example matrixTests:

A <- matrix(rnorm(705*17635), nrow=705)
b <- rnorm(20000)

library(matrixTests)
res <- col_wilcoxon_twosample(A, b) # running time: 83 seconds

A few lines from the result:

res[1:2,]

  obs.x obs.y obs.tot statistic    pvalue alternative location.null exact corrected
1   705 20000   20705   6985574 0.6795783   two.sided             0 FALSE      TRUE
2   705 20000   20705   7030340 0.8997009   two.sided             0 FALSE      TRUE
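The output above reflects the defaults (two-sided, with continuity correction), whereas the question asks for alternative="greater", exact=FALSE, correct=FALSE. As far as I can tell, col_wilcoxon_twosample() accepts arguments with the same names as wilcox.test(), so a minimal sketch of the matching call could look like the following. The small A_small and b_small objects are made-up stand-ins so the example runs quickly; they are not the data from the question.

```r
library(matrixTests)

set.seed(1)
# Hypothetical small stand-ins for the 705 x 17635 matrix and the 20000-long control
A_small <- matrix(rnorm(50 * 8), nrow = 50)
b_small <- rnorm(200)

# Same options as the wilcox.test() call in the question
res_greater <- col_wilcoxon_twosample(A_small, b_small,
                                      alternative = "greater",
                                      exact = FALSE, correct = FALSE)

# Reference: plain wilcox.test() on the first column with identical options
ref <- wilcox.test(A_small[, 1], b_small,
                   alternative = "greater", exact = FALSE, correct = FALSE)

all.equal(res_greater$pvalue[1], ref$p.value)
```

On the real data this would simply be col_wilcoxon_twosample(A, b, alternative = "greater", exact = FALSE, correct = FALSE).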

Check that the first row matches what wilcox.test() returns for the first column:

wilcox.test(A[,1], b)

    Wilcoxon rank sum test with continuity correction

data:  A[, 1] and b
W = 6985574, p-value = 0.6796
alternative hypothesis: true location shift is not equal to 0
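If adding a package is not an option, the one-sided test from the question can also be sketched in base R. Because b is the same for every column, it only needs to be sorted once; the W statistic for a column is then the number of (x, b) pairs with x > b, which findInterval() can count for the whole matrix in one call. This is only a sketch under stated assumptions: no ties anywhere in the data, no missing values, the normal approximation without continuity correction, and alternative="greater". The function name fast_ranksum_greater and the small test data are hypothetical.

```r
# Sketch: rank-sum test of every column of A against a fixed vector b.
# Assumes no ties and no NAs; normal approximation, no continuity correction.
fast_ranksum_greater <- function(A, b) {
  sb  <- sort(b)                 # sort the control once
  n_x <- nrow(A)
  n_y <- length(sb)

  # For each entry of A, the number of control values below it
  below <- matrix(findInterval(A, sb), nrow = n_x)

  # W = number of (x, b) pairs with x > b, per column
  W <- colSums(below)

  # Null standard deviation of W when there are no ties
  sigma <- sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
  z <- (W - n_x * n_y / 2) / sigma

  data.frame(statistic = W, pvalue = pnorm(z, lower.tail = FALSE))
}

set.seed(42)
A_small <- matrix(rnorm(30 * 5), nrow = 30)   # hypothetical small example
b_small <- rnorm(100)

fast <- fast_ranksum_greater(A_small, b_small)

# Column-by-column reference using wilcox.test() with the question's options
slow <- apply(A_small, 2, function(x)
  wilcox.test(x, b_small, alternative = "greater",
              exact = FALSE, correct = FALSE)$p.value)

all.equal(fast$pvalue, slow)
```

With tied values the variance needs the usual tie correction that wilcox.test() applies, so this shortcut should be checked against wilcox.test() on a few columns of the real data before relying on it.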
answered Oct 04 '22 22:10 by Karolis Koncevičius