Python's scipy.stats.ranksums vs. R's wilcox.test

Question

Both Python's scipy.stats.ranksums and R's wilcox.test are supposed to calculate a two-sided p-value for a Wilcoxon rank sum test. But when I run both functions on the same data, I get p-values that differ by two orders of magnitude:

R:

> x=c(57.07168,46.95301,31.86423,38.27486,77.89309,76.78879,33.29809,58.61569,18.26473,62.92256,50.46951,19.14473,22.58552,24.14309)
> y=c(8.319966,2.569211,1.306941,8.450002,1.624244,1.887139,1.376355,2.521150,5.940253,1.458392,3.257468,1.574528,2.338976)
> print(wilcox.test(x, y))

        Wilcoxon rank sum test

data:  x and y 
W = 182, p-value = 9.971e-08
alternative hypothesis: true location shift is not equal to 0

Python:

>>> x=[57.07168,46.95301,31.86423,38.27486,77.89309,76.78879,33.29809,58.61569,18.26473,62.92256,50.46951,19.14473,22.58552,24.14309]
>>> y=[8.319966,2.569211,1.306941,8.450002,1.624244,1.887139,1.376355,2.521150,5.940253,1.458392,3.257468,1.574528,2.338976]
>>> import scipy.stats
>>> scipy.stats.ranksums(x, y)
(4.415880433163923, 1.0059968254463979e-05)

So R gives me 1e-7 while Python gives me 1e-5.

Where does this difference come from and which one is the 'correct' p-value?

Ben Bolker · Accepted Answer

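It depends on the choice of options (exact vs. a normal approximation, with or without continuity correction). For this data, R's wilcox.test computes an exact p-value by default, whereas scipy.stats.ranksums always uses the normal approximation without a continuity correction.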

R's default, as described in the wilcox.test documentation:

By default (if ‘exact’ is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used.

Default (as shown above):

> wilcox.test(x, y)

    Wilcoxon rank sum test

data:  x and y 
W = 182, p-value = 9.971e-08
alternative hypothesis: true location shift is not equal to 0
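
For comparison, here is a minimal sketch of how the exact p-value could be reproduced on the Python side. It assumes SciPy 1.7 or newer, where scipy.stats.mannwhitneyu accepts a method argument; the Mann-Whitney U statistic it reports corresponds to R's W. The by-hand check relies on the fact that in this data every x is larger than every y, so only one rank assignment per tail is as extreme as the observed one. The variable names res and p_by_hand are illustrative only.

```python
from math import comb

from scipy.stats import mannwhitneyu

x = [57.07168, 46.95301, 31.86423, 38.27486, 77.89309, 76.78879, 33.29809,
     58.61569, 18.26473, 62.92256, 50.46951, 19.14473, 22.58552, 24.14309]
y = [8.319966, 2.569211, 1.306941, 8.450002, 1.624244, 1.887139, 1.376355,
     2.521150, 5.940253, 1.458392, 3.257468, 1.574528, 2.338976]

# Exact two-sided test (assumes SciPy >= 1.7 for the `method` argument).
res = mannwhitneyu(x, y, alternative="two-sided", method="exact")
print(res.statistic, res.pvalue)

# By-hand check: with complete separation of the two samples, exactly one
# of the comb(27, 14) equally likely rank assignments lies in each tail.
p_by_hand = 2 / comb(len(x) + len(y), len(x))
print(p_by_hand)
```

Both printed p-values should agree with the 9.971e-08 that R reports by default.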

Normal approximation with continuity correction:

> wilcox.test(x, y, exact=FALSE, correct=TRUE)

    Wilcoxon rank sum test with continuity correction

data:  x and y 
W = 182, p-value = 1.125e-05
alternative hypothesis: true location shift is not equal to 0

Normal approximation without continuity correction:

> (w0 <- wilcox.test(x, y, exact=FALSE, correct=FALSE))

    Wilcoxon rank sum test

data:  x and y 
W = 182, p-value = 1.006e-05
alternative hypothesis: true location shift is not equal to 0

For a little more precision:

> w0$p.value
[1] 1.005997e-05
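
The two normal-approximation variants can also be sketched in Python. This assumes SciPy 1.7 or newer, where scipy.stats.mannwhitneyu takes method and use_continuity arguments; ranksums itself has no continuity option. The names p_ranksums, p_plain and p_corrected are illustrative only.

```python
from scipy.stats import mannwhitneyu, ranksums

x = [57.07168, 46.95301, 31.86423, 38.27486, 77.89309, 76.78879, 33.29809,
     58.61569, 18.26473, 62.92256, 50.46951, 19.14473, 22.58552, 24.14309]
y = [8.319966, 2.569211, 1.306941, 8.450002, 1.624244, 1.887139, 1.376355,
     2.521150, 5.940253, 1.458392, 3.257468, 1.574528, 2.338976]

# ranksums: normal approximation, no continuity correction.
z, p_ranksums = ranksums(x, y)

# Same approximation via mannwhitneyu, with the correction switched off.
p_plain = mannwhitneyu(x, y, alternative="two-sided",
                       method="asymptotic", use_continuity=False).pvalue

# Normal approximation with continuity correction,
# analogous to R's exact=FALSE, correct=TRUE.
p_corrected = mannwhitneyu(x, y, alternative="two-sided",
                           method="asymptotic", use_continuity=True).pvalue

print(p_ranksums, p_plain, p_corrected)
```

The first two values should match R's 1.006e-05 and the third should match R's 1.125e-05.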

It looks like the other value Python is giving you (4.415880433163923) is the Z-score:

> 2*pnorm(4.415880433163923,lower.tail=FALSE)
[1] 1.005997e-05
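
To see where that Z-score comes from, it can be rebuilt from the ranks directly. The sketch below uses the standard rank sum formulas for data without ties (mean n1(n1+n2+1)/2 and variance n1*n2*(n1+n2+1)/12); the variable names are illustrative and this is not a copy of SciPy's source.

```python
import numpy as np
from scipy.stats import norm, rankdata

x = [57.07168, 46.95301, 31.86423, 38.27486, 77.89309, 76.78879, 33.29809,
     58.61569, 18.26473, 62.92256, 50.46951, 19.14473, 22.58552, 24.14309]
y = [8.319966, 2.569211, 1.306941, 8.450002, 1.624244, 1.887139, 1.376355,
     2.521150, 5.940253, 1.458392, 3.257468, 1.574528, 2.338976]

n1, n2 = len(x), len(y)

# Rank the pooled sample and sum the ranks that belong to x.
ranks = rankdata(np.concatenate([x, y]))
w = ranks[:n1].sum()

# Mean and standard deviation of the rank sum under the null hypothesis
# (no ties assumed).
expected = n1 * (n1 + n2 + 1) / 2
sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

z = (w - expected) / sd
p = 2 * norm.sf(abs(z))  # two-sided p-value from the normal distribution
print(w, z, p)
```

Note that the rank sum of x here is 287, while R reports W = 182: R subtracts the minimum possible rank sum n1(n1+1)/2 = 105, which gives the Mann-Whitney U statistic.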

I can appreciate wanting to know what's going on, but I would also point out that there is rarely any practical difference between p=1e-7 and p=1e-5 ...

Tags: python, r, scipy

Nils
