 

coin::wilcox_test versus wilcox.test in R

Tags:

r

While trying to figure out which one is better to use, I came across two issues.

1) The W statistic given by wilcox.test is different from that of coin::wilcox_test. Here's my output:

wilcox_test:

Exact Wilcoxon Mann-Whitney Rank Sum Test

data:  data$variableX by data$group (yes, no) 
Z = -0.7636, p-value = 0.4489
alternative hypothesis: true mu is not equal to 0 

wilcox.test:

Wilcoxon rank sum test with continuity correction

data:  data$variable by data$group
W = 677.5, p-value = 0.448
alternative hypothesis: true location shift is not equal to 0 

I'm aware that there are actually two values for W and that the smaller one is usually reported. When wilcox.test is called with two vectors separated by a comma instead of a "~" formula, I can get the other value, but it comes out as W = 834.5. From what I understand, coin::statistic() can return three different statistics ("linear", "standardized", and "test"), where "linear" is the usual W and "standardized" is just W converted to a z-score. None of these matches the W I get from wilcox.test, though (linear = 1055.5, standardized = 0.7636288, test = -0.7636288). Any ideas what's going on?

2) I like the "distribution" and "ties.method" options in wilcox_test, but it seems that you cannot apply a continuity correction as you can in wilcox.test. Am I right?

A.S. asked May 03 '14 21:05


2 Answers

I ran into the same issue when trying to apply Wendt's formula to compute effect sizes with the coin package: I obtained aberrant r values because the linear statistic output by wilcox_test() is unadjusted.
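To show why the unadjusted statistic gives aberrant effect sizes, the sketch below plugs both statistics from this answer (W = 321 from wilcox.test() and the linear statistic 2151 from wilcox_test(), with n1 = 60 and n2 = 30) into Wendt's rank-biserial formula, assumed here in the form r = 1 - 2U/(n1 n2). The helper name wendt_r is hypothetical; the sign of r depends on which group's U you use.

```r
# Hypothetical helper: rank-biserial correlation via Wendt's formula, r = 1 - 2U / (n1 * n2)
wendt_r <- function(U, n1, n2) 1 - 2 * U / (n1 * n2)

n1 <- 60; n2 <- 30           # group sizes in the example data frame of this answer
U_wilcox    <- 321           # W reported by wilcox.test()
linear_coin <- 2151          # unadjusted linear statistic from wilcox_test()

r_ok    <- wendt_r(U_wilcox, n1, n2)                         # lies in [-1, 1]
r_bad   <- wendt_r(linear_coin, n1, n2)                      # falls outside [-1, 1]
r_fixed <- wendt_r(linear_coin - n1 * (n1 + 1) / 2, n1, n2)  # adjusted: equals r_ok
print(c(r_ok = r_ok, r_bad = r_bad, r_fixed = r_fixed))
```

Because U can never exceed n1 * n2, a correctly computed r stays within [-1, 1]; a value below -1 is a quick sign that the unadjusted rank sum was used.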

A great explanation is already given here, so I will simply show how to obtain the adjusted U statistic with the wilcox_test() function. Let's use the following data frame:

library(coin)  # provides wilcox_test()
d <- data.frame( x = c(rnorm(n = 60, mean = 10, sd = 5), rnorm(n = 30, mean = 16, sd = 5)), 
                 g = factor(c(rep("a",times = 60), rep("b",times = 30))) )  # coin needs g to be a factor

We can perform identical tests with wilcox.test() and wilcox_test():

 w1 <- wilcox.test( formula = x ~ g, data = d ) 
 w2 <- wilcox_test( formula = x ~ g, data = d )

The two calls report different statistics:

> w1$statistic
   W 
 321 

> w2@statistic@linearstatistic
[1] 2151

The values are indeed completely different (although the tests are equivalent).

To obtain a U statistic identical to that of wilcox.test(), subtract from wilcox_test()'s linear statistic the minimum value that the rank sum of the reference sample can take, which is n1(n1 + 1)/2, where n1 is the size of the reference sample.
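The relation can be checked with base R alone. The sketch below assumes, as the numbers in this answer suggest, that coin's linear statistic is the sum of the pooled ranks of the reference (first-level) group; it computes that rank sum with rank(), subtracts n1(n1 + 1)/2, and compares the result with wilcox.test()'s W. The seed and simulated data are arbitrary.

```r
# Base-R sketch: recover wilcox.test()'s W from the rank sum of the reference group
set.seed(42)  # arbitrary seed, for reproducibility only
d <- data.frame(
  x = c(rnorm(60, mean = 10, sd = 5), rnorm(30, mean = 16, sd = 5)),
  g = factor(rep(c("a", "b"), times = c(60, 30)))
)

ref      <- levels(d$g)[1]                 # reference group = first factor level ("a")
n1       <- sum(d$g == ref)                # size of the reference sample
rank_sum <- sum(rank(d$x)[d$g == ref])     # sum of the pooled ranks of the reference sample
W_manual <- rank_sum - n1 * (n1 + 1) / 2   # subtract the smallest possible rank sum

w1 <- wilcox.test(x ~ g, data = d)
print(c(rank_sum = rank_sum, W_manual = W_manual, W = unname(w1$statistic)))
```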

Both functions take the first level of the grouping factor g as the reference sample (by default, factor levels are sorted alphabetically).
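Changing the reference level switches which of the two possible W values is reported, which is also why the question sees 677.5 in one call and 834.5 in the other. The sketch below uses base R's relevel() on arbitrary simulated data to show that the two W values always add up to n1 * n2.

```r
set.seed(7)  # arbitrary data for illustration
x <- c(rnorm(20, mean = 10), rnorm(12, mean = 11))
g <- factor(rep(c("a", "b"), times = c(20, 12)))  # levels sorted alphabetically: "a", "b"

W_a <- unname(wilcox.test(x ~ g)$statistic)                      # reference group "a"
W_b <- unname(wilcox.test(x ~ relevel(g, ref = "b"))$statistic)  # reference group "b"

print(levels(g))
print(c(W_a = W_a, W_b = W_b, sum = W_a + W_b))  # sum equals 20 * 12 = 240
```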

First, get the size n1 of the reference sample, which determines its smallest possible rank sum:

n1  <- table(w2@statistic@x)[1]

Then

w2@statistic@linearstatistic - n1 * (n1 + 1) / 2 == w1$statistic

should return TRUE.

Voilà.
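The same relation also accounts for the numbers in the question. The question does not report the group sizes, but they can be inferred: 1055.5 - 677.5 = 378 = 27 * 28 / 2, which implies n1 = 27, and since the two W values must add up to n1 * n2, 677.5 + 834.5 = 1512 gives n2 = 1512 / 27 = 56. These sizes are deductions, not figures stated by the asker. A quick check in R:

```r
# Numbers reported in the question
linear_coin <- 1055.5  # coin::statistic(..., "linear")
W_formula   <- 677.5   # wilcox.test(variable ~ group)
W_swapped   <- 834.5   # wilcox.test called with the groups in the other order

# Solve n1 * (n1 + 1) / 2 == linear_coin - W_formula for n1 (positive root)
offset <- linear_coin - W_formula             # 378
n1 <- (-1 + sqrt(1 + 8 * offset)) / 2         # 27
# The two W values always add up to n1 * n2
n2 <- (W_formula + W_swapped) / n1            # 56
print(c(offset = offset, n1 = n1, n2 = n2))
```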

G Chalancon answered Sep 20 '22 13:09


It seems that one function reports the Mann-Whitney U statistic and the other a Wilcoxon rank-sum statistic, which is defined in several different ways in the literature. The tests are essentially equivalent, as the nearly identical p-values show. In wilcox.test, the continuity correction is controlled by the argument correct; it is on by default (correct = TRUE) and only affects the normal approximation.
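To make the role of correct concrete: the correction moves W half a unit toward its null mean before standardizing, so it changes the p-value but never W itself. The sketch below recomputes both p-values by hand and compares them with wilcox.test(). It assumes untied data (so the tie term in the variance vanishes) and forces the normal approximation with exact = FALSE; the simulated data are arbitrary.

```r
set.seed(1)  # arbitrary simulated data, no ties
x <- rnorm(25, mean = 0)
y <- rnorm(35, mean = 0.5)
nx <- length(x); ny <- length(y)

W       <- sum(rank(c(x, y))[seq_len(nx)]) - nx * (nx + 1) / 2  # same W as wilcox.test()
mu_W    <- nx * ny / 2                                           # mean of W under H0
sigma_W <- sqrt(nx * ny * (nx + ny + 1) / 12)                    # SD of W under H0, no ties

z_plain     <- (W - mu_W) / sigma_W
z_corrected <- (W - mu_W - sign(W - mu_W) * 0.5) / sigma_W       # continuity correction
p_plain     <- 2 * pnorm(-abs(z_plain))
p_corrected <- 2 * pnorm(-abs(z_corrected))

w_plain     <- wilcox.test(x, y, exact = FALSE, correct = FALSE)
w_corrected <- wilcox.test(x, y, exact = FALSE, correct = TRUE)
print(c(W = W, p_plain = p_plain, p_corrected = p_corrected))
```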

For more detail, see https://stats.stackexchange.com/questions/79843/is-the-w-statistic-outputted-by-wilcox-test-in-r-the-same-as-the-u-statistic

José Jiménez answered Sep 21 '22 13:09