There must be an R-ly way to call wilcox.test
over multiple observations in parallel using group_by. I've spent a good deal of time reading up on this but still can't figure out a call to wilcox.test
that does the job. Example data and code below, using magrittr
pipes and summarize()
.
library(dplyr)
library(magrittr)
# create a data frame where x is the dependent variable, id1 is a category variable (here with five levels), and id2 is a binary category variable used for the two-sample wilcoxon test
df <- data.frame(x=abs(rnorm(50)),id1=rep(1:5,10), id2=rep(1:2,25))
# make sure piping and grouping are called correctly, with "sum" function as a well-behaving example function
df %>% group_by(id1) %>% summarise(s=sum(x))
df %>% group_by(id1,id2) %>% summarise(s=sum(x))
# make sure wilcox.test is called correctly
wilcox.test(x~id2, data=df, paired=FALSE)$p.value
# yet, cannot call wilcox.test within pipe with summarise (regardless of group_by). Expected output is five p-values (one for each level of id1)
df %>% group_by(id1) %>% summarise(w=wilcox.test(x~id2, data=., paired=FALSE)$p.value)
df %>% summarise(wilcox.test(x~id2, data=., paired=FALSE))
# even specifying formula argument by name doesn't help
df %>% group_by(id1) %>% summarise(w=wilcox.test(formula=x~id2, data=., paired=FALSE)$p.value)
The buggy calls yield this error:
Error in wilcox.test.formula(c(1.09057358373486,
2.28465932554436, 0.885617572657959, : 'formula' missing or incorrect
Thanks for your help; I hope it will be helpful to others with similar questions as well.
Your task will be easily accomplished using the do function (call ?do after loading the dplyr library). Using your data, the chain will look like this:
df <- data.frame(x=abs(rnorm(50)),id1=rep(1:5,10), id2=rep(1:2,25))
df <- tbl_df(df)
res <- df %>% group_by(id1) %>%
do(w = wilcox.test(x~id2, data=., paired=FALSE)) %>%
summarise(id1, Wilcox = w$p.value)
res
Source: local data frame [5 x 2]
id1 Wilcox
(int) (dbl)
1 1 0.6904762
2 2 0.4206349
3 3 1.0000000
4 4 0.6904762
5 5 1.0000000
Note I added the do function between the group_by and summarize.
I hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With