I have the following data frame:
library(tidyverse)
dat <- structure(list(charge.Group3 = c(0.167, 0.167, 0.1, 0.067, 0.033,
0.033, 0.067, 0.133, 0.2, 0.067, 0.133, 0.114, 0.167, 0.033,
0.1, 0.033, 0.133, 0.267, 0.133, 0.233, 0.1, 0.167, 0.067, 0.133,
0.1, 0.133, 0.1, 0.133, 0.1, 0.067, 0.167, 0), hydrophobicity.Group3 = c(0.267,
0.467, 0.067, 0.167, 0.267, 0.1, 0.367, 0.233, 0.367, 0.233,
0.133, 0.205, 0.333, 0.267, 0.267, 0.067, 0.133, 0.3, 0.233,
0.267, 0.5, 0.333, 0.2, 0.5, 0.5, 0.4, 0.033, 0.3, 0.233, 0.5,
0.233, 0.033), class = c("Negative", "Negative", "Positive",
"Positive", "Positive", "Positive", "Positive", "Negative", "Positive",
"Positive", "Positive", "Positive", "Positive", "Positive", "Negative",
"Positive", "Negative", "Negative", "Negative", "Negative", "Negative",
"Negative", "Negative", "Negative", "Negative", "Negative", "Positive",
"Positive", "Positive", "Negative", "Positive", "Negative")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -32L))
dat
#> # A tibble: 32 x 3
#> charge.Group3 hydrophobicity.Group3 class
#> <dbl> <dbl> <chr>
#> 1 0.167 0.267 Negative
#> 2 0.167 0.467 Negative
#> 3 0.1 0.067 Positive
#> 4 0.067 0.167 Positive
#> 5 0.033 0.267 Positive
#> 6 0.033 0.1 Positive
#> 7 0.067 0.367 Positive
#> 8 0.133 0.233 Negative
#> 9 0.2 0.367 Positive
#> 10 0.067 0.233 Positive
#> # ... with 22 more rows
What I want to do for each features: charge.Group3
and hydrophobicity.Group3
, perform wilcox.test
between Negative and positive class. And finally get the p-value as data frame or tibble:
features pvalue
charge.Group3 0.1088
hydrophobicity.Group3 0.03895
# I do by hand.
Note that there are actually more than 2 features. How can I achieve that?
You don't really need to use broom
if you need only the p-value of the tests.
library(tidyverse)
dat %>%
gather(group, value, -class) %>% # reshape data
nest(-group) %>% # for each group nest data
mutate(pval = map_dbl(data, ~wilcox.test(value ~ class, data = .)$p.value)) %>% # get p value for wilcoxon test
select(-data) # remove data column
# # A tibble: 2 x 2
# group pval
# <chr> <dbl>
# 1 charge.Group3 0.109
# 2 hydrophobicity.Group3 0.0390
Reshaping first will enable you to apply this process no matter how many columns you have, assuming that class
is the only extra variable.
Or you can even avoid map
as @Moody_Mudskipper suggested using
dat %>%
gather(group, value, -class) %>%
group_by(group) %>%
summarize(results = wilcox.test(value ~ class)$p.value)
If you really want to involve broom
then you can do
library(broom)
dat %>%
gather(group, value, -class) %>%
nest(-group) %>%
mutate(results = map(data, ~tidy(wilcox.test(value ~ class, data = .)))) %>%
select(-data) %>%
unnest(results)
# # A tibble: 2 x 5
# group statistic p.value method alternative
# <chr> <dbl> <dbl> <chr> <chr>
# 1 charge.Group3 170. 0.109 Wilcoxon rank sum test with continuity correction two.sided
# 2 hydrophobicity.Group3 183 0.0390 Wilcoxon rank sum test with continuity correction two.sided
which returns more columns, but you can keep the p-value if you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With