Is it possible to run a t.test over multiple variables against the same categorical variable without first reshaping the dataset, as follows?
data(mtcars)
library(dplyr)
library(tidyr)
j <- mtcars %>% gather(var, val, disp:qsec)
t <- j %>% group_by(var) %>% do(te = t.test(val ~ vs, data = .))
t %>% summarise(p = te$p.value)
I've tried using
mtcars %>% summarise_each_(funs = (t.test(. ~ vs))$p.value, vars = disp:qsec)
but it throws an error.
Bonus: how can t %>% summarise(p = te$p.value) also include the name of the grouping variable?
Using the group_by() function from the dplyr package, we can group by multiple columns (two or more) and summarise multiple columns for aggregation.
One useful feature of group_by() is that it can group by more than one variable, showing what the aggregated response looks like for each combination of those variables.
The function unite() from tidyr takes multiple columns and pastes them together into one.
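A minimal sketch of grouping by two variables and summarising several columns at once, using mtcars from the question. The choice of vs and am as grouping variables and of mpg and hp as summarised columns is only for illustration; the .groups argument assumes dplyr >= 1.0.

```r
library(dplyr)

# Group by every combination of vs and am, then summarise several columns per group
by_vs_am <- mtcars %>%
  group_by(vs, am) %>%
  summarise(
    n        = n(),       # number of cars in each vs/am combination
    mean_mpg = mean(mpg),
    mean_hp  = mean(hp),
    .groups  = "drop"     # return an ungrouped result
  )
by_vs_am
```

Each row of the result corresponds to one vs/am combination present in the data.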
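A small sketch of unite(), assuming tidyr is installed. The new column name vs_am is hypothetical; by default unite() drops the source columns.

```r
library(dplyr)
library(tidyr)

# Paste vs and am together into one column, e.g. "0_1"; vs and am are removed
cars_united <- mtcars %>%
  unite("vs_am", vs, am, sep = "_")
head(cars_united$vs_am)
```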
After the discussion with @aosmith and @Misha, here is one approach. As @aosmith pointed out in the comments, you want to do the following.
library(dplyr)
# across() replaces the deprecated summarise_each()/funs() and keeps the column names
mtcars %>%
  summarise(across(disp:qsec, ~ t.test(.x[vs == 0], .x[vs == 1])$p.value))
# disp hp drat wt qsec
#1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06
vs is either 0 or 1 (the group). To run a t-test between the two groups within a variable (e.g., disp), you need to subset the data as @aosmith suggested. Thank you for the contribution.
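For comparison, a base-R sketch of the same subsetting idea without dplyr. sapply() over the selected columns returns a named vector, so the variable names come along with the p-values. The helper name test_vars is only for illustration.

```r
# Columns to test, matching disp:qsec in mtcars
test_vars <- c("disp", "hp", "drat", "wt", "qsec")

# Welch t-test of vs == 0 versus vs == 1 for each column; names are preserved
p_values <- sapply(mtcars[test_vars], function(x) {
  t.test(x[mtcars$vs == 0], x[mtcars$vs == 1])$p.value
})
p_values
```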
What I originally suggested works in a different situation, where you simply compare two columns. Here are sample data and code.
foo <- data.frame(country = "Iceland",
year = 2014,
id = 1:30,
A = sample.int(1e5, 30, replace = TRUE),
B = sample.int(1e5, 30, replace = TRUE),
C = sample.int(1e5, 30, replace = TRUE),
stringsAsFactors = FALSE)
If you want to run t-tests for the A-C, and B-C combination, the following would be one way.
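foo2 <- foo %>%
  # paired t-test of each of A and B against C; across() keeps the names A and B
  summarise(across(A:B, ~ t.test(.x, C, paired = TRUE)$p.value))
foo2
# A B
#1 0.2937979 0.5316822
# (the data are random and no seed is set, so your p-values will differ)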
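A base-R sketch of the same paired comparison. It rebuilds a data frame shaped like foo (only the A, B and C columns, with a seed so it is reproducible); the seed value and the name p_vs_C are arbitrary choices for illustration.

```r
# Data shaped like foo above, made reproducible with a seed
set.seed(1)
foo <- data.frame(
  A = sample.int(1e5, 30, replace = TRUE),
  B = sample.int(1e5, 30, replace = TRUE),
  C = sample.int(1e5, 30, replace = TRUE)
)

# Paired t-test of A and of B against C; sapply() keeps the names A and B
p_vs_C <- sapply(foo[c("A", "B")], function(x) {
  t.test(x, foo$C, paired = TRUE)$p.value
})
p_vs_C
```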
I like the following solution using the powerful "broom" package:
library("dplyr")
library("broom")
# your_db and all column names below are placeholders: substitute your own
your_db %>%
  group_by(grouping_variable1, grouping_variable2) %>%
  do(tidy(t.test(variable_to_test ~ dichotomous_grouping_var, data = .)))
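Applied to the mtcars question, the broom pattern might look like the sketch below, assuming tidyr >= 1.0 (for pivot_longer()) and broom are installed. Because var is the grouping column, do() keeps it next to the tidied results, which also answers the bonus question.

```r
library(dplyr)
library(tidyr)
library(broom)

# Long format: one row per car and variable, variable name kept in `var`
long <- mtcars %>%
  pivot_longer(disp:qsec, names_to = "var", values_to = "val")

# One Welch t-test per variable; tidy() turns each test into a one-row tibble
results <- long %>%
  group_by(var) %>%
  do(tidy(t.test(val ~ vs, data = .)))

results %>% select(var, estimate, statistic, p.value)
```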