R: t test over multiple columns using t.test function

Tags: loops, r, apply

I am trying to run an independent t-test on many columns of a data frame. For example, I created this data frame:

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)

To run the test on a single column, I used with(df, t.test(y ~ group)):

with(test_data, t.test(a ~ grp))
with(test_data, t.test(b ~ grp))
with(test_data, t.test(c ~ grp))

I would like the output to look like this:

mean in group m mean in group y  p-value
9.747412        9.878820         0.6944
15.12936        16.49533         0.07798 
20.39531        20.20168         0.9027

How can I achieve this using (1) a for loop, (2) apply(), or (3) perhaps dplyr?

The question "R: t-test over all columns" is related, but it is 6 years old. Perhaps there are better ways to do the same thing now.

KIM asked Feb 21 '18 14:02


4 Answers

Use select_if to keep only the numeric columns, then use purrr::map_df to apply t.test to each column against grp. Finally, use broom::tidy to get the results in a tidy format:

library(tidyverse)

res <- test_data %>% 
  select_if(is.numeric) %>%
  map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
#> # A tibble: 3 x 11
#>   var   estimate estimate1 estimate2 statistic p.value parameter conf.low
#>   <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
#> 1 a       -0.259      9.78      10.0    -0.587   0.565      16.2    -1.19
#> 2 b        0.154     15.0       14.8     0.169   0.868      15.4    -1.78
#> 3 c       -0.359     20.4       20.7    -0.287   0.778      16.5    -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> #   alternative <chr>

Created on 2019-03-15 by the reprex package (v0.2.1.9000)
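To get only the three columns the question asks for, one option is to reshape the data to long format and run one test per variable. The following is a minimal sketch, assuming dplyr (with group_modify), tidyr (with pivot_longer) and broom are installed; the names var, value and res_long are chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'), 10)
test_data <- data.frame(a, b, c, grp)

res_long <- test_data %>%
  # one row per (variable, observation); "var" and "value" are illustrative names
  pivot_longer(-grp, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  # .x holds the rows of one variable, so value and grp are looked up in the data
  group_modify(~ broom::tidy(t.test(value ~ grp, data = .x))) %>%
  ungroup() %>%
  # estimate1 / estimate2 are the two group means as reported by broom::tidy
  select(var, estimate1, estimate2, p.value)

res_long
```

Passing data = .x keeps the test independent of whatever grp happens to exist in the global environment.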

Tung answered Nov 14 '22 23:11


Simply extract the estimate and p-value from each t.test call while iterating over the needed columns with sapply. Build the formulas from a character vector and transpose the result with t():

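# One formula string per column, excluding the grouping column (assumed to be last)
formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")

output <- t(sapply(formulas, function(f) {
  # data = test_data looks the variables up in the data frame, not the global environment
  res <- t.test(as.formula(f), data = test_data)
  c(res$estimate, p.value=res$p.value)
}))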

Input data (seeded for reproducibility)

set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)

Output result

#         mean in group m mean in group y   p.value
# a ~ grp        9.775477        10.03419 0.5654353
# b ~ grp       14.972888        14.81895 0.8678149
# c ~ grp       20.383679        20.74238 0.7776188
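
For the apply() variant mentioned in the question, the same idea might be sketched as below. This is only an illustration under assumptions: the vector num_cols and the name out_apply are hypothetical, and the numeric columns are listed by hand. apply() converts the selected columns to a numeric matrix and passes each column to the function as a plain vector.

```r
set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'), 10)
test_data <- data.frame(a, b, c, grp)

# columns to test (hypothetical helper vector, listed explicitly)
num_cols <- c("a", "b", "c")

# MARGIN = 2 iterates over columns; each column arrives as a numeric vector x
out_apply <- t(apply(test_data[, num_cols], 2, function(x) {
  res <- t.test(x ~ test_data$grp)
  c(res$estimate, p.value = res$p.value)
}))

out_apply
```

Because apply() works on a matrix, the grouping column has to be left out of the selection; otherwise every column would be coerced to character.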

Parfait answered Nov 14 '22 23:11


As you asked for a for loop:

  a <- rnorm(20, 10, 1)
  b <- rnorm(20, 15, 2)
  c <- rnorm(20, 20, 3)
  grp <- rep(c('m', 'y'),10)
  test_data <- data.frame(a, b, c, grp)  

  meanM=NULL
  meanY=NULL
  p.value=NULL

  for (i in 1:(ncol(test_data)-1)){
    # run each test once and reuse the result
    res <- t.test(test_data[,i] ~ grp)
    meanM=as.data.frame(rbind(meanM, res$estimate[1]))
    meanY=as.data.frame(rbind(meanY, res$estimate[2]))
    p.value=as.data.frame(rbind(p.value, res$p.value))
  }

  cbind(meanM, meanY, p.value)

It works, but I am a beginner in R, so there may be a more efficient solution.
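
One possibly more efficient shape for the loop is to allocate the result matrix once and fill one row per column, instead of growing three objects with rbind. This is a sketch under assumptions: the names vars and out_loop are hypothetical, and the seed 333 is taken from the question.

```r
set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'), 10)
test_data <- data.frame(a, b, c, grp)

# every column except the grouping column
vars <- setdiff(names(test_data), "grp")

# allocate the full result up front: one row per variable
out_loop <- matrix(NA_real_, nrow = length(vars), ncol = 3,
                   dimnames = list(vars, c("mean in group m", "mean in group y", "p.value")))

for (v in vars) {
  # a single t.test call per column; its result is reused for all three values
  res <- t.test(test_data[[v]] ~ test_data$grp)
  out_loop[v, ] <- c(res$estimate, res$p.value)
}

out_loop
```

Indexing rows by name keeps each result attached to its variable, so the output needs no separate labelling step.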

DaWassi answered Nov 14 '22 23:11


Using lapply, this is rather easy.
I tested the code with set.seed(7060) before creating the dataset, to make the results reproducible.

tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))

result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
pval <- sapply(tests_list, `[[`, "p.value")
result <- cbind(result, p.value = pval)

result
#     mean in group m mean in group y   p.value
#[1,]        9.909818        9.658813 0.6167742
#[2,]       14.578926       14.168816 0.6462151
#[3,]       20.682587       19.299133 0.2735725

Note that a real life application would use names(test_data)[1:3], not letters[1:3], in the first lapply instruction.
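
A version driven by the column names could look like the sketch below. It rests on assumptions: the columns to test are detected with is.numeric, and the names num_cols and result_named are hypothetical. reformulate() from the stats package builds the formula from strings, which avoids pasting and parsing by hand.

```r
set.seed(7060)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'), 10)
test_data <- data.frame(a, b, c, grp)

# assumption: every numeric column should be tested against grp
num_cols <- names(test_data)[sapply(test_data, is.numeric)]

tests_list <- lapply(num_cols, function(x) {
  # reformulate("grp", response = "a") yields the formula a ~ grp
  t.test(reformulate("grp", response = x), data = test_data)
})

result_named <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
result_named <- cbind(result_named, p.value = sapply(tests_list, `[[`, "p.value"))

# label the rows with the variable names instead of [1,], [2,], [3,]
rownames(result_named) <- num_cols

result_named
```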

Rui Barradas answered Nov 14 '22 23:11