Assume a data structure like this:
   ID testA_wave1 testA_wave2 testA_wave3 testB_wave1 testB_wave2 testB_wave3
1   1           3           2           3           6           5           3
2   2           4           4           4           3           6           6
3   3          10           2           1           4           4           4
4   4           5           3          12           2           7           4
5   5           5           3           9           2           4           2
6   6          10           0           2           6           6           5
7   7           6           8           4           6           8           3
8   8           1           5           4           5           6           0
9   9           3           2           7           8           4           4
10 10           4           9           5          11           8           8
What I want to achieve is to run a paired t-test for every test separately (here, testA and testB, but in real life I have many more tests). Specifically, I want to compare the first wave of a given test with every subsequent wave of the same test (for testA, this means testA_wave1 vs testA_wave2 and testA_wave1 vs testA_wave3).
This is how I was able to achieve it:
library("tidyverse")

df %>%
  # Wide to long: one row per ID and test/wave column
  gather(variable, value, -ID) %>%
  mutate(wave_ID = paste0("wave", parse_number(variable)),
         # Map each column name to its test name
         variable = ifelse(grepl("testA", variable), "testA",
                           ifelse(grepl("testB", variable), "testB", NA_character_))) %>%
  # Collect the values of each test/wave combination into a list-column
  group_by(wave_ID, variable) %>%
  summarise(value = list(value)) %>%
  # One row per test, one list-column per wave
  spread(wave_ID, value) %>%
  group_by(variable) %>%
  mutate(p_value_w1w2 = t.test(unlist(wave1), unlist(wave2), paired = TRUE)$p.value,
         p_value_w1w3 = t.test(unlist(wave1), unlist(wave3), paired = TRUE)$p.value) %>%
  select(variable, matches("(p_value)"))
variable p_value_w1w2 p_value_w1w3
<chr> <dbl> <dbl>
1 testA 0.664 0.921
2 testB 0.146 0.418
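The nested `ifelse()` above needs one more branch for every additional test. If the column names always follow the `test<name>_wave<k>` pattern (an assumption based on the sample data), both parts can be pulled out with regular expressions instead. A minimal sketch using stringr's `str_extract()`, where `variable` is only an illustrative vector of column names:

```r
library("tidyverse")

# Illustrative column names following the test<name>_wave<k> pattern
variable <- c("testA_wave1", "testB_wave3", "testC_wave2")

# Everything before the first underscore is the test name
test_name <- str_extract(variable, "^[^_]+")
# "wave" followed by its digits is the wave label
wave_ID <- str_extract(variable, "wave\\d+")

test_name
wave_ID
```

Inside the original `mutate()`, these two calls could replace the `ifelse()` chain and the `paste0()`/`parse_number()` combination, so new tests need no code changes.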
However, I would like to see different/more elegant solutions that give similar results. I'm looking mostly for dplyr/tidyverse solutions, but if there is a completely different way to achieve it, I'm not against it.
Sample data:
set.seed(123)
df <- data.frame(ID = 1:20,
                 testA_wave1 = round(rnorm(20, 5, 3), 0),
                 testA_wave2 = round(rnorm(20, 5, 3), 0),
                 testA_wave3 = round(rnorm(20, 5, 3), 0),
                 testB_wave1 = round(rnorm(20, 5, 3), 0),
                 testB_wave2 = round(rnorm(20, 5, 3), 0),
                 testB_wave3 = round(rnorm(20, 5, 3), 0))
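If you want each wave compared individually, you can do paired tests, yes, or you can treat the waves as a repeated-measures ANOVA. With two waves the two approaches are equivalent; with three or more waves the ANOVA tests all waves jointly rather than pair by pair.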
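To illustrate the two-wave case: a one-way repeated-measures ANOVA on wave 1 and wave 2 gives F = t^2 and therefore the same p-value as the paired t-test. A base-R sketch, assuming the question's sample data; `long`, `fit`, `p_ttest` and `p_anova` are illustrative names:

```r
# Sample data from the question
set.seed(123)
df <- data.frame(ID = 1:20,
                 testA_wave1 = round(rnorm(20, 5, 3), 0),
                 testA_wave2 = round(rnorm(20, 5, 3), 0),
                 testA_wave3 = round(rnorm(20, 5, 3), 0),
                 testB_wave1 = round(rnorm(20, 5, 3), 0),
                 testB_wave2 = round(rnorm(20, 5, 3), 0),
                 testB_wave3 = round(rnorm(20, 5, 3), 0))

# Paired t-test: testA wave 1 vs wave 2
p_ttest <- t.test(df$testA_wave1, df$testA_wave2, paired = TRUE)$p.value

# The same two waves in long format, with subject and wave as factors
long <- data.frame(
  ID    = factor(rep(df$ID, times = 2)),
  wave  = factor(rep(c("wave1", "wave2"), each = nrow(df))),
  value = c(df$testA_wave1, df$testA_wave2)
)

# Repeated-measures ANOVA: the wave effect is tested in the within-subject stratum
fit <- aov(value ~ wave + Error(ID/wave), data = long)
p_anova <- summary(fit)[["Error: ID:wave"]][[1]][["Pr(>F)"]][1]

c(paired_t = p_ttest, rm_anova = p_anova)
```

Adding `testA_wave3` to `long` would turn the ANOVA into a single omnibus test over all three waves, which is no longer the same as the two separate paired tests the question asks for.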
dplyr is a package for making tabular data wrangling easier by using a limited set of functions that can be combined to extract and summarize insights from your data. It pairs nicely with tidyr, which lets you quickly convert between data formats (long vs. wide) for plotting and analysis.
To compare three means you can use either ANOVA or t-tests (one t-test per pair of groups). A t-test is mainly used to compare two group means, whereas ANOVA is used to compare more than two.
A t-test is an inferential statistic used to determine if there is a statistically significant difference between the means of two variables.
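A paired t-test is a one-sample t-test on the within-subject differences d: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. A small base-R sketch checks this against `t.test()`, assuming the testA wave 1 and wave 2 columns of the question's sample data (regenerated here with the same seed and draw order):

```r
set.seed(123)
wave1 <- round(rnorm(20, 5, 3), 0)  # same draws as testA_wave1 in the sample data
wave2 <- round(rnorm(20, 5, 3), 0)  # same draws as testA_wave2

d <- wave1 - wave2                               # within-subject differences
n <- length(d)
t_manual <- mean(d) / (sd(d) / sqrt(n))          # paired t statistic
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

res <- t.test(wave1, wave2, paired = TRUE)
c(manual = t_manual, t.test = unname(res$statistic))
c(manual = p_manual, t.test = res$p.value)
```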
Update 03/16/2022
The tidyverse has evolved and so should this solution.
First I make a simplifying assumption: If we designed the experiment, then we know what the groups are and how many waves we followed them through. If we don't know, then we can extract this information from the column names. See the appendix below.
library("broom")
library("tidyverse")
tests <- c("A", "B")
waves <- 3
comparisons <-
  list(
    test = tests,
    first = 1,
    later = seq(2, waves)
  ) %>%
  cross_df()
comparisons
#> # A tibble: 4 × 3
#> test first later
#> <chr> <dbl> <int>
#> 1 A 1 2
#> 2 B 1 2
#> 3 A 1 3
#> 4 B 1 3
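`purrr::cross_df()` has been deprecated since purrr 1.0.0, which points to `tidyr::expand_grid()` as the replacement. A sketch of the same comparison grid with `expand_grid()`, assuming `tests` and `waves` as defined above. `expand_grid()` varies the first column slowest, so the rows come out in a different order than with `cross_df()`; the joins that follow do not depend on row order.

```r
library("tidyverse")

tests <- c("A", "B")
waves <- 3

# One row per (test, first wave, later wave) comparison
comparisons <- expand_grid(
  test  = tests,
  first = 1,
  later = seq(2, waves)
)
comparisons
```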
Transform the data from wide format to long format.
data <- df %>%
  pivot_longer(
    -ID,
    names_to = "test_wave"
  ) %>%
  extract(
    test_wave, c("test", "wave"),
    regex = "test(.+)_wave(.+)",
    convert = TRUE
  )
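`pivot_longer()` can also split the column names itself through its `names_pattern` argument, which folds the `extract()` step into the pivot. A sketch, assuming the question's sample data (regenerated here as a tibble); `names_transform` converts the captured wave number to an integer, as `convert = TRUE` does above:

```r
library("tidyverse")

set.seed(123)
df <- tibble(
  ID = 1:20,
  testA_wave1 = round(rnorm(20, 5, 3), 0),
  testA_wave2 = round(rnorm(20, 5, 3), 0),
  testA_wave3 = round(rnorm(20, 5, 3), 0),
  testB_wave1 = round(rnorm(20, 5, 3), 0),
  testB_wave2 = round(rnorm(20, 5, 3), 0),
  testB_wave3 = round(rnorm(20, 5, 3), 0)
)

data <- df %>%
  pivot_longer(
    -ID,
    names_to = c("test", "wave"),
    # The two capture groups fill the test and wave columns
    names_pattern = "test(.+)_wave(.+)",
    names_transform = list(wave = as.integer),
    values_to = "value"
  )
data
```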
Then pair the comparisons we want to make with the data we collected. I've added lots of rename statements to make the code more readable, but they're not strictly necessary.
comparisons %>%
  inner_join(
    data,
    by = c("test", "first" = "wave")
  ) %>%
  rename(
    value.first = value
  ) %>%
  inner_join(
    data,
    by = c("test", "later" = "wave", "ID")
  ) %>%
  rename(
    value.later = value
  ) %>%
  group_by(
    test, first, later
  ) %>%
  group_modify(
    ~ tidy(t.test(.x$value.first, .x$value.later, paired = TRUE))
  ) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = test,
    names_from = later,
    names_glue = "wave1_vs_wave{later}",
    values_from = p.value
  )
#> # A tibble: 2 × 3
#> test wave1_vs_wave2 wave1_vs_wave3
#> <chr> <dbl> <dbl>
#> 1 A 0.664 0.921
#> 2 B 0.146 0.418
Appendix: Extract test names and number of waves from column names.
design <- df %>%
  select(starts_with("test")) %>%
  colnames() %>%
  str_match("test(.+)_wave(.+)")
tests <- unique(design[, 2])
waves <- max(as.integer(design[, 3]))
Created on 2022-03-16 by the reprex package (v2.0.1)
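A more compact variation on the join-based pipeline (a sketch, not the answer author's code): pivot to one column per wave within each test, then let `summarise(across())` run one paired test per later wave. This assumes every test has a `wave1` column and that each `ID` appears once per test and wave; the `wave1_vs_` prefix only mirrors the column names above.

```r
library("tidyverse")

set.seed(123)
df <- tibble(
  ID = 1:20,
  testA_wave1 = round(rnorm(20, 5, 3), 0),
  testA_wave2 = round(rnorm(20, 5, 3), 0),
  testA_wave3 = round(rnorm(20, 5, 3), 0),
  testB_wave1 = round(rnorm(20, 5, 3), 0),
  testB_wave2 = round(rnorm(20, 5, 3), 0),
  testB_wave3 = round(rnorm(20, 5, 3), 0)
)

pvalues <- df %>%
  pivot_longer(
    -ID,
    names_to = c("test", "wave"),
    # Keeps the wave label intact, e.g. "wave1"
    names_pattern = "test(.+)_(wave.+)"
  ) %>%
  # One row per ID and test, one column per wave
  pivot_wider(names_from = wave, values_from = value) %>%
  group_by(test) %>%
  summarise(
    across(
      -c(ID, wave1),
      # wave1 and .x come from the same rows, so the pairing by ID is kept
      ~ t.test(wave1, .x, paired = TRUE)$p.value,
      .names = "wave1_vs_{.col}"
    )
  )
pvalues
```

Because the later waves are selected by `-c(ID, wave1)`, additional waves or tests in the data produce additional columns or rows without changing the code.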
Old solution
Here is one way to do it, using purrr quite a bit.
library("tidyverse")
set.seed(123)
df <- tibble(
  ID = 1:20,
  testA_wave1 = round(rnorm(20, 5, 3), 0),
  testA_wave2 = round(rnorm(20, 5, 3), 0),
  testA_wave3 = round(rnorm(20, 5, 3), 0),
  testB_wave1 = round(rnorm(20, 5, 3), 0),
  testB_wave2 = round(rnorm(20, 5, 3), 0),
  testB_wave3 = round(rnorm(20, 5, 3), 0)
)
pvalues <- df %>%
  # From wide tibble to long tibble
  gather(test, value, -ID) %>%
  separate(test, c("test", "wave")) %>%
  # Not strictly necessary; without it, waves are compared as strings
  # and ordered alphabetically
  mutate(wave = parse_number(wave)) %>%
  inner_join(., ., by = c("ID", "test")) %>%
  # If there are two waves w1 and w2,
  # we end up with pairs (w1, w1), (w1, w2), (w2, w1) and (w2, w2),
  # so filter out to keep the pairing (w1, w2) only
  filter(wave.x == 1, wave.x < wave.y) %>%
  # tidyr >= 1.0 expects the columns to nest as a named argument
  nest(data = c(ID, value.x, value.y)) %>%
  mutate(pvalue = data %>%
    # Perform the test
    map(~ t.test(.$value.x, .$value.y, paired = TRUE)) %>%
    map(broom::tidy) %>%
    # Also not strictly necessary; you might want to keep all
    # information about the test: estimate, statistic, etc.
    map_dbl(pluck, "p.value"))
pvalues
#> # A tibble: 4 x 5
#> test wave.x wave.y data pvalue
#> <chr> <dbl> <dbl> <list> <dbl>
#> 1 testA 1 2 <tibble [20 x 3]> 0.664
#> 2 testA 1 3 <tibble [20 x 3]> 0.921
#> 3 testB 1 2 <tibble [20 x 3]> 0.146
#> 4 testB 1 3 <tibble [20 x 3]> 0.418
pvalues %>%
  # Drop the data in order to pivot the table
  select(-data) %>%
  unite("waves", wave.x, wave.y, sep = ":") %>%
  spread(waves, pvalue)
#> # A tibble: 2 x 3
#> test `1:2` `1:3`
#> <chr> <dbl> <dbl>
#> 1 testA 0.664 0.921
#> 2 testB 0.146 0.418
Created on 2019-03-08 by the reprex package (v0.2.1)