
Perform multiple paired t-tests based on groups/categories

Tags:

r

t-test

I am stuck performing t-tests for multiple categories in RStudio. I want the result of a paired t-test for each product type, comparing the online and offline prices. I have over 800 product types, so I don't want to do this manually for each product group.

I have a data frame (more than 2 million rows) named data that looks like this:

  Product_type Price_Online Price_Offline
1            A           48            37
2            B           29            22
3            B           32            40
4            A           38            36
5            C           32            27
6            C           31            35
7            C           28            24
8            A           47            42
9            C           40            36

Ideally, I want R to write the result of each t-test to another data frame called product_types:

    Product_type
    1   A           
    2   B            
    3   C          
    4   D          
    5   E         
    6   F            
    7   G            
    8   H            
    9   I            
   800 ...

becomes:

    Product_type   t   df   p-value   conf. interval   mean difference
    1   A           
    2   B            
    3   C          
    4   D          
    5   E         
    6   F            
    7   G            
    8   H            
    9   I            
   800 ...

This is the call I would use if each product type were in its own data frame:

t.test(Product_A$Price_Online, Product_A$Price_Offline, mu = 0, alternative = "two.sided", paired = TRUE, conf.level = 0.99)
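For a single product type, the paired test in this call is the same as a one-sample t-test on the row-wise differences Price_Online - Price_Offline. A minimal check, using the three product-A rows from the example table typed in by hand (the variable names online, offline and d are illustrative only):

```r
# Product A rows from the example table (rows 1, 4 and 8)
online  <- c(48, 38, 47)
offline <- c(37, 36, 42)

# Built-in paired test, as in the question
paired <- t.test(online, offline, paired = TRUE, conf.level = 0.99)

# Same test by hand: one-sample t-test on the differences
d  <- online - offline
n  <- length(d)
se <- sd(d) / sqrt(n)                          # standard error of mean(d)
t_manual  <- mean(d) / se                      # t statistic (mu = 0)
df_manual <- n - 1                             # degrees of freedom
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)           # two-sided p-value
ci_manual <- mean(d) + c(-1, 1) * qt(0.995, df = df_manual) * se  # 99% interval

# Equivalent call: t.test() on the differences alone
one_sample <- t.test(d, mu = 0, conf.level = 0.99)

print(c(t = t_manual, df = df_manual, p = p_manual))
print(ci_manual)
```

This is only meant to show what the paired test computes; for the real data the grouped approaches in the answer below are the practical route.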

There must be an easier way to do this. Otherwise I would need to create 800+ data frames and run the t-test 800 times.

I tried using lists and lapply, but so far without success. I also tried the approach from "t-Test on multiple columns": https://sebastiansauer.github.io/multiple-t-tests-with-dplyr/

However, at the end he still specifies the two groups (male and female) by hand; I would have to do that for over 800 categories.

asked Mar 05 '17 at 14:03 by User100009




1 Answer

One way to do this is with by(), which splits a data frame by a factor and applies a function to each subset:

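# Run one paired t-test per Product_type; [1:9] keeps every component
# of the returned htest object except the last one, data.name
result <- by(data, data$Product_type, function(x)
  t.test(x$Price_Online, x$Price_Offline, mu = 0, alternative = "two.sided",
         paired = TRUE, conf.level = 0.99)[1:9])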

To get the results into a data frame, rbind the per-group lists together and convert the column types:

type.convert(as.data.frame(do.call(rbind, result)), as.is=TRUE)
#     statistic parameter   p.value             conf.int estimate null.value   stderr alternative        method
# A    2.267787         2 0.1514719  -20.25867, 32.25867        6          0 2.645751   two.sided Paired t-test
# B -0.06666667         1 0.9576214  -477.9256, 476.9256     -0.5          0      7.5   two.sided Paired t-test
# C    1.073154         3 0.3618456 -9.996192, 14.496192     2.25          0 2.096624   two.sided Paired t-test

Or, equivalently, with the native pipe (R >= 4.1):

do.call(rbind, result) |> as.data.frame() |> type.convert(as.is=TRUE)
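Since the question mentions trying lists and lapply, here is a base-R sketch of the same idea with split() and lapply(). It keeps only the columns the question asks for and stores the confidence interval as two plain numeric columns. The helper paired_summary() and the column names (t, df, conf.low, conf.high, mean.diff) are hypothetical choices for illustration, and returning a row of NAs for groups where t.test() fails (for example, a product type with only one row) is an assumption about how such groups should be handled.

```r
# Example data from the question (same values as the dput() below)
data <- data.frame(
  Product_type  = c("A", "B", "B", "A", "C", "C", "C", "A", "C"),
  Price_Online  = c(48, 29, 32, 38, 32, 31, 28, 47, 40),
  Price_Offline = c(37, 22, 40, 36, 27, 35, 24, 42, 36)
)

# Hypothetical helper: one paired t-test on a subset, reduced to one row.
# If t.test() errors (too few pairs, constant differences), return NAs
# instead of stopping the whole loop.
paired_summary <- function(g) {
  res <- tryCatch(
    t.test(g$Price_Online, g$Price_Offline, paired = TRUE, conf.level = 0.99),
    error = function(e) NULL
  )
  if (is.null(res)) {
    return(data.frame(t = NA_real_, df = NA_real_, p.value = NA_real_,
                      conf.low = NA_real_, conf.high = NA_real_,
                      mean.diff = NA_real_))
  }
  data.frame(t         = unname(res$statistic),
             df        = unname(res$parameter),
             p.value   = res$p.value,
             conf.low  = res$conf.int[1],
             conf.high = res$conf.int[2],
             mean.diff = unname(res$estimate))
}

# One list element per product type, then one row per element
groups <- split(data, data$Product_type)
rows   <- lapply(groups, paired_summary)

product_types <- data.frame(Product_type = names(rows), do.call(rbind, rows))
rownames(product_types) <- NULL   # drop the "A", "B", ... row names from rbind

print(product_types)
```

Because conf.low and conf.high are ordinary numeric columns rather than the list column conf.int, the result can be sorted or filtered directly, e.g. product_types[order(product_types$p.value), ].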

Data

data <- structure(list(Product_type = c("A", "B", "B", "A", "C", "C", 
"C", "A", "C"), Price_Online = c(48L, 29L, 32L, 38L, 32L, 31L, 
28L, 47L, 40L), Price_Offline = c(37L, 22L, 40L, 36L, 27L, 35L, 
24L, 42L, 36L)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9"))
answered Oct 16 '22 at 10:10 by yeedle