
R: t.test and pairwise.t.test give different results?

Tags:

r

I tried to run t-tests in R on the following data frame.

df <- structure(list(freq = c(9, 11, 14, 12, 10, 9, 16, 10, 11, 15, 
13, 12, 12, 13, 13, 9, 16, 14, 12, 15, 16, 10, 11, 13, 14, 14, 
14, 16, 8, 10, 14, 14, 11, 11, 11, 11, 13, 7, 12, 13, 14, 11, 
11, 13, 10, 14, 10, 10, 12, 8, 9, 12, 14, 11, 12, 12, 14, 14, 
14, 15, 12, 13, 14, 8, 9, 11, 10, 14, 12, 12, 9, 10, 8, 14, 11, 
14, 9, 13, 13, 13, 10, 9, 13, 10, 13, 10, 13, 12, 11, 12, 10, 
12, 8, 11, 12, 15, 12, 12, 11, 13, 12, 10, 13, 9, 11, 9, 11, 
8, 12, 12, 12, 10, 11, 12, 9, 13, 14, 11, 11, 14, 13, 12, 14, 
15, 12, 12, 12, 14), class = structure(c(3L, 3L, 2L, 2L, 2L, 
2L, 2L, 3L, 2L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 3L, 2L, 1L, 4L, 1L, 
4L, 1L, 4L, 2L, 2L, 3L, 3L, 2L, 4L, 1L, 4L, 4L, 4L, 3L, 3L, 3L, 
2L, 1L, 4L, 3L, 3L, 1L, 4L, 1L, 2L, 2L, 3L, 3L, 4L, 2L, 2L, 3L, 
3L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 4L, 1L, 1L, 1L, 2L, 2L, 3L, 
2L, 3L, 2L, 3L, 3L, 4L, 2L, 1L, 4L, 1L, 1L, 3L, 2L, 2L, 2L, 3L, 
1L, 1L, 1L, 1L, 3L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 3L, 3L, 4L, 
4L, 3L, 4L, 4L, 4L, 4L, 3L, 3L, 1L, 4L, 4L, 1L, 4L, 4L, 1L, 3L, 
1L, 2L, 2L, 1L, 2L, 1L, 1L, 3L, 3L, 2L, 1L), .Label = c("ending", 
"mobile", "stem.first", "stem.second"), class = "factor")), .Names = c("freq", 
"class"), row.names = c(NA, -128L), class = "data.frame")

As I read in a previous post, there is more than one way to do this in R. I tried both the t.test function and the pairwise.t.test function.

To use t.test, I subset the data frame by the classes to be compared and ran a separate t-test on each subset.

ending.vs.mobile <- df[df$class=="ending"|df$class=="mobile",]
ending.vs.first <- df[df$class=="ending"|df$class=="stem.first",]
ending.vs.second <- df[df$class=="ending"|df$class=="stem.second",]
mobile.vs.first <- df[df$class=="mobile"|df$class=="stem.first",]
mobile.vs.second <- df[df$class=="mobile"|df$class=="stem.second",]
first.vs.second <- df[df$class=="stem.first"|df$class=="stem.second",]

t.test(freq ~ class, data=ending.vs.mobile, var.equal=TRUE)
t.test(freq ~ class, data=ending.vs.first, var.equal=TRUE)
t.test(freq ~ class, data=ending.vs.second, var.equal=TRUE)
t.test(freq ~ class, data=mobile.vs.first, var.equal=TRUE)
t.test(freq ~ class, data=mobile.vs.second, var.equal=TRUE)
t.test(freq ~ class, data=first.vs.second, var.equal=TRUE)

As far as I understand it (I might be wrong here), pairwise.t.test would be more convenient, as I don't need to create all the subsets and can run it on the original data frame.

pairwise.t.test(df$freq, df$class, p.adjust.method="none", paired=FALSE, pooled.sd=FALSE)

However, I get different results here, most pronounced for the comparison ending vs. stem.second: p=0.7 using t.test and p=0.1 using pairwise.t.test.

What is going wrong here? What have I done wrong?


Although the problem itself is solved, the reason it occurred makes me a little paranoid (not trusting myself anymore): just by typing pooled.sd instead of pool.sd, I do not get the results I expect. Isn't this very error-prone?

In many other cases you can type variants, e.g. bonf or bonferroni, fa() or factor(), and so on. But here pooled.sd is completely ignored, although "pooled SD" is actually what is meant. OK, if you read the headline of the output carefully, you can guess that pooled.sd wasn't recognized, as it still says "t tests with pooled SD". But what if I never print it, e.g. when passing the output to a self-written function? Chances are that this error will never be noticed.

Should I write to the R developers and suggest that both spelling variants be accepted in future releases?

asked Jan 16 '23 03:01 by absurd


1 Answer

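The problem is not in the p-value correction, but in the (declaration of the) variance assumptions. You have used var.equal=T in your t.test calls and pooled.sd=FALSE in your pairwise.t.test call. However, the argument for pairwise.t.test is pool.sd, not pooled.sd. The misspelled name does not match any argument, so it is absorbed by the ... argument and pool.sd keeps its default of TRUE (for unpaired tests); in that mode t.test is not called at all and extra arguments are ignored, which is why there was no error or warning. Using the correct name makes pairwise.t.test run a separate two-sample t-test for each pair, so the p-values line up with the individual calls to t.test. Note that pool.sd=FALSE on its own uses t.test's default of var.equal=FALSE (Welch); to reproduce the var.equal=T results exactly, var.equal=TRUE can be passed as well, since it is forwarded to t.test.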

pairwise.t.test(df$freq, df$class, p.adjust.method="none", 
                paired=FALSE, pool.sd=FALSE)
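
To see the effect without the full data set, here is a minimal sketch on hypothetical toy data (the `toy` data frame and its group names `a`, `b`, `c` are made up for illustration and are not taken from the question). Assuming the goal is to reproduce `t.test(..., var.equal=TRUE)` exactly, it passes `var.equal=TRUE` along with `pool.sd=FALSE` and compares one cell of the result with a direct `t.test` call.

```r
# Hypothetical toy data (NOT the df from the question): three groups of 10
set.seed(1)
toy <- data.frame(
  freq  = c(rnorm(10, 10, 1), rnorm(10, 11, 3), rnorm(10, 12, 2)),
  class = factor(rep(c("a", "b", "c"), each = 10))
)

# Misspelled argument: pooled.sd ends up in `...`, pool.sd keeps its default
res_typo <- pairwise.t.test(toy$freq, toy$class, p.adjust.method = "none",
                            paired = FALSE, pooled.sd = FALSE)
res_typo$method  # "t tests with pooled SD"

# Correct spelling; var.equal = TRUE is forwarded to t.test via `...`
res_ok <- pairwise.t.test(toy$freq, toy$class, p.adjust.method = "none",
                          paired = FALSE, pool.sd = FALSE, var.equal = TRUE)
res_ok$method    # "t tests with non-pooled SD"

# Direct two-sample t-test for the pair a vs. b
ab <- droplevels(toy[toy$class %in% c("a", "b"), ])
p_direct <- t.test(freq ~ class, data = ab, var.equal = TRUE)$p.value

# Rows of the p-value matrix are the later levels, columns the earlier ones
same <- isTRUE(all.equal(res_ok$p.value["b", "a"], p_direct))
same
```

If the result is passed on to other code without being printed, checking `res$method` (for example with `stopifnot()`) is one cheap way to catch a silently ignored argument.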

answered Jan 30 '23 02:01 by Brian Diggs