 

Multiple t-tests in R

I have 94 variables (sample + proteins + group) and 172 observations in a matrix like this:

Sample   Protein1   Protein2 ... Protein92 Group
1          1.53      3.325   ...   5.63      0
2          2.32      3.451   ...   6.32      0
.
. 
.
103        3.24      4.21    ...   3.53      0               
104        3.44      5.22    ...   6.78      1
.
.
.
192        6.75      4.34    ...   6.15      1

Some of the samples are in group 0 and some are in group 1. I want to use a t-test to check whether there is a difference between group 0 and group 1, and I want to do this for every protein. I was thinking of using apply, but I am not sure how. Also, the columns are not actually named Protein1, Protein2, ...; the real names are much longer, so I would rather not have to type them all out.

I would also like just the p-value for each protein, collected in a matrix, something like this:

Protein  p-value
Protein1   0.00563
Protein2   0.0640
.
.
Protein92  0.610

Or something similar, so that afterwards I can pick out only the proteins with a p-value below 0.05/92.
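Dividing 0.05 by the number of tests is the Bonferroni correction. A minimal sketch, using made-up p-values (the names and values are hypothetical), showing that comparing raw p-values against 0.05/92 selects the same proteins as comparing Bonferroni-adjusted p-values from base R's `p.adjust()` against 0.05:

```r
# Hypothetical p-values; in practice there would be one per protein (92 in total)
p <- c(Protein1 = 0.00001, Protein2 = 0.0640, Protein3 = 0.610)
m <- 92                  # total number of proteins tested
threshold <- 0.05 / m    # Bonferroni cutoff, roughly 0.000543

# p.adjust() multiplies each p-value by n and caps the result at 1.
# n is passed explicitly because only 3 of the 92 p-values are listed here.
adjusted <- p.adjust(p, method = "bonferroni", n = m)

significant_raw <- names(p)[p < threshold]
significant_adj <- names(p)[adjusted < 0.05]
significant_raw
```

Since the adjusted value is just `m * p` (capped at 1), `p < 0.05/m` and `m * p < 0.05` are the same condition, so either form can be used.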


Edit:

Since I started working in long format, this is no longer really a problem:

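library(tidyverse)

df %>%
  gather(Protein, Value, -Sample, -Group) %>%               # reshape to long format: one row per sample/protein
  group_by(Protein) %>%
  do(broom::tidy(t.test(Value ~ Group, data = .))) %>%      # one t-test per protein, tidied into a row
  ungroup() %>%
  mutate(Adjusted_pval = p.adjust(p.value, method = "fdr")) # false discovery rate correction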
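For readers who prefer not to load the tidyverse, roughly the same long-format workflow can be sketched in base R. This is an illustration on hypothetical toy data (the columns ProteinAlpha and ProteinBeta are made up), not the OP's code; like the pipeline above, it relies on `t.test()`'s default Welch test.

```r
# Hypothetical toy data in the question's wide layout: Sample, proteins..., Group
set.seed(42)
df <- data.frame(
  Sample       = 1:12,
  ProteinAlpha = rnorm(12),
  ProteinBeta  = rnorm(12),
  Group        = rep(c(0, 1), each = 6)
)

# Every column except Sample and Group is treated as a protein
protein_cols <- setdiff(names(df), c("Sample", "Group"))

# Long format: one row per (sample, protein) pair. unlist() stacks the
# protein columns one after another, matching rep(..., each = nrow(df)).
long <- data.frame(
  Sample  = rep(df$Sample, times = length(protein_cols)),
  Group   = rep(df$Group, times = length(protein_cols)),
  Protein = rep(protein_cols, each = nrow(df)),
  Value   = unlist(df[protein_cols], use.names = FALSE)
)

# One t-test per protein, then a false discovery rate adjustment
p_values <- sapply(split(long, long$Protein),
                   function(d) t.test(Value ~ Group, data = d)$p.value)

result <- data.frame(
  Protein       = names(p_values),
  p.value       = unname(p_values),
  Adjusted_pval = unname(p.adjust(p_values, method = "fdr"))
)
result
```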
Asked Jun 30 '15 at 13:06 by PrincessJellyfish


1 Answer

Try something like:

sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value)
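Because the real column names are long, the protein columns can also be selected by name instead of by position, by excluding the two non-protein columns. A minimal sketch on hypothetical toy data (the long protein names are made up), assuming the non-protein columns are called Sample and Group as in the question:

```r
# Hypothetical toy data: two long-named protein columns plus Sample and Group
set.seed(1)
df <- data.frame(
  Sample = 1:20,
  Some_very_long_protein_name_A = c(rnorm(10, mean = 1), rnorm(10, mean = 3)),
  Some_very_long_protein_name_B = rnorm(20, mean = 2),
  Group = rep(c(0, 1), each = 10)
)

# Keep every column except Sample and Group, so no protein name is typed out
protein_cols <- setdiff(names(df), c("Sample", "Group"))

# Same call as above, but indexed by the selected names rather than 2:93
p_values <- sapply(df[protein_cols], function(i) t.test(i ~ df$Group)$p.value)
p_values
```

This keeps working if columns are added or reordered, which a hard-coded `2:93` would not.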

This returns a named vector of p-values, one per protein.

You can store the result in a data frame and pick out the low p-values like this:

x <- data.frame(p.value = sapply(df[,2:93], function(i) t.test(i ~ df$Group)$p.value))
x$protein_name <- rownames(x)  # edit: copy the protein names into their own column
rownames(x) <- NULL            # edit: drop the row names so the names aren't printed twice
x[x$p.value < 0.05/92, ]
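Instead of moving the row names into a column afterwards, the data frame can be built directly from the names of the vector that `sapply()` returns. A small sketch using hypothetical p-values in place of the real `sapply()` result:

```r
# Hypothetical stand-in for the named vector returned by sapply()
p_values <- c(Protein1 = 0.00001, Protein2 = 0.0640, Protein92 = 0.610)

# Protein names go into their own column; row names stay as 1, 2, 3, ...
x <- data.frame(protein_name = names(p_values), p.value = unname(p_values))

# Rows passing the Bonferroni threshold used in the question
significant <- x[x$p.value < 0.05 / 92, ]
significant
```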

Note that the names of the vector elements, and initially the row names of the data frame, are the original column names (Protein1, Protein2, etc.), so you never have to type the protein names yourself. Edit: as the OP wanted, I added a column for the protein name and removed it from the row names so it wouldn't appear twice when printed.

P.S. Glad to see you are adjusting the p-values for multiple comparisons.

Answered Nov 13 '22 at 05:11 by C8H10N4O2