
How to write a loop to run the t-test of a data frame?

I ran into a problem running t-tests on data stored in a data frame. I know how to run them one column at a time, but that is not efficient at all. How can I write a loop to do it?

For example, my data is in testData:

testData <-
structure(list(Label = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Bad", "Good"), class = "factor"), F1 = c(0.647789237, 
0.546087915, 0.461342005, 0.794212207, 0.569199511, 0.735685704, 
0.650942066, 0.457497016, 0.808619288, 0.673100668, 0.68781739, 
0.470094549, 0.958591821, 1, 0.46908343, 0.578755283, 0.289380462, 
0.685117658, 0.296011479, 0.208821225, 0.461487258, 0.176144907, 
0.325684001), F2 = c(0.634327378, 0.602685034, 0.70643658, 0.577336318, 
0.61069332, 0.676176013, 0.685433524, 0.601847779, 0.641738937, 
0.822097452, 0.549508092, 0.711380436, 0.605492874, 0.419354439, 
0.654424433, 0.782191133, 0.826394651, 0.63269692, 0.835389099, 
0.760279322, 0.711607982, 1, 0.858631893), F3 = c(0.881115444, 
0.850553659, 0.855405201, 0.732706141, 0.816063806, 0.841134018, 
0.899594853, 0.788591779, 0.767461265, 0.954481259, 0.840970764, 
0.897785959, 0.789288481, 0.604922471, 0.865024811, 0.947356946, 
0.96622214, 0.879623595, 0.953189022, 0.960153373, 0.868949632, 
1, 0.945716439), F4 = c(0.96939781, 0.758302, 0.652984943, 0.803719964, 
0.980135127, 0.945287339, 0.84045753, 0.926053105, 0.974856922, 
0.829936068, 0.89662815, 0.823594767, 1, 0.886954348, 0.825638185, 
0.798524271, 0.524755093, 0.844685467, 0.522120663, 0.388604114, 
0.725126521, 0.46430556, 0.604943457), F5 = c(0.908895247, 0.614799496, 
0.529111461, 0.726753028, 0.942601677, 0.86641298, 0.75771251, 
0.88237302, 1, 0.817706498, 0.834060845, 0.813550164, 0.927107922, 
0.827680764, 0.797814872, 0.768118872, 0.271122929, 0.790632558, 
0.391325631, 0.257446927, 0.687042673, 0.239520504, 0.521753545
), F6 = c(0.589651031, 0.170481902, 0.137755423, 0.24453692, 
0.505348067, 0.642589538, 0.308854104, 0.286913756, 0.60756673, 
0.531315171, 0.389958915, 0.236113471, 1, 0.687877983, 0.305962183, 
0.40469629, 0.08012222, 0.376774451, 0.098261016, 0.046544022, 
0.201513755, 0.02085411, 0.113698232), F7 = c(0.460358642, 0.629499543, 
0.598616653, 0.623674078, 0.526920757, 0.494086383, 0.504021253, 
0.635105287, 0.558992452, 0.397770725, 0.543528957, 0.538542617, 
0.646897446, 0.543646493, 0.47463817, 0.385081029, 0.555731206, 
0.43769237, 0.501754893, 0.586155312, 0.496028109, 1, 0.522921361
), F8 = c(0.523850222, 0.448936418, 0.339311791, 0.487421437, 
0.462073661, 0.493421514, 0.464091025, 0.496938844, 0.5817454, 
0.474404602, 0.720114482, 0.493098785, 1, 0.528538582, 0.478233718, 
0.2695123, 0.362377901, 0.462252858, 0.287725327, 0.335584366, 
0.397324649, 0.469082387, 0.403397835), F9 = c(0.481230473, 0.349419856, 
0.309729777, 0.410783763, 0.465172146, 0.520935471, 0.380916463, 
0.422238573, 0.572283353, 0.434705384, 0.512705279, 0.358892539, 
1, 0.606926979, 0.370574926, 0.319739889, 0.249984729, 0.381053882, 
0.245597953, 0.22883148, 0.314061676, 0.233511631, 0.269890359
), F10 = c(0.592403628, 0.249811036, 0.256613757, 0.305839002, 
0.497637944, 0.601946334, 0.401643991, 0.302626606, 0.623582766, 
0.706254724, 0.435846561, 0.324357521, 1, 0.740362812, 0.402588813, 
0.537414966, 0.216458806, 0.464852608, 0.251228269, 0.181500378, 
0.31840514, 0.068594104, 0.253873772), F11 = c(0.490032261, 0.366486136, 
0.336749996, 0.421899324, 0.479339762, 0.527364467, 0.398297911, 
0.432190187, 0.584030586, 0.453666402, 0.526861753, 0.388880674, 
1, 0.615835576, 0.39058525, 0.350811433, 0.290220147, 0.397424867, 
0.288095106, 0.274852912, 0.340129804, 0.271099396, 0.305499273
)), .Names = c("Label", "F1", "F2", "F3", "F4", "F5", "F6", "F7", 
"F8", "F9", "F10", "F11"), class = "data.frame", row.names = c(NA, 
-23L))

I need to run a t-test on each feature column, "F1" to "F11", comparing two independent groups: "Good" vs. "Bad". I tried something like:

GoodF1 <- subset(testData, Label == 'Good', select=c("F1"))
BadF1  <- subset(testData, Label == 'Bad', select=c("F1"))
t.test(GoodF1$F1,BadF1$F1)

I would then have to repeat this for "F2" to "F11", which is obviously inefficient. I would really appreciate any better ideas for running it in a loop. Thanks very much.

Samo Jerom asked Mar 05 '13 11:03

1 Answer

Here's a simple solution, which doesn't require additional packages:

lapply(testData[-1], function(x) t.test(x ~ testData$Label))

Here, testData[-1] selects every column of testData except the first one (which contains the labels); a negative index excludes the column at that position. The call returns a named list with one t.test result per feature column.
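If you want the results as one table rather than a list of printed tests, something like the following might help. It is only a sketch: `toy` is a made-up stand-in with the same layout as testData (a Label factor plus numeric feature columns), and the names `tests` and `results` and the Holm adjustment are my own additions, not part of the one-liner above. Note that t.test runs Welch's unequal-variance test by default.

```r
# Hypothetical stand-in for testData: random values, same layout
# (a two-level Label factor followed by numeric feature columns).
set.seed(1)
toy <- data.frame(
  Label = factor(rep(c("Good", "Bad"), times = c(15, 8))),
  F1 = runif(23),
  F2 = runif(23)
)

# One t-test per feature column, exactly as in the one-liner above.
tests <- lapply(toy[-1], function(x) t.test(x ~ toy$Label))

# Pull the t statistic and p-value out of each "htest" object.
results <- data.frame(
  feature   = names(tests),
  statistic = sapply(tests, function(tt) unname(tt$statistic)),
  p.value   = sapply(tests, function(tt) tt$p.value),
  row.names = NULL
)

# Optional (my assumption that you want it): adjust the p-values
# for multiple testing using Holm's method.
results$p.adjusted <- p.adjust(results$p.value, method = "holm")

results
```

With the real data, replace `toy` with testData. An individual test is still available by name, e.g. `tests$F1` or `tests[["F1"]]$conf.int`.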

Sven Hohenstein answered Oct 27 '22 23:10