I have a problem running a t-test on data stored in a data frame. I know how to do it one column at a time, but that is not efficient at all. How can I write a loop to do it?
For example, I have the following data in testData (this is the output of dput(testData)):
testData <- structure(list(Label = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("Bad", "Good"), class = "factor"), F1 = c(0.647789237,
0.546087915, 0.461342005, 0.794212207, 0.569199511, 0.735685704,
0.650942066, 0.457497016, 0.808619288, 0.673100668, 0.68781739,
0.470094549, 0.958591821, 1, 0.46908343, 0.578755283, 0.289380462,
0.685117658, 0.296011479, 0.208821225, 0.461487258, 0.176144907,
0.325684001), F2 = c(0.634327378, 0.602685034, 0.70643658, 0.577336318,
0.61069332, 0.676176013, 0.685433524, 0.601847779, 0.641738937,
0.822097452, 0.549508092, 0.711380436, 0.605492874, 0.419354439,
0.654424433, 0.782191133, 0.826394651, 0.63269692, 0.835389099,
0.760279322, 0.711607982, 1, 0.858631893), F3 = c(0.881115444,
0.850553659, 0.855405201, 0.732706141, 0.816063806, 0.841134018,
0.899594853, 0.788591779, 0.767461265, 0.954481259, 0.840970764,
0.897785959, 0.789288481, 0.604922471, 0.865024811, 0.947356946,
0.96622214, 0.879623595, 0.953189022, 0.960153373, 0.868949632,
1, 0.945716439), F4 = c(0.96939781, 0.758302, 0.652984943, 0.803719964,
0.980135127, 0.945287339, 0.84045753, 0.926053105, 0.974856922,
0.829936068, 0.89662815, 0.823594767, 1, 0.886954348, 0.825638185,
0.798524271, 0.524755093, 0.844685467, 0.522120663, 0.388604114,
0.725126521, 0.46430556, 0.604943457), F5 = c(0.908895247, 0.614799496,
0.529111461, 0.726753028, 0.942601677, 0.86641298, 0.75771251,
0.88237302, 1, 0.817706498, 0.834060845, 0.813550164, 0.927107922,
0.827680764, 0.797814872, 0.768118872, 0.271122929, 0.790632558,
0.391325631, 0.257446927, 0.687042673, 0.239520504, 0.521753545
), F6 = c(0.589651031, 0.170481902, 0.137755423, 0.24453692,
0.505348067, 0.642589538, 0.308854104, 0.286913756, 0.60756673,
0.531315171, 0.389958915, 0.236113471, 1, 0.687877983, 0.305962183,
0.40469629, 0.08012222, 0.376774451, 0.098261016, 0.046544022,
0.201513755, 0.02085411, 0.113698232), F7 = c(0.460358642, 0.629499543,
0.598616653, 0.623674078, 0.526920757, 0.494086383, 0.504021253,
0.635105287, 0.558992452, 0.397770725, 0.543528957, 0.538542617,
0.646897446, 0.543646493, 0.47463817, 0.385081029, 0.555731206,
0.43769237, 0.501754893, 0.586155312, 0.496028109, 1, 0.522921361
), F8 = c(0.523850222, 0.448936418, 0.339311791, 0.487421437,
0.462073661, 0.493421514, 0.464091025, 0.496938844, 0.5817454,
0.474404602, 0.720114482, 0.493098785, 1, 0.528538582, 0.478233718,
0.2695123, 0.362377901, 0.462252858, 0.287725327, 0.335584366,
0.397324649, 0.469082387, 0.403397835), F9 = c(0.481230473, 0.349419856,
0.309729777, 0.410783763, 0.465172146, 0.520935471, 0.380916463,
0.422238573, 0.572283353, 0.434705384, 0.512705279, 0.358892539,
1, 0.606926979, 0.370574926, 0.319739889, 0.249984729, 0.381053882,
0.245597953, 0.22883148, 0.314061676, 0.233511631, 0.269890359
), F10 = c(0.592403628, 0.249811036, 0.256613757, 0.305839002,
0.497637944, 0.601946334, 0.401643991, 0.302626606, 0.623582766,
0.706254724, 0.435846561, 0.324357521, 1, 0.740362812, 0.402588813,
0.537414966, 0.216458806, 0.464852608, 0.251228269, 0.181500378,
0.31840514, 0.068594104, 0.253873772), F11 = c(0.490032261, 0.366486136,
0.336749996, 0.421899324, 0.479339762, 0.527364467, 0.398297911,
0.432190187, 0.584030586, 0.453666402, 0.526861753, 0.388880674,
1, 0.615835576, 0.39058525, 0.350811433, 0.290220147, 0.397424867,
0.288095106, 0.274852912, 0.340129804, 0.271099396, 0.305499273
)), .Names = c("Label", "F1", "F2", "F3", "F4", "F5", "F6", "F7",
"F8", "F9", "F10", "F11"), class = "data.frame", row.names = c(NA,
-23L))
I need to run a t-test on each column, comparing two independent groups ("Good" vs. "Bad") for each of the features "F1" to "F11". I tried something like:
GoodF1 <- subset(testData, Label == 'Good', select=c("F1"))
BadF1 <- subset(testData, Label == 'Bad', select=c("F1"))
t.test(GoodF1$F1,BadF1$F1)
I could then repeat this for "F2" to "F11", but that is obviously not efficient. I would really appreciate any better ideas for running this in a loop. Thanks very much.
Here's a simple solution, which doesn't require additional packages:
lapply(testData[-1], function(x) t.test(x ~ testData$Label))
Here testData[-1] refers to all columns of testData but the first one (which contains the labels). Negative indexing is used for excluding data.
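The lapply call returns a list of test results, one per feature, which can be long to read through. As a rough sketch of one way to condense them into a table: the snippet below uses a made-up stand-in data frame (demoData, with random values, not the real testData) so that it runs on its own, and the names tests and summaryTable are just illustrative. The Holm adjustment at the end is an optional extra that is not part of the answer above; it is one common way to account for running several tests at once.

```r
# Hypothetical stand-in for testData: same layout (Label + numeric features),
# but with made-up random values so the snippet is self-contained.
set.seed(1)
demoData <- data.frame(
  Label = factor(rep(c("Good", "Bad"), times = c(15, 8))),
  F1 = rnorm(23),
  F2 = rnorm(23),
  F3 = rnorm(23)
)

# One t-test per feature column, Good vs. Bad.
# t.test() defaults to the Welch test (var.equal = FALSE).
tests <- lapply(demoData[-1], function(x) t.test(x ~ demoData$Label))

# Collect the key numbers from each result into one data frame.
summaryTable <- data.frame(
  feature = names(tests),
  t = sapply(tests, function(res) unname(res$statistic)),
  df = sapply(tests, function(res) unname(res$parameter)),
  p.value = sapply(tests, function(res) res$p.value),
  row.names = NULL
)

# Optional (an assumption, not in the original answer):
# Holm correction for multiple comparisons.
summaryTable$p.adjusted <- p.adjust(summaryTable$p.value, method = "holm")

print(summaryTable)
```

With the real data, replacing demoData with testData should give one row per feature from F1 to F11.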