I am running a linear regression on some variables in a data frame. I'd like to subset the data by a categorical variable, run the regression separately for each level of that variable, and then store the t-stats in a data frame. I'd like to do this without a loop if possible.
Here's a sample of what I'm trying to do:
  a<-  c("a","a","a","a","a",
         "b","b","b","b","b",
         "c","c","c","c","c")     
  b<-  c(0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3)
  c<-  c(0.2,0.1,0.3,0.2,0.4,
         0.2,0.5,0.2,0.1,0.2,
         0.4,0.2,0.4,0.6,0.8)
  cbind(a,b,c)
I can begin by running the following linear regression and pulling the t-statistic out very easily:
  summary(lm(b~c))$coefficients[2,3]
However, I'd like to be able to run the regression for when column a is a, b, or c. I'd like to then store the t-stats in a table that looks like this:
variable t-stat
a        0.9
b        2.4
c        1.1
Hope that makes sense. Please let me know if you have any suggestions!
You can use the lmList function from the nlme package to apply lm to subsets of data:
# the data
df <- data.frame(a, b, c)
library(nlme)
res <- lmList(b ~ c | a, df, pool = FALSE)
coef(summary(res))
The output:
, , (Intercept)
   Estimate Std. Error  t value   Pr(>|t|)
a 0.1000000 0.08086075 1.236694 0.30418942
b 0.2304348 0.08753431 2.632508 0.07815663
c 0.1461538 0.10029542 1.457233 0.24110393
, , c
     Estimate Std. Error    t value  Pr(>|t|)
a  0.50000000  0.3100868  1.6124515 0.2052590
b -0.04347826  0.3175203 -0.1369306 0.8997586
c  0.15384615  0.1923077  0.8000000 0.4821990
If you want the t values only, you can use this command:
coef(summary(res))[, "t value", -1]
#          a          b          c 
#  1.6124515 -0.1369306  0.8000000  
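To get from that named vector to the two-column table asked for in the question, something along these lines should work. This is only a sketch: it re-creates the question's vectors so it runs on its own, and the names tvals and tab are made up for illustration. As far as I know, pool = FALSE is what makes each group use its own residual standard error, so the t values match separate lm fits.

```r
library(nlme)

# Same data as in the question
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
df <- data.frame(a, b, c)

res <- lmList(b ~ c | a, df, pool = FALSE)

# Select the slope's t value by name instead of dropping the intercept by position
tvals <- coef(summary(res))[, "t value", "c"]

# One row per level of a; tab and t_stat are illustrative names
tab <- data.frame(variable = names(tvals), t_stat = unname(tvals))
tab
```

Indexing by "c" rather than -1 keeps working if you later add more predictors to the formula.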
Use split to subset the data and loop over the subsets with sapply:
dat <- data.frame(b,c)
dat_split <- split(x = dat, f = a)
res <- sapply(dat_split, function(x){
  summary(lm(b~c, data = x))$coefficients[2,3]
})
Reshape the result to your needs:
data.frame(variable = names(res), "t-stat" = res) 
  variable     t.stat
a        a  1.6124515
b        b -0.1369306
c        c  0.8000000
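Note that data.frame turned "t-stat" into t.stat. If you want the column name exactly as written, a possible variant (a sketch using the question's data; t_by_group and out are just illustrative names) passes check.names = FALSE. It also uses vapply so the result is guaranteed to be one number per group, and picks the t value by name rather than by position:

```r
# Same data as in the question
a <- rep(c("a", "b", "c"), each = 5)
b <- rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3)
c <- c(0.2, 0.1, 0.3, 0.2, 0.4,
       0.2, 0.5, 0.2, 0.1, 0.2,
       0.4, 0.2, 0.4, 0.6, 0.8)
dat <- data.frame(b, c)

# One regression per level of a; keep only the slope's t value
t_by_group <- vapply(split(dat, a), function(x) {
  coef(summary(lm(b ~ c, data = x)))["c", "t value"]
}, numeric(1))

# check.names = FALSE keeps the hyphen in the column name
out <- data.frame(variable = names(t_by_group),
                  "t-stat" = unname(t_by_group),
                  check.names = FALSE)
out
```

The hyphenated column then has to be accessed as out[["t-stat"]] or with backticks, which is why the default renaming exists in the first place.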