
Linear Regression and storing results in data frame [duplicate]

I am running a linear regression on some variables in a data frame. I'd like to split the data by a categorical variable, run the regression separately for each level of that variable, and then store the t-stats in a data frame. I'd like to do this without a loop if possible.

Here's a sample of what I'm trying to do:

  a <- c("a","a","a","a","a",
         "b","b","b","b","b",
         "c","c","c","c","c")
  b <- c(0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3,
         0.1,0.2,0.3,0.2,0.3)
  c <- c(0.2,0.1,0.3,0.2,0.4,
         0.2,0.5,0.2,0.1,0.2,
         0.4,0.2,0.4,0.6,0.8)
  # data.frame keeps b and c numeric; cbind(a,b,c) would coerce everything to character
  data.frame(a, b, c)

I can begin by running the following linear regression and pulling the t-statistic out very easily:

  summary(lm(b~c))$coefficients[2,3]
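
A note on the indexing: in the coefficient matrix returned by summary(), row 2 is the slope for c and column 3 is the t value. Below is a minimal sketch of the same lookup by name, which should be less fragile if more terms are added to the model. The names coefs, t_by_position and t_by_name are illustrative only and not part of the original post.

```r
# Sample data from the question
b <- c(0.1,0.2,0.3,0.2,0.3,
       0.1,0.2,0.3,0.2,0.3,
       0.1,0.2,0.3,0.2,0.3)
c <- c(0.2,0.1,0.3,0.2,0.4,
       0.2,0.5,0.2,0.1,0.2,
       0.4,0.2,0.4,0.6,0.8)

# Coefficient matrix: one row per term, columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(lm(b ~ c))$coefficients

t_by_position <- coefs[2, 3]            # as in the question
t_by_name     <- coefs["c", "t value"]  # same cell, selected by name
```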

However, I'd like to run the regression separately for each value of column a (a, b, or c) and then store the t-stats in a table that looks like this:

variable t-stat
a        0.9
b        2.4
c        1.1

Hope that makes sense. Please let me know if you have any suggestions!

Trexion Kameha, asked Jan 19 '15 17:01


2 Answers

You can use the lmList function from the nlme package to apply lm to subsets of data:

# the data
df <- data.frame(a, b, c)

library(nlme)
res <- lmList(b ~ c | a, df, pool = FALSE)
coef(summary(res))

The output:

, , (Intercept)

   Estimate Std. Error  t value   Pr(>|t|)
a 0.1000000 0.08086075 1.236694 0.30418942
b 0.2304348 0.08753431 2.632508 0.07815663
c 0.1461538 0.10029542 1.457233 0.24110393

, , c

     Estimate Std. Error    t value  Pr(>|t|)
a  0.50000000  0.3100868  1.6124515 0.2052590
b -0.04347826  0.3175203 -0.1369306 0.8997586
c  0.15384615  0.1923077  0.8000000 0.4821990

If you want the t values only, you can use this command:

coef(summary(res))[, "t value", -1]
#          a          b          c 
#  1.6124515 -0.1369306  0.8000000  
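
To get from that named vector to the two-column table the question asks for, something like the following should work. It is only a sketch: it rebuilds the sample data so it runs on its own, and the names tvals, out and t_stat are illustrative choices, not part of the answer above.

```r
library(nlme)

# Sample data from the question
df <- data.frame(
  a = rep(c("a", "b", "c"), each = 5),
  b = rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3),
  c = c(0.2, 0.1, 0.3, 0.2, 0.4,
        0.2, 0.5, 0.2, 0.1, 0.2,
        0.4, 0.2, 0.4, 0.6, 0.8)
)

# One lm fit per level of a, each with its own residual variance (pool = FALSE)
res <- lmList(b ~ c | a, data = df, pool = FALSE)

# Named vector of slope t values, one per group (same expression as above)
tvals <- coef(summary(res))[, "t value", -1]

# Two-column data frame: group label and its t value
out <- data.frame(variable = names(tvals), t_stat = unname(tvals))
```

Printing out should show one row per group, with the same t values as the vector above.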
Sven Hohenstein, answered Oct 16 '22 05:10


Use split to subset the data and sapply to loop over the subsets:

dat <- data.frame(b,c)
dat_split <- split(x = dat, f = a)
res <- sapply(dat_split, function(x){
  summary(lm(b~c, data = x))$coefficients[2,3]
})

Reshape the result to your needs:

data.frame(variable = names(res), "t-stat" = res) 

  variable     t.stat
a        a  1.6124515
b        b -0.1369306
c        c  0.8000000
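
Note that data.frame turned the requested name "t-stat" into t.stat in the output above, because it makes column names syntactically valid by default. If the literal name matters, a possible variant is to pass check.names = FALSE. This is a sketch that rebuilds the sample data so it runs on its own; out is an illustrative name, and the coefficient is selected by name rather than by position.

```r
# Sample data from the question
df <- data.frame(
  a = rep(c("a", "b", "c"), each = 5),
  b = rep(c(0.1, 0.2, 0.3, 0.2, 0.3), times = 3),
  c = c(0.2, 0.1, 0.3, 0.2, 0.4,
        0.2, 0.5, 0.2, 0.1, 0.2,
        0.4, 0.2, 0.4, 0.6, 0.8)
)

# Fit b ~ c within each level of a and keep the slope's t value
res <- sapply(split(df, df$a), function(x) {
  summary(lm(b ~ c, data = x))$coefficients["c", "t value"]
})

# check.names = FALSE keeps the hyphenated column name as written
out <- data.frame(variable = names(res), "t-stat" = unname(res),
                  check.names = FALSE)
```

A column with a hyphen in its name has to be accessed as out[["t-stat"]] or with backticks, which is one reason to prefer a name like t_stat.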
Rentrop, answered Oct 16 '22 05:10