Splitting data and running linear regression loop

Question

I have seen a lot of similar questions, but I am missing one key piece of the loop that I am trying to write. I have a dataset with ~4,000 different keys, and for each key there are ~1,000 observations. I filtered the data down to a single key, ran a linear regression on those observations, and checked the model assumptions; everything looks good. Now I want to loop over the whole dataset and run the same linear regression for each key, then store the coefficients, p-values, R^2, etc. so I can review them together.

Here is a sample of my data:

Key y1 x1 x2
A   10 1  3
A   11 2  4 
A   12 3  5
B   13 4  6 
B   14 5  7
B   15 6  8
C   16 7  9 
C   17 8  1
C   18 9  2

I have run:

library(dplyr)

# Column names are case-sensitive: the sample data uses `Key`, not `key`
datA <- data %>% filter(Key == 'A')
lm(y1 ~ x1 + x2, data = datA)

And then repeated that for keys B and C. Every question I have seen on here loops over different variables for the entire dataset, rather than splitting the data by rows.

But I need to do this 4,000 more times. Any assistance to write this loop would be greatly appreciated (I am terrible at writing loops).

Ronak Shah · Accepted Answer

You could split the data and apply lm to each chunk.

list_models <- lapply(split(df, df$Key), function(x) lm(y1 ~ x1 + x2, data = x))

A tidyverse way would be :

library(dplyr)
library(purrr)

list_models <- df %>% group_split(Key) %>% map(~lm(y1 ~ x1 + x2, data = .x))

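Both versions return one model per Key. split() names each list element after its Key, which is why the output below shows $A, $B and $C; group_split() returns an unnamed list. In this sample, x2 equals x1 + 2 within keys A and B, so x2 is perfectly collinear with x1 and its coefficient is reported as NA.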

list_models
#$A
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#          9            1           NA  

#$B
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#          9            1           NA  

#$C
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#   9.00e+00     1.00e+00     7.86e-16
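
Since the question also asks about storing the coefficients and R^2 for review, here is a minimal base R sketch of one way to collect them from list_models into a single data frame. The names coef_table, r_squared and results are made up for illustration, and the data is just the sample from the question. P-values are left out on purpose: coef(summary(m)) drops coefficients that could not be estimated, so its rows would not line up across keys in this sample.

```r
# Sample data from the question
df <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2)
)

# One model per Key, as in the answer above
list_models <- lapply(split(df, df$Key), function(x) lm(y1 ~ x1 + x2, data = x))

# coef() always returns one entry per term (NA where a term could not be
# estimated), so sapply() gives a terms-by-keys matrix; transpose it so
# that each row is a Key
coef_table <- t(sapply(list_models, coef))

# R^2 for each model (summary() may warn about a perfect fit on this toy data)
r_squared <- sapply(list_models, function(m) summary(m)$r.squared)

# Hypothetical combined table: one row per Key
results <- data.frame(
  Key = names(list_models),
  coef_table,
  r_squared,
  check.names = FALSE
)
```

With ~4,000 keys, results would have ~4,000 rows, which can then be sorted or filtered like any other data frame.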

Edward · Answer

You can also use the broom package to tidy the output into a more readable form.

list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

library(broom)

as_tibble(do.call(rbind, lapply(list_models, broom::tidy)))

# A tibble: 7 x 5
  term        estimate  std.error statistic    p.value
  <chr>          <dbl>      <dbl>     <dbl>      <dbl>
1 (Intercept) 9.00e+ 0   2.22e-15   4.05e15   1.57e-16
2 x1          1.00e+ 0   1.03e-15   9.73e14   6.54e-16
3 (Intercept) 9.00e+ 0   4.59e-15   1.96e15   3.25e-16
4 x1          1.00e+ 0   9.06e-16   1.10e15   5.77e-16
5 (Intercept) 9.00e+ 0 NaN        NaN       NaN       
6 x1          1.00e+ 0 NaN        NaN       NaN       
7 x2          3.02e-16 NaN        NaN       NaN
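
The combined table above no longer shows which Key each row belongs to. One possible way to keep it, sketched here on the assumption that dplyr is also installed, is to bind the per-model tibbles with bind_rows() and its .id argument, which copies the list names into a new column. broom::glance() can be handled the same way to get one row of model-level statistics (including R^2) per Key. The names coef_by_key and fit_by_key are made up for illustration.

```r
library(broom)
library(dplyr)

# Sample data from the question
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2)
)

list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

# Coefficient-level results: one row per term per Key.
# .id = "Key" stores the list names (A, B, C) in a column called Key.
coef_by_key <- bind_rows(lapply(list_models, tidy), .id = "Key")

# Model-level results (r.squared, sigma, etc.): one row per Key.
# This toy data fits perfectly, so expect warnings and NaN values here.
fit_by_key <- bind_rows(lapply(list_models, glance), .id = "Key")
```

From there, something like filter(coef_by_key, term == "x1") would pull out a single coefficient across all keys.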

Tags: loops, r, regression, lm

Ken
