
Splitting data and running linear regression loop

I have seen a lot of similar questions, but I am missing one key piece of the loop I am trying to write. I have a dataset with ~4,000 different keys, and for each key there are ~1,000 observations. I filtered the data down to a single key, ran a linear regression on those observations, and checked the model assumptions; everything looks good. Now I want to loop over the dataset and run that regression for each key, then store the coefficients, p-values, R^2, etc. and review them together.

Here is a sample of my data:

Key y1 x1 x2
A   10 1  3
A   11 2  4 
A   12 3  5
B   13 4  6 
B   14 5  7
B   15 6  8
C   16 7  9 
C   17 8  1
C   18 9  2

I have run:

datA <- data %>% filter(Key == 'A')
lm(y1 ~ x1 + x2, data = datA)

And then repeated that for keys B and C. Every question I have seen on here loops over different variables for the entire dataset, rather than splitting the data by rows.

But I need to do this 4,000 more times. Any assistance to write this loop would be greatly appreciated (I am terrible at writing loops).

Ken asked Apr 30 '26 14:04


2 Answers

You could split the data and apply lm to each chunk.

list_models <- lapply(split(df, df$Key), function(x) lm(y1 ~ x1 + x2, data = x))

A tidyverse way would be :

library(dplyr)
library(purrr)

list_models <- df %>% group_split(Key) %>% map(~lm(y1 ~ x1 + x2, data = .x))

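Either way you get one fitted model per Key. The output below is from the split() version, which names each list element by its Key; group_split() returns an unnamed list. In this tiny sample x2 equals x1 + 2 within keys A and B, so lm() cannot estimate it separately and reports NA.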

list_models
#$A
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#          9            1           NA  

#$B
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#          9            1           NA  

#$C
#Call:
#lm(formula = y1 ~ x1 + x2, data = x)

#Coefficients:
#(Intercept)           x1           x2  
#   9.00e+00     1.00e+00     7.86e-16  
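
Since the question also asks for coefficients, p-values and R^2 side by side, here is a minimal base R sketch of one way to collect them from list_models. The simulated data and the helper name extract_stats are made up for illustration (the three-row sample above fits perfectly, which makes summary() unreliable), and the sketch assumes no coefficient is dropped as NA.

```r
# Made-up data for illustration: 3 keys, 50 noisy observations each.
set.seed(1)
df <- data.frame(
  Key = rep(c("A", "B", "C"), each = 50),
  x1  = rnorm(150),
  x2  = rnorm(150)
)
df$y1 <- 1 + 2 * df$x1 - 0.5 * df$x2 + rnorm(150)

# One model per key; split() names the list elements by key.
list_models <- lapply(split(df, df$Key), function(d) lm(y1 ~ x1 + x2, data = d))

# Hypothetical helper: pull the numbers of interest out of one fitted model.
# Assumes both x1 and x2 were estimable (summary() omits rows for NA coefficients).
extract_stats <- function(m) {
  s <- summary(m)
  data.frame(
    intercept = coef(m)[["(Intercept)"]],
    b_x1      = coef(m)[["x1"]],
    b_x2      = coef(m)[["x2"]],
    p_x1      = s$coefficients["x1", "Pr(>|t|)"],
    p_x2      = s$coefficients["x2", "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}

# Stack the one-row data frames; the list names become the row names.
results <- do.call(rbind, lapply(list_models, extract_stats))
results$Key <- rownames(results)
results
```

The result has one row per key, so it can be sorted or filtered (for example on r_squared) like any other data frame.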
Ronak Shah answered May 02 '26 05:05


You can also use the broom package to tidy the output into a more readable form.

list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

library(broom)
library(tibble)  # as_tibble() comes from tibble

as_tibble(do.call(rbind, lapply(list_models, broom::tidy)))

# A tibble: 7 x 5
  term        estimate  std.error statistic    p.value
  <chr>          <dbl>      <dbl>     <dbl>      <dbl>
1 (Intercept) 9.00e+ 0   2.22e-15   4.05e15   1.57e-16
2 x1          1.00e+ 0   1.03e-15   9.73e14   6.54e-16
3 (Intercept) 9.00e+ 0   4.59e-15   1.96e15   3.25e-16
4 x1          1.00e+ 0   9.06e-16   1.10e15   5.77e-16
5 (Intercept) 9.00e+ 0 NaN        NaN       NaN       
6 x1          1.00e+ 0 NaN        NaN       NaN       
7 x2          3.02e-16 NaN        NaN       NaN  
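
The combined table above no longer shows which rows belong to which key. A possible variant, sketched here under the assumption that dplyr is available alongside broom, keeps the key as a column via bind_rows(.id = ...) and uses broom::glance() for model-level statistics such as R^2. The simulated data is made up for illustration.

```r
library(broom)
library(dplyr)

# Made-up data for illustration: 3 keys, 50 noisy observations each.
set.seed(1)
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 50),
  x1  = rnorm(150),
  x2  = rnorm(150)
)
data$y1 <- 1 + 2 * data$x1 - 0.5 * data$x2 + rnorm(150)

list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

# One row per coefficient per key (estimate, std.error, statistic, p.value).
coef_table <- bind_rows(lapply(list_models, tidy), .id = "Key")

# One row per key with model-level statistics (r.squared, adj.r.squared, ...).
fit_table <- bind_rows(lapply(list_models, glance), .id = "Key")

coef_table
fit_table
```

With ~4,000 keys, coef_table and fit_table can then be filtered or summarised by Key with the usual dplyr verbs.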
Edward answered May 02 '26 04:05


