Predict linear regression with multiple separate groups

Question

I would like to predict values from separate linear regressions fitted to multiple groups in a single data frame. I found the following blog post, which does almost everything I need: https://www.r-bloggers.com/2016/09/running-a-model-on-separate-groups/

However, I cannot combine this with the predict() function and its newdata argument. For a single group, I use the following:

m <- lm(y ~ x, df)
new_df <- data.frame(x=c(5))
predict(m, new_df)

This gives me the predicted value of y at x=5.

How do I do this when I have multiple groups in my df? This is what I tried:

df %>%
    nest(-group) %>%
    mutate(fit = map(data, ~ lm(.$y ~ .$x)),
           results = map(fit, predict)) %>%
    unnest(results)

When I try to use results = map(fit, predict(new_df)), I only get an error. Is there a way to pass my value for x (in this case 5) into the code above?

Ideally, I would get a new data.frame with two columns, group and the predicted y-value.

This is a sample data.frame:

group   x   y
g1  1   2
g1  1.5 3
g1  2   4
g1  2.3 4.4
g1  3   6
g1  3.4 6.2
g1  4.11    7
g1  4.8 7.9
g1  5   8
g1  5.3 8.2
g2  2   5
g2  2.3 4
g2  4   2.2
g2  4.4 1.9
g2  7   0.3

EDIT:

Plotting the sample data using ggplot2, I get the following plot:

ggplot(df, aes(x,y,colour=group)) +
 geom_point() +
 stat_smooth(method="lm", se=FALSE)

[Plot: y against x, points coloured by group, with one fitted regression line per group]

Using the following code, I get the predicted y-values I am after, one group at a time:

predict(lm(y ~ x, df[df$group =="g1", ]), new_df)
       1 
8.180285 

predict(lm(y ~ x, df[df$group =="g2", ]), new_df)
       1 
1.732136

I would like to generate a new dataframe which should look something like this and contain the predicted y-value at x=5:

group   y_predict  
g1  8.180285  
g2  1.732136

G. Grothendieck · Accepted Answer

Using the input shown reproducibly in the Note at the end, and since only the fitted values are needed, there is no need for nest; mutate on the grouped data frame is enough:

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(pred = fitted(lm(y ~ x))) %>%
  ungroup %>%
  select(group, pred)

giving:

# A tibble: 15 x 2
   group    pred
   <chr>   <dbl>
 1 g1     2.47  
 2 g1     3.19  
 3 g1     3.90  
 4 g1     4.33  
 5 g1     5.33  
 6 g1     5.90  
 7 g1     6.91  
 8 g1     7.89  
 9 g1     8.18  
10 g1     8.61  
11 g2     4.41  
12 g2     4.15  
13 g2     2.63  
14 g2     2.27  
15 g2    -0.0563

This could also be done like this:

library(dplyr)

df %>%
  mutate(pred = fitted(lm(y ~ x*group + 0, df))) %>%
  select(group, pred)

or like this using base R only:

transform(df, pred = fitted(lm(y ~ x*group + 0, df)))[c("group", "pred")]

or using lmList from nlme (which comes with R so it does not have to be installed):

library(dplyr)
library(nlme)

df %>%
  mutate(pred = fitted(lmList(y ~ x | group, df))) %>%
  select(group, pred)

or using lmList without dplyr:

library(nlme)

transform(df, pred = fitted(lmList(y ~ x | group, df)))[c("group", "pred")]
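The single-model variants give the same fitted values as one regression per group because x*group + 0 lets every group have its own intercept and its own slope. Here is a minimal base R sketch of that check, assuming the sample data from the question; the names per_group and pooled are only illustrative, and the data frame is rebuilt inline so the block runs on its own:

```r
# Sample data from the question, rebuilt inline so the block is self-contained
df <- data.frame(
  group = rep(c("g1", "g2"), times = c(10, 5)),
  x = c(1, 1.5, 2, 2.3, 3, 3.4, 4.11, 4.8, 5, 5.3, 2, 2.3, 4, 4.4, 7),
  y = c(2, 3, 4, 4.4, 6, 6.2, 7, 7.9, 8, 8.2, 5, 4, 2.2, 1.9, 0.3)
)

# One lm per group; unsplit() puts the fitted values back in the original row order
per_group <- unsplit(
  lapply(split(df, df$group), function(d) unname(fitted(lm(y ~ x, data = d)))),
  df$group
)

# One model with a group-specific intercept and a group-specific slope
pooled <- unname(fitted(lm(y ~ x * group + 0, data = df)))

# Compare the two sets of fitted values up to numerical tolerance
all.equal(per_group, pooled)
```

The two fits share coefficients only in the sense that they span the same model; the residual standard error (and therefore any confidence or prediction intervals) differs, because the single model pools the residual variance across groups.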

Note

Lines <- "
group   x   y
g1  1   2
g1  1.5 3
g1  2   4
g1  2.3 4.4
g1  3   6
g1  3.4 6.2
g1  4.11    7
g1  4.8 7.9
g1  5   8
g1  5.3 8.2
g2  2   5
g2  2.3 4
g2  4   2.2
g2  4.4 1.9
g2  7   0.3"
df <- read.table(text = Lines, header = TRUE)

Added

Regarding the comment, this code produces the prediction at x = 5 for each group:

df %>%
  group_by(group) %>%
  summarize(pred = predict(lm(y ~ x), list(x = 5)), .groups = "drop") %>%
  select(group, pred)
## # A tibble: 2 x 2
##   group  pred
##   <chr> <dbl>
## 1 g1     8.18
## 2 g2     1.73
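If you prefer to stay with the nest/map approach from the question, two things in the original attempt probably need changing. First, map(fit, predict(new_df)) calls predict(new_df) immediately instead of handing map a function to apply to each model. Second, a model fitted with lm(.$y ~ .$x) stores its predictor under the name .$x, so predict() has no way to match the x column in newdata; fitting with lm(y ~ x, data = .x) avoids that. The following is a sketch under those assumptions, using the nest(data = -group) syntax of tidyr 1.0 or later; res and y_predict are illustrative names:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data from the question, rebuilt inline so the block is self-contained
df <- data.frame(
  group = rep(c("g1", "g2"), times = c(10, 5)),
  x = c(1, 1.5, 2, 2.3, 3, 3.4, 4.11, 4.8, 5, 5.3, 2, 2.3, 4, 4.4, 7),
  y = c(2, 3, 4, 4.4, 6, 6.2, 7, 7.9, 8, 8.2, 5, 4, 2.2, 1.9, 0.3)
)

# The x value(s) at which to predict
new_df <- data.frame(x = 5)

res <- df %>%
  nest(data = -group) %>%                       # one row per group, data in a list column
  mutate(
    fit = map(data, ~ lm(y ~ x, data = .x)),    # plain column names so newdata can be matched
    y_predict = map_dbl(fit, ~ unname(predict(.x, newdata = new_df)))
  ) %>%
  select(group, y_predict)

res
```

For the sample data this should reproduce the values from the question's edit (about 8.18 for g1 and 1.73 for g2). map_dbl works here because new_df has a single row; with several x values, map plus unnest would be the natural replacement.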
