I would like to predict values from a linear regression fitted separately to each of several groups in a single data frame. I found the following blog post, which does ALMOST everything I need: https://www.r-bloggers.com/2016/09/running-a-model-on-separate-groups/
However, I cannot combine this approach with predict() and its newdata argument. For one group, I use the following:
m <- lm(y ~ x, df)
new_df <- data.frame(x=c(5))
predict(m, new_df)
This gives me the predicted value of y at x=5.
How do I do this when I have multiple groups in my df? This is what I tried:
df %>%
nest(-group) %>%
mutate(fit = map(data, ~ lm(.$y ~ .$x)),
results = map(fit, predict)) %>%
unnest(results)
When I try to use results = map(fit, predict(new_df)), I only get an error. Is there a way to pass my value for x (in this case 5) into the code above?
Ideally, I would get a new data.frame with two columns, group and the predicted y-value.
This is a sample data.frame:
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3
EDIT:
Plotting the sample data using ggplot2, I get the following plot:
ggplot(df, aes(x,y,colour=group)) +
geom_point() +
stat_smooth(method="lm", se=FALSE)

Using the following code, I get the sought after predicted y-values:
predict(lm(y ~ x, df[df$group =="g1", ]), new_df)
1
8.180285
predict(lm(y ~ x, df[df$group =="g2", ]), new_df)
1
1.732136
I would like to generate a new dataframe which should look something like this and contain the predicted y-value at x=5:
group y_predict
g1 8.180285
g2 1.732136
Using the input shown reproducibly in the Note below: since we only need the fitted values, we don't need nest and can just use group_by with mutate:
library(dplyr)
df %>%
group_by(group) %>%
mutate(pred = fitted(lm(y ~ x))) %>%
ungroup %>%
select(group, pred)
giving:
# A tibble: 15 x 2
group pred
<chr> <dbl>
1 g1 2.47
2 g1 3.19
3 g1 3.90
4 g1 4.33
5 g1 5.33
6 g1 5.90
7 g1 6.91
8 g1 7.89
9 g1 8.18
10 g1 8.61
11 g2 4.41
12 g2 4.15
13 g2 2.63
14 g2 2.27
15 g2 -0.0563
This could also be done like this:
library(dplyr)
df %>%
mutate(pred = fitted(lm(y ~ x*group + 0, df))) %>%
select(group, pred)
or like this using base R only:
transform(df, pred = fitted(lm(y ~ x*group + 0, df)))[c("group", "pred")]
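The single-model variants work because x*group + 0 gives every group its own intercept and slope, so the fitted values coincide with those from separate per-group fits. A minimal sketch that checks this on the sample data (the names pooled and separate are only illustrative, and the data is repeated here so the snippet runs on its own):

```r
# Sample data, same as in the Note
df <- read.table(text = "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3", header = TRUE)

# One model with a separate intercept and slope for each group
pooled <- fitted(lm(y ~ x * group + 0, df))

# One model per group, with the fitted values put back in the original row order
separate <- unsplit(
  lapply(split(df, df$group), function(d) fitted(lm(y ~ x, d))),
  df$group
)

# Compare the two sets of fitted values, ignoring names
same <- all.equal(unname(pooled), unname(separate))
print(same)
```

If the two approaches agree, same is TRUE. Note that the pooled model estimates one common residual variance, so standard errors can differ from the per-group fits even though the fitted values do not.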
or using lmList from nlme (which comes with R so it does not have to be installed):
library(dplyr)
library(nlme)
df %>%
mutate(pred = fitted(lmList(y ~ x | group, df))) %>%
select(group, pred)
or using lmList without dplyr:
library(nlme)
transform(df, pred = fitted(lmList(y ~ x | group, df)))[c("group", "pred")]
Note:
Lines <- "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3"
df <- read.table(text = Lines, header = TRUE)
Regarding the comment, this code produces the prediction at x = 5 for each group:
df %>%
group_by(group) %>%
summarize(pred = predict(lm(y ~ x), list(x = 5)), .groups = "drop") %>%
select(group, pred)
## # A tibble: 2 x 2
## group pred
## <chr> <dbl>
## 1 g1 8.18
## 2 g2 1.73
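The nest/map attempt from the question can also be made to work. Two things seem to get in the way there: lm(.$y ~ .$x) builds a model whose terms are literally .$y and .$x, so predict() cannot match the column x in newdata, and predict(new_df) is called immediately instead of being passed to map as a function. A sketch under those assumptions, using nest(data = -group) (tidyr 1.0 or later) and illustrative names res and y_predict:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data, same as in the Note
df <- read.table(text = "
group x y
g1 1 2
g1 1.5 3
g1 2 4
g1 2.3 4.4
g1 3 6
g1 3.4 6.2
g1 4.11 7
g1 4.8 7.9
g1 5 8
g1 5.3 8.2
g2 2 5
g2 2.3 4
g2 4 2.2
g2 4.4 1.9
g2 7 0.3", header = TRUE)

# The x value(s) at which to predict
new_df <- data.frame(x = 5)

res <- df %>%
  nest(data = -group) %>%
  mutate(
    # Fit with a formula on column names plus data =, so newdata can be matched
    fit = map(data, ~ lm(y ~ x, data = .x)),
    # Pass newdata through to predict(); map_dbl expects one number per group
    y_predict = map_dbl(fit, ~ unname(predict(.x, newdata = new_df)))
  ) %>%
  select(group, y_predict)

print(res)
```

This should return one row per group with the same values as the summarize version above. With more than one row in new_df, map_dbl would no longer fit; map followed by unnest would be the natural replacement.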