How can I add lines connecting a regression equation to specific points in the x axis and to the corresponding values on the y axis?
Here is a reproducible example:
library(ggplot2)
library(ggpmisc)
x<-c(1,2,3,5,10,12,15,20,22,25,30,33,37)
y<-c(1000,800,100,10,1,0.3,0.25,0.2,0.1,0.1,0.03,0.05,0.03)
myformula<-y ~ poly(x,3)
df <- data.frame(x, y)
ggplot(df, aes(x,y)) +
stat_smooth(method = lm, formula = myformula) +
geom_point() +
stat_smooth(method = lm, formula = myformula) +
stat_poly_eq(formula = myformula, eq.with.lhs = "italic(psi)~`=`~",
eq.x.rhs = "~italic(theta)",
aes(label = paste(..eq.label.., ..rr.label..,
sep = "~~~~")), label.x=0.15, parse = TRUE)+
xlim(0, 40)+
ylim(0, 2000)+
scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000))
This is what I have:
And this is what I would like:
Adding a regression line on a ggplot You can use geom_smooth() with method = "lm" . This will automatically add a regression line for y ~ x to the plot.
You'll first want to save your plot for later use, here I'm saving it into object p
(I'm ignoring the stuff that's not related to your question).
p <- ggplot(df, aes(x,y)) +
stat_smooth(method = lm, formula = myformula) +
geom_point() +
xlim(0, 40) +
scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000))
The ggplot2
package has a function ggplot_build()
which allows you to observe all the makings of the plot.
plot_str <- ggplot_build(p)
The object created is a list, which has a data
element, which is in itself a list of all the data frames for each of the geoms used to build the plot. Here we are interested in the line chart, which is the 2nd data frame in that list.
head(plot_str$data[[2]])
x y ymin ymax se flipped_aes PANEL group colour fill size linetype weight alpha
1 1.000000 3.019354 2.645929 3.392780 0.1650750 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
2 1.455696 2.796358 2.458116 3.134599 0.1495218 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
3 1.911392 2.581749 2.273151 2.890348 0.1364177 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
4 2.367089 2.375366 2.090702 2.660030 0.1258375 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
5 2.822785 2.177044 1.910571 2.443518 0.1177963 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
6 3.278481 1.986621 1.732776 2.240466 0.1122137 FALSE 1 -1 #3366FF grey60 1 1 1 0.4
Now we can just grab a couple of points. Here I'm grabbing the 5th and the 70th row.
specific_points <- plot_str$data[[2]][c(5, 70), ]
And then getting back to the earlier version of the plot, I'm adding a few segment geoms referencing those points.
p +
geom_segment(y = specific_points$y[1], yend = specific_points$y[1], x = -Inf, xend = specific_points$x[1]) +
geom_segment(y = specific_points$y[1], yend = -Inf, x = specific_points$x[1], xend = specific_points$x[1], linetype = "dashed") +
geom_segment(y = specific_points$y[2], yend = specific_points$y[2], x = -Inf, xend = specific_points$x[2]) +
geom_segment(y = specific_points$y[2], yend = -Inf, x = specific_points$x[2], xend = specific_points$x[2], linetype = "dashed")
(My answer doesn't quite work yet, because I haven't figured out how to replicate the scale transformation ggplot2 does before it fits the poly line. I'd love to hear any suggestions!)
My approach was to fit the curve outside of ggplot, and use those results to drive the annotations. Here's a table showing the coordinates of two points on the fitted poly line:
points <- c(4, 34)
lines <- data.frame(
x = points,
y = predict(lm(myformula), data.frame(x = points))
)
#lines
# x y
#1 4 370.537
#2 34 41.233
Then we can feed those into geom_segment:
ggplot(df, aes(x,y)) +
stat_smooth(method = lm, formula = myformula) +
geom_point() +
stat_poly_eq(formula = myformula, eq.with.lhs = "italic(psi)~`=`~",
eq.x.rhs = "~italic(theta)",
aes(label = paste(..eq.label.., ..rr.label..,
sep = "~~~~")), label.x=0.15, parse = TRUE)+
geom_segment(data = lines, lty = "dashed",
aes(x = x, xend = x, y = 0, yend = y)) +
geom_segment(data = lines, lty = "dashed",
aes(x = 0, xend = x, y = y, yend = y))
Unfortunately, this doesn't work if y is transformed, as in the original question. I learned that ggplot2 transforms the scales before it fits the model, so the fit ggplot2 is using will vary from the untransformed version.
Here we see that the transformed data after adding scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000))
can be fit more closely to a cubic curve than the untransformed data, with r^2 increasing to 0.98. However, the old line segments won't work any more since the fit has changed. I will leave the correction of that fit to the reader since I can't figure it out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With