Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I add lines to connect points on regression line to both x and y axis on ggplot?

Tags:

r

ggplot2

How can I add lines connecting a regression equation to specific points in the x axis and to the corresponding values on the y axis?

Here is a reproducible example:

library(ggplot2)
library(ggpmisc)

x<-c(1,2,3,5,10,12,15,20,22,25,30,33,37)

y<-c(1000,800,100,10,1,0.3,0.25,0.2,0.1,0.1,0.03,0.05,0.03)

myformula<-y ~ poly(x,3)

df <- data.frame(x, y)

ggplot(df, aes(x,y)) + 
  stat_smooth(method = lm, formula = myformula) + 
  geom_point() + 
  stat_smooth(method = lm, formula = myformula) +
  stat_poly_eq(formula = myformula, eq.with.lhs = "italic(psi)~`=`~",
               eq.x.rhs = "~italic(theta)", 
               aes(label = paste(..eq.label.., ..rr.label.., 
                                 sep = "~~~~")), label.x=0.15, parse = TRUE)+ 
  xlim(0, 40)+  
  ylim(0, 2000)+ 
  scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000))

This is what I have: enter image description here

And this is what I would like: enter image description here

like image 510
paml Avatar asked Feb 11 '21 04:02

paml


People also ask

How do I add a linear regression line in ggplot2?

Adding a regression line on a ggplot You can use geom_smooth() with method = "lm" . This will automatically add a regression line for y ~ x to the plot.


2 Answers

You'll first want to save your plot for later use, here I'm saving it into object p (I'm ignoring the stuff that's not related to your question).

p <- ggplot(df, aes(x,y)) + 
  stat_smooth(method = lm, formula = myformula) + 
  geom_point() + 
  xlim(0, 40) +  
  scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000))

The ggplot2 package has a function ggplot_build() which allows you to observe all the makings of the plot.

plot_str <- ggplot_build(p)

The object created is a list, which has a data element, which is in itself a list of all the data frames for each of the geoms used to build the plot. Here we are interested in the line chart, which is the 2nd data frame in that list.

head(plot_str$data[[2]])
         x        y     ymin     ymax        se flipped_aes PANEL group  colour   fill size linetype weight alpha
1 1.000000 3.019354 2.645929 3.392780 0.1650750       FALSE     1    -1 #3366FF grey60    1        1      1   0.4
2 1.455696 2.796358 2.458116 3.134599 0.1495218       FALSE     1    -1 #3366FF grey60    1        1      1   0.4
3 1.911392 2.581749 2.273151 2.890348 0.1364177       FALSE     1    -1 #3366FF grey60    1        1      1   0.4
4 2.367089 2.375366 2.090702 2.660030 0.1258375       FALSE     1    -1 #3366FF grey60    1        1      1   0.4
5 2.822785 2.177044 1.910571 2.443518 0.1177963       FALSE     1    -1 #3366FF grey60    1        1      1   0.4
6 3.278481 1.986621 1.732776 2.240466 0.1122137       FALSE     1    -1 #3366FF grey60    1        1      1   0.4

Now we can just grab a couple of points. Here I'm grabbing the 5th and the 70th row.

specific_points <- plot_str$data[[2]][c(5, 70), ]

And then getting back to the earlier version of the plot, I'm adding a few segment geoms referencing those points.

p + 
  geom_segment(y = specific_points$y[1], yend = specific_points$y[1], x = -Inf, xend = specific_points$x[1]) + 
  geom_segment(y = specific_points$y[1], yend = -Inf, x = specific_points$x[1], xend = specific_points$x[1], linetype = "dashed") + 
  geom_segment(y = specific_points$y[2], yend = specific_points$y[2], x = -Inf, xend = specific_points$x[2]) + 
  geom_segment(y = specific_points$y[2], yend = -Inf, x = specific_points$x[2], xend = specific_points$x[2], linetype = "dashed")

enter image description here

like image 109
Phil Avatar answered Nov 11 '22 06:11

Phil


(My answer doesn't quite work yet, because I haven't figured out how to replicate the scale transformation ggplot2 does before it fits the poly line. I'd love to hear any suggestions!)

My approach was to fit the curve outside of ggplot, and use those results to drive the annotations. Here's a table showing the coordinates of two points on the fitted poly line:

points <- c(4, 34)
lines <- data.frame(
  x = points,
  y = predict(lm(myformula), data.frame(x = points))
)
#lines
#   x       y
#1  4 370.537
#2 34  41.233

Then we can feed those into geom_segment:

ggplot(df, aes(x,y)) + 
  stat_smooth(method = lm, formula = myformula) + 
  geom_point() + 
  stat_poly_eq(formula = myformula, eq.with.lhs = "italic(psi)~`=`~",
               eq.x.rhs = "~italic(theta)", 
               aes(label = paste(..eq.label.., ..rr.label.., 
                                 sep = "~~~~")), label.x=0.15, parse = TRUE)+ 
  geom_segment(data = lines, lty = "dashed",
               aes(x = x, xend = x, y = 0, yend = y)) +
  geom_segment(data = lines, lty = "dashed",
               aes(x = 0, xend = x, y = y, yend = y))

enter image description here

Unfortunately, this doesn't work if y is transformed, as in the original question. I learned that ggplot2 transforms the scales before it fits the model, so the fit ggplot2 is using will vary from the untransformed version.

Here we see that the transformed data after adding scale_y_log10(breaks = c(0, 0.1,10,1000), labels= c(0,0.1, 10,1000)) can be fit more closely to a cubic curve than the untransformed data, with r^2 increasing to 0.98. However, the old line segments won't work any more since the fit has changed. I will leave the correction of that fit to the reader since I can't figure it out.

enter image description here

like image 34
Jon Spring Avatar answered Nov 11 '22 06:11

Jon Spring