How to add R2 for each facet of ggplot in R?

Is there a way to change the facet labels from 1:3 to something like c("good", "bad", "ugly")? I would also like to add the R2 value to each facet. Below is my code; I tried a few things but didn't succeed.

library(tidyr)    # gather()
library(ggplot2)

DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
                S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))

ggplot(FakeData, aes(x = Ob, y = Value))+
  geom_point()+ geom_smooth(method="lm") + facet_grid(Variable ~ SUB,  scales = "free_y")+
  theme_bw()

The code above produces the figure I have so far. I then tried the code below to change the facet labels, but it didn't work:

ggplot(FakeData, SUB = factor(SUB, levels = c("Good", "Bad","Ugly")), aes(x = Ob, y = Value))+
  geom_point()+ geom_smooth(method="lm") + facet_grid(Variable ~ SUB,  scales = "free_y")+
  theme_bw()

I have no idea how to add R2 to the facets. Is there an efficient way to compute R2 and add it to each facet?

Hydro asked Feb 10 '20 01:02



3 Answers

You can use ggpubr::stat_cor() to easily add correlation coefficients to your plot.

library(dplyr)
library(ggplot2)
library(ggpubr)

FakeData %>%
  mutate(SUB = factor(SUB, labels = c("good", "bad", "ugly"))) %>%
  ggplot(aes(x = Ob, y = Value)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(Variable ~ SUB,  scales = "free_y") +
  theme_bw() +
  stat_cor(aes(label = after_stat(rr.label)), color = "red", geom = "label")
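
The relabelling step above uses labels = rather than levels =, which is where the attempt in the question goes wrong. A minimal base-R sketch of the difference, assuming SUB holds the integers 1 to 3 as in the question's data (the vectors wrong and right are illustrative names, not part of the original code):

```r
# Subject codes as they appear in the question's data (integers 1 to 3).
SUB <- rep(1:3, each = 2)

# Attempt from the question: `levels` must name values present in the data.
# None of 1, 2, 3 equals "Good", "Bad" or "Ugly", so every entry becomes NA.
wrong <- factor(SUB, levels = c("Good", "Bad", "Ugly"))

# Working form: `levels` lists the existing values, `labels` renames them.
right <- factor(SUB, levels = 1:3, labels = c("good", "bad", "ugly"))

print(wrong)
print(right)
```

In short, levels describes what is in the data and labels describes what should be displayed.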

Ritchie Sacramento answered Oct 07 '22 23:10


If you don't want to use functions from other packages and only want to use ggplot2, you will need to compute the R2 for each SUB and Variable combination, and then add it to your plot with geom_text or geom_label. Here is one way to do it.

library(tidyverse)

set.seed(1)

DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
                S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))

FakeData_lm <- FakeData %>%
  group_by(SUB, Variable) %>%
  nest() %>%
  # Fit linear model
  mutate(Mod = map(data, ~lm(Value ~ Ob, data = .x))) %>%
  # Get the R2
  mutate(R2 = map_dbl(Mod, ~round(summary(.x)$r.squared, 3))) 

ggplot(FakeData, aes(x = Ob, y = Value))+
  geom_point()+ 
  geom_smooth(method="lm") + 
  # Add label
  geom_label(data = FakeData_lm, 
             aes(x = Inf, y = Inf, 
                 label = paste("R2 =", R2)),
             hjust = 1, vjust = 1) +
  facet_grid(Variable ~ SUB,  scales = "free_y") +
  theme_bw()
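
As a cross-check on the per-group R2 values, the same grouping can be sketched in base R only. This is not part of the answer above; toy is a hypothetical stand-in for FakeData with the same column names, and the sketch relies on the fact that, for a model with a single predictor, R2 equals the squared Pearson correlation.

```r
set.seed(1)

# Small stand-in for FakeData (hypothetical; same column names as above).
toy <- data.frame(
  SUB      = rep(1:3, each = 40),
  Variable = rep(c("S1", "S2"), times = 60),
  Ob       = runif(120, 50, 100),
  Value    = runif(120, 40, 90)
)

# One data frame per SUB x Variable combination.
groups <- split(toy, list(toy$SUB, toy$Variable), drop = TRUE)

# R2 from the fitted linear model, as in the answer above.
r2_lm <- sapply(groups, function(d) summary(lm(Value ~ Ob, data = d))$r.squared)

# For a single predictor, R2 equals the squared Pearson correlation.
r2_cor <- sapply(groups, function(d) cor(d$Ob, d$Value)^2)

print(round(r2_lm, 3))
print(all.equal(r2_lm, r2_cor))
```

If the two vectors agree, the labels drawn by geom_label should match what a correlation-based label would show for the same facets.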

www answered Oct 07 '22 23:10


The following answer makes use of package 'ggpmisc' (version >= 0.5.0 for the second example). In addition, I simply used a call to factor() within aes() to set the labels.

library(tidyverse)
library(ggpmisc)

DF = data.frame(SUB = rep(1:3, each = 100), Ob = runif(300, 50,100), S1 = runif(300, 75,95), S2 = runif(300, 40,90),
                S3 = runif(300, 35,80),S4 = runif(300, 55,100))
FakeData = gather(DF, key = "Variable", value = "Value", -c(SUB,Ob))

# As asked in the question
# Ensuring that the R^2 label does not overlap the observations
ggplot(FakeData, aes(x = Ob, y = Value)) +
  geom_point()+ 
  geom_smooth(method = "lm") + 
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(mult = c(0.1, 0.33))) +
  facet_grid(Variable ~ factor(SUB, 
                               levels = 1:3,
                               labels = c("good", "bad", "ugly")), 
             scales = "free_y") +
  theme_bw()

# As asked in a comment, adding P-value
ggplot(FakeData, aes(x = Ob, y = Value))+
  geom_point()+ 
  geom_smooth(method = "lm") + 
  stat_poly_eq(mapping = use_label(c("R2", "P")), p.digits = 2) +
  scale_y_continuous(expand = expansion(mult = c(0.1, 0.33))) +
  facet_grid(Variable ~ factor(SUB, 
                               levels = 1:3,
                               labels = c("good", "bad", "ugly")),
             scales = "free_y")+
  theme_bw()

[Figure: plot from the first example]

Note: With older versions of 'ggpmisc' that lack the function use_label(), the mapping can be written as aes(label = paste(after_stat(rr.label), after_stat(p.value.label), sep = "*\", \"*")), in the same way as when using 'ggpubr'.

[Figure: plot from the second example, adding P to the label]

Package 'ggpubr' includes code copied from 'ggpmisc' without acknowledgement, which explains why some statistics are so similar between the two packages. 'ggpmisc' is more narrowly focused, but its statistics have been much improved since they were taken into 'ggpubr'. 'ggpmisc' is actively maintained, while maintenance of 'ggpubr' currently seems stalled.
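
As an alternative to calling factor() inside the facet formula, the strip labels can also be set through the labeller argument of facet_grid(), leaving the data untouched. A minimal sketch, assuming a ggplot2 version that provides labeller(); toy and sub_labels are hypothetical names introduced only for this illustration:

```r
library(ggplot2)

# Hypothetical stand-in for FakeData with the same column names.
set.seed(1)
toy <- data.frame(
  SUB      = rep(1:3, each = 20),
  Variable = rep(c("S1", "S2"), times = 30),
  Ob       = runif(60, 50, 100),
  Value    = runif(60, 40, 90)
)

# Named lookup: names are the values found in SUB, elements are the strip labels.
sub_labels <- c("1" = "good", "2" = "bad", "3" = "ugly")

p <- ggplot(toy, aes(x = Ob, y = Value)) +
  geom_point() +
  facet_grid(Variable ~ SUB, scales = "free_y",
             labeller = labeller(SUB = sub_labels)) +
  theme_bw()

print(p)
```

This keeps SUB numeric in the data, which may be convenient if the codes are needed elsewhere; the factor() approach is the better fit when the new labels should also appear in legends or summaries.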

Pedro J. Aphalo answered Oct 07 '22 22:10