I would like to display a plot created by geom_smooth(), but it is important for me to be able to describe how the plot was created.
I can see from the documentation that when n >= 1000, gam is used as the smoothing function, but I cannot see how many knots are used or what function generated the smoothing.
Example:
library(ggplot2)
set.seed(12345)
n <- 3000
x1 <- seq(0, 4*pi, length.out = n)
x2 <- runif(n)
x3 <- rnorm(n)
lp <- 2*sin(2* x1)+3*x2 + 3*x3
p <- 1/(1+exp(-lp))
y <- ifelse(p > 0.5, 1, 0)
df <- data.frame(x1, x2, x3, y)
# default plot
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth()
# specify method='gam'
# linear
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam')
# specify gam and splines
# Shows non-linearity, but different from default
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7))
If I want to use the default parameters, is there a way to identify the function used to create the smoothing so I can accurately describe it in a methods section of the analysis?
geom_smooth() is a function in the ggplot2 visualization package for R. It adds a smoothed trend line to a plot.
Key R function: geom_smooth(), for adding smoothed conditional means or a regression line. Key arguments: color, size and linetype change the line color, size and type; fill changes the fill color of the confidence region.
By default, loess or gam is used for smoothing, depending on the size of the dataset. The method argument lets you change this to lm, glm, gam, loess or rlm, and the formula argument sets the formula used by the smoothing function.
stat_smooth() and geom_smooth() are effectively aliases: they take the same arguments and both draw a smooth line. For example, stat_smooth() can draw a smooth line using the "loess" method.
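The size-based default mentioned above can be written out as a small sketch. It follows the rule stated in the ggplot2 documentation (loess when the largest group has fewer than 1,000 observations, otherwise mgcv::gam with formula y ~ s(x, bs = "cs")). The name default.smooth is a hypothetical helper, not part of ggplot2, and older ggplot2 versions may behave differently.

```r
# Hypothetical helper (not a ggplot2 function): mirrors the documented rule
# that geom_smooth() applies when neither method nor formula is supplied.
default.smooth <- function(n.largest.group) {
  if (n.largest.group < 1000) {
    list(method = "loess", formula = y ~ x)
  } else {
    list(method = "gam", formula = y ~ s(x, bs = "cs"))
  }
}

default.smooth(500)$method             # "loess"
deparse(default.smooth(3000)$formula)  # formula used for large groups
```

The formulas are returned unevaluated, so this runs in base R without mgcv installed.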
I wrote a function to reverse-engineer the steps used in StatSmooth's setup_params function to get the actual method / formula parameters used for plotting.
The function expects a ggplot object as its input, with an additional optional parameter specifying the layer that corresponds to geom_smooth (defaults to 1 if unspecified). It returns a text string in the form "Method: [method used], Formula: [formula used]", and also prints out all the parameters to console.
The envisaged use case is two-fold:
Function:
get.params <- function(plot, layer = 1){
  # return empty string if the specified geom layer doesn't use stat = "smooth"
  if(!"StatSmooth" %in% class(plot$layers[[layer]]$stat)){
    message("No smoothing function was used in this geom layer.")
    return("")
  }
  # recreate data used by this layer, in the format expected by StatSmooth
  # (this code chunk takes heavy reference from ggplot2:::ggplot_build.ggplot)
  layer.data <- plot$layers[[layer]]$layer_data(plot$data)
  layout <- ggplot2:::create_layout(plot$facet, plot$coordinates)
  data <- layout$setup(list(layer.data), plot$data, plot$plot_env)
  data[[1]] <- plot$layers[[layer]]$compute_aesthetics(data[[1]], plot)
  scales <- plot$scales
  data[[1]] <- ggplot2:::scales_transform_df(scales = scales, df = data[[1]])
  layout$train_position(data, scales$get_scales("x"), scales$get_scales("y"))
  data <- layout$map_position(data)[[1]]
  # set up stat params (e.g. replace "auto" with actual method / formula)
  stat.params <- suppressMessages(
    plot$layers[[layer]]$stat$setup_params(data = data,
                                           params = plot$layers[[layer]]$stat_params)
  )
  # reverse the last step in setup_params; we don't need the actual function
  # for mgcv::gam, just the name
  if(identical(stat.params$method, mgcv::gam)) stat.params$method <- "gam"
  print(stat.params)
  return(paste0("Method: ", stat.params$method, ", Formula: ", deparse(stat.params$formula)))
}
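Because get.params relies on unexported ggplot2 internals (the ::: calls), it may stop working when ggplot2 changes. As a rough cross-check, the sketch below captures the message that ggplot2 itself emits while building a plot with default smoothing. It assumes your ggplot2 version prints such a message, and the exact wording varies between versions. The toy data frame and the names toy, p.toy and msgs are made up for illustration.

```r
library(ggplot2)

# Toy data (illustrative only): 50 points, so the loess default applies
set.seed(1)
toy <- data.frame(x = 1:50, y = rnorm(50))
p.toy <- ggplot(toy, aes(x = x, y = y)) + geom_smooth()

# Build the plot without drawing it, collecting any messages along the way
msgs <- character()
built <- withCallingHandlers(
  ggplot_build(p.toy),
  message = function(m) {
    msgs <<- c(msgs, conditionMessage(m))
    invokeRestart("muffleMessage")  # keep the console quiet
  }
)

msgs  # should mention the method (and, in recent versions, the formula)
```

This only reports what ggplot2 chooses to announce, so it says nothing when method and formula are both given explicitly.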
Demonstration:
p <- ggplot(df, aes(x = x1, y = y)) # df is the sample dataset in the question
# default plot for 1000+ observations
# (method defaults to gam & formula to 'y ~ s(x, bs = "cs")')
p1 <- p + geom_smooth()
p1 + ggtitle(get.params(p1))
# specify method = 'gam'
# (formula defaults to `y ~ x`)
p2 <- p + geom_smooth(method='gam')
p2 + ggtitle(get.params(p2))
# specify method = 'gam' and splines for formula
p3 <- p + geom_smooth(method='gam',
                      method.args = list(family = "binomial"),
                      formula = y ~ splines::ns(x, 7))
p3 + ggtitle(get.params(p3))
# specify method = 'glm'
# (formula defaults to `y ~ x`)
p4 <- p + geom_smooth(method='glm')
p4 + ggtitle(get.params(p4))
# default plot for fewer observations
# (method defaults to loess & formula to `y ~ x`)
# observe that function is able to distinguish between plot data
# & data actually used by the layer
# (the `. %>% slice()` shorthand below requires dplyr)
library(dplyr)
p5 <- p + geom_smooth(data = . %>% slice(1:500))
p5 + ggtitle(get.params(p5))
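get.params reports the method and formula but not the number of knots the question asked about. One way to get at that is to refit the reported default directly with mgcv and inspect the fitted smooth. The sketch below assumes the default formula y ~ s(x, bs = "cs") shown in the demonstration and assumes method = "REML", which recent ggplot2 documentation states for the automatic gam default; check both against your installed version. As far as I know, mgcv's default basis dimension for a one-dimensional s() term is 10, but the code reads it from the fit instead of hard-coding it.

```r
library(mgcv)

# Recreate the sample dataset from the question
set.seed(12345)
n  <- 3000
x1 <- seq(0, 4 * pi, length.out = n)
x2 <- runif(n)
x3 <- rnorm(n)
lp <- 2 * sin(2 * x1) + 3 * x2 + 3 * x3
y  <- ifelse(1 / (1 + exp(-lp)) > 0.5, 1, 0)
df <- data.frame(x1, x2, x3, y)

# Assumption: same formula and fitting method as the documented default
fit <- gam(y ~ s(x1, bs = "cs"), data = df, method = "REML")

k   <- fit$smooth[[1]]$bs.dim  # basis dimension of the smooth term
edf <- sum(fit$edf[-1])        # effective degrees of freedom, intercept excluded

k
edf
summary(fit)
```

For a methods section, the basis type ("cs", a cubic regression spline with shrinkage), the basis dimension k, and the fitting method are usually the details worth reporting.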