Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to identify the function used by geom_smooth()

I would like to display a plot created by geom_smooth() but it is important for me to be able to describe how the plot was created.

I can see from the documentation when n >= 1000, gam is used as the smoothing function, but I cannot see how many knots are used or what function generated the smoothing.

Example:

library(ggplot2)

set.seed(12345)
n <- 3000
x1 <- seq(0, 4*pi,, n)
x2 <- runif(n)
x3 <- rnorm(n)
lp <- 2*sin(2* x1)+3*x2 + 3*x3
p <- 1/(1+exp(-lp))
y <- ifelse(p > 0.5, 1, 0)

df <- data.frame(x1, x2, x3, y)

# default plot
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth() 

# specify method='gam'
# linear
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam') 

# specify gam and splines
# Shows non-linearity, but different from default
ggplot(df, aes(x = x1, y = y)) +
  geom_smooth(method = 'gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7)) 

If I want to use the default parameters, is there a way to identify the function used to create the smoothing so I can accurately describe it in a methods section of the analysis?

variations of geom_smooth

like image 999
A Toll Avatar asked Nov 28 '16 21:11

A Toll


People also ask

What does Geom_smooth function do?

The geom smooth function is a function for the ggplot2 visualization package in R. Essentially, geom_smooth() adds a trend line over an existing plot.

What does Geom_smooth mean in R?

Key R function: geom_smooth() for adding smoothed conditional means / regression line. Key arguments: color , size and linetype : Change the line color, size and type. fill : Change the fill color of the confidence region.

What is the default method in Geom_smooth?

By default, the loess or gam function is used for smoothing (in relation to the size of dataset). Using the method combo-box, you can change this function to lm, glm, gam, loess, rlm. By the formula property you can set the formula that will be used in the smoothing function.

What does Geom_smooth plot?

stat_smooth() and geom_smooth() both are aliases. Both of them have the same arguments and both of them are used to plot a smooth line. We can plot a smooth line using the “loess” method of stat_smooth() function.


1 Answers

I wrote a function to reverse-engineer the steps used in StatSmooth's setup_params function to get the actual method / formula parameters used for plotting.

The function expects a ggplot object as its input, with an additional optional parameter specifying the layer that corresponds to geom_smooth (defaults to 1 if unspecified). It returns a text string in the form "Method: [method used], Formula: [formula used]", and also prints out all the parameters to console.

The envisaged use case is two-fold:

  1. Add the text string as-is to the plot as plot title / subtitle / caption, for quick reference during analysis;
  2. Read off the console printout, & include the information elsewhere or manually format it nicely (e.g. parsed plotmath expressions) for annotation in the plot, for report / presentation.

Function:

get.params <- function(plot, layer = 1){

  # return empty string if the specified geom layer doesn't use stat = "smooth"
  if(!"StatSmooth" %in% class(plot$layers[[layer]]$stat)){
    message("No smoothing function was used in this geom layer.")
    return("")
  }

  # recreate data used by this layer, in the format expected by StatSmooth
  # (this code chunk takes heavy reference from ggplot2:::ggplot_build.ggplot)
  layer.data <- plot$layers[[layer]]$layer_data(plot$data)
  layout <- ggplot2:::create_layout(plot$facet, plot$coordinates)
  data <- layout$setup(list(layer.data), plot$data, plot$plot_env)
  data[[1]] <- plot$layers[[layer]]$compute_aesthetics(data[[1]], plot)
  scales <- plot$scales
  data[[1]] <- ggplot2:::scales_transform_df(scales = scales, df = data[[1]])
  layout$train_position(data, scales$get_scales("x"), scales$get_scales("y"))
  data <- layout$map_position(data)[[1]]

  # set up stat params (e.g. replace "auto" with actual method / formula)
  stat.params <- suppressMessages(
    plot$layers[[layer]]$stat$setup_params(data = data, 
                                           params = plot$layers[[layer]]$stat_params)
    )

  # reverse the last step in setup_params; we don't need the actual function
  # for mgcv::gam, just the name
  if(identical(stat.params$method, mgcv::gam)) stat.params$method <- "gam"

  print(stat.params)

  return(paste0("Method: ", stat.params$method, ", Formula: ", deparse(stat.params$formula)))
}

Demonstration:

p <- ggplot(df, aes(x = x1, y = y)) # df is the sample dataset in the question

# default plot for 1000+ observations
# (method defaults to gam & formula to 'y ~ s(x, bs = "cs")')
p1 <- p + geom_smooth()
p1 + ggtitle(get.params(p1))

# specify method = 'gam'
# (formula defaults to `y ~ x`)
p2 <- p + geom_smooth(method='gam')
p2 + ggtitle(get.params(p2))

# specify method = 'gam' and splines for formula
p3 <- p + geom_smooth(method='gam',
              method.args = list(family = "binomial"),
              formula = y ~ splines::ns(x, 7))
p3 + ggtitle(get.params(p3))

# specify method = 'glm'
# (formula defaults to `y ~ x`)
p4 <- p + geom_smooth(method='glm')
p4 + ggtitle(get.params(p4))

# default plot for fewer observations
# (method defaults to loess & formula to `y ~ x`)
# observe that function is able to distinguish between plot data 
# & data actually used by the layer
p5 <- p + geom_smooth(data = . %>% slice(1:500))
p5 + ggtitle(get.params(p5))

plot

like image 55
Z.Lin Avatar answered Oct 12 '22 14:10

Z.Lin