
Create custom geom to compute summary statistics and display them *outside* the plotting region

I am the creator of the R package EnvStats.

There is a function I use quite often called stripChart. I am just starting to learn ggplot2, and have spent the past several days poring over Hadley's book, Winston’s book, StackOverflow, and other resources in an attempt to create a geom that approximates what stripChart does. I am unable to figure out how to, within the geom, compute summary statistics and test results and then place them below the x-axis tick marks and also at the top of the plot (outside the plotting region). Here is a simple example using the built-in dataset mtcars:

library(EnvStats)
stripChart(mpg ~ cyl, data = mtcars, col = 1:3, 
  xlab = "Number of Cylinders", ylab = "Miles per Gallon", p.value = TRUE)

Here is an early draft of a geom to try to reproduce most of the functionality of stripChart:

geom_stripchart <- function(..., x.nudge = 0.3,
  jitter.params = list(width = 0.3, height = 0),
  mean.params = list(size = 2, position = position_nudge(x = x.nudge)),
  errorbar.params = list(size = 1, width = 0.1,
    position = position_nudge(x = x.nudge)),
  n.text = TRUE, mean.sd.text = TRUE, p.value = FALSE) {
    params <- list(...)
    jitter.params   <- modifyList(params, jitter.params)
    mean.params     <- modifyList(params, mean.params)
    errorbar.params <- modifyList(params, errorbar.params)

    jitter <- do.call("geom_jitter", jitter.params)
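    # `fun` replaces `fun.y`, which was deprecated in ggplot2 3.3.0
    mean   <- do.call("stat_summary", modifyList(
      list(fun = "mean", geom = "point"),
      mean.params)
    )
    # mean_cl_normal requires the Hmisc package to be installed
    errorbar <- do.call("stat_summary", modifyList(
      list(fun.data = "mean_cl_normal", geom = "errorbar"),
      errorbar.params)
    )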

    stripchart.list <- list(
      jitter, 
      theme(legend.position = "none"),
      mean, 
      errorbar
    )

    if(n.text || mean.sd.text) {
      # Compute summary statistics (sample size, mean, SD) here?
      if(n.text) {
        # Add information to stripchart.list to
        # compute sample size per group and add text below x-axis
      }
      if(mean.sd.text) {
        # Add information to stripchart.list to
        # compute mean and SD and add text above top of plotting region
      }
    }
    if(p.value) {
      # Add information to stripchart.list to
      # compute p-value (and 95% CI for difference if only 2 groups)
      # and add text above top of plotting region
    }
    stripchart.list
}


library(ggplot2)
dev.new()
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl)))
p + geom_stripchart() + 
    xlab("Number of Cylinders") + 
    ylab("Miles per Gallon")

You can see that the plots are pretty much the same. The problem I’m having is figuring out how to add the sample size below each group, and to add the means and standard deviations at the top, along with the result of the ANOVA test (ignoring the issue of unequal variances at this point). I know it is straightforward to compute summary statistics and then plot them as points or text within the plotting area, but I don’t want to do that.
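For reference, the quantities themselves are easy to compute outside of ggplot2. The following is a minimal base R sketch, assuming the same mtcars example and an equal-variance one-way ANOVA; the variable names and the label formats are illustrative only and are not taken from stripChart:

```r
# Per-group summaries of mpg by number of cylinders (base R only).
groups <- factor(mtcars$cyl)
n_by_group    <- tapply(mtcars$mpg, groups, length)
mean_by_group <- tapply(mtcars$mpg, groups, mean)
sd_by_group   <- tapply(mtcars$mpg, groups, sd)

# One-way ANOVA assuming equal variances, as in the simplification above.
fit <- aov(mpg ~ factor(cyl), data = mtcars)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Text of the kind that would go below the x-axis and above the panel.
# The formats are assumptions chosen for illustration.
n_labels       <- paste0("n=", n_by_group)
mean_sd_labels <- sprintf("Mean=%.1f\nSD=%.1f", mean_by_group, sd_by_group)
```

The difficulty described here is not the arithmetic but doing it from inside the layer, where only the data mapped in ggplot() is available.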

I have already found examples showing how to place text outside the plot (e.g., using annotation_custom()):
How can I add annotations below the x axis in ggplot2?

Displaying text below the plot generated by ggplot2

The problem is that the examples show how to do this where the user has pre-defined what the annotation is. My problem is that within geom_stripchart, I have to compute summary statistics and test results based on the data that was defined in the call to ggplot(), and then pass those results to annotation_custom(). I don’t know how to get at the x and y variables that are defined in the call to ggplot().
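One possible route, sketched here under assumptions rather than offered as a finished solution, is a custom stat built with ggproto: ggplot2 passes the mapped x and y values to the stat's compute_group() method, so the summaries can be computed there. The names StatNLabel and stat_n_label are hypothetical, and the vjust and margin values are guesses that would need tuning:

```r
library(ggplot2)

# Hypothetical stat (not part of ggplot2 or EnvStats): returns one row per
# group holding the group's x position and a sample-size label.
StatNLabel <- ggproto("StatNLabel", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales) {
    # `data` holds the x and y values mapped in ggplot() for this group
    data.frame(
      x = data$x[1],
      y = -Inf,                          # bottom edge of the panel
      label = paste0("n=", nrow(data)),
      stringsAsFactors = FALSE
    )
  }
)

# Hypothetical layer constructor pairing the stat with a text geom.
stat_n_label <- function(mapping = NULL, data = NULL, geom = "text",
                         position = "identity", na.rm = FALSE,
                         show.legend = FALSE, inherit.aes = TRUE, ...) {
  layer(
    stat = StatNLabel, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    # vjust > 1 pushes the text below the panel edge (value is a guess)
    params = list(na.rm = na.rm, vjust = 1.5, ...)
  )
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.3, height = 0) +
  stat_n_label() +
  coord_cartesian(clip = "off") +        # allow drawing outside the panel
  theme(axis.text.x = element_text(margin = margin(t = 18)))
```

Turning clipping off lets the text geom draw beyond the panel, and the extra top margin on the tick labels is one way to leave room for the labels. Mean, SD, and test results could be handled the same way with y = Inf and a negative vjust, although a test across groups would need compute_panel() instead, since compute_group() only sees one group at a time.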

Steve M asked Oct 12 '16 07:10


1 Answer

I posted a simpler version of this question here: ggplot2: Adding sample size information to x-axis tick labels
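As a rough illustration of that simpler idea, the sample sizes can be computed up front and written into the tick labels with scale_x_discrete(). This is a minimal sketch that assumes the counts are computed outside the layer; the object names are illustrative, and it is not how geom_stripchart is implemented:

```r
library(ggplot2)

# Count observations per cylinder group and build two-line tick labels.
n_per_group <- table(mtcars$cyl)
tick_labels <- paste0(names(n_per_group), "\nn=", n_per_group)
names(tick_labels) <- names(n_per_group)  # match labels to factor levels

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_jitter(width = 0.3, height = 0) +
  scale_x_discrete(labels = tick_labels) +
  theme(legend.position = "none") +
  labs(x = "Number of Cylinders", y = "Miles per Gallon")
```

Because the labels are part of the axis, they sit outside the plotting region without any clipping changes, at the cost of being fixed before the plot is built.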

I have updated the EnvStats package to include a geom called geom_stripchart which is an adaptation of the EnvStats function stripChart. See the help file for geom_stripchart for more information and a list of examples. Below is a simple example:

library(ggplot2)
library(EnvStats)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) 

p + geom_stripchart(test.text = TRUE) + 
  labs(x = "Number of Cylinders", y = "Miles per Gallon")

Demo of geom_stripchart

Steve M answered Nov 11 '22 04:11