
R: ggplot2, how to annotate summary statistics on each panel of a panel plot

Tags:

r

ggplot2

How would I add a text annotation (e.g. sd = sd_value) showing the standard deviation in each panel of the following plot using ggplot2 in R?

library(ggplot2)
library(datasets)
data(mtcars)
ggplot(data = mtcars, aes(x = hp)) + 
        geom_dotplot(binwidth = 1) + 
        geom_density() + 
        facet_grid(. ~ cyl) + 
        theme_bw()

I'd post an image of the plot, but I don't have enough rep.

I think "geom_text" or "annotate" might be useful, but I'm not quite sure how.

adatum asked May 28 '15 01:05



2 Answers

If you want to vary the text label in each facet, you will want to use geom_text. If you want the same text to appear in each facet, you can use annotate.

library(ggplot2)

p <- ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl)

# One row per facet; the cyl column matches each label to its panel
mylabels <- data.frame(cyl = c(4, 6, 8), 
                       label = c("first label", "second label different", "and another"))

p + geom_text(x = 200, y = 0.75, aes(label = label), data = mylabels)

# Compare that with annotate(), which repeats one label in every panel

p + annotate("text", x = 200, y = 0.75, label = "same label everywhere")

Now, if you really want standard deviation by cyl in this example, I'd probably use dplyr to do the calculation first and then complete this with geom_text like so:

library(ggplot2)
library(dplyr)

df.sd.hp <- mtcars %>%
  group_by(cyl) %>%
  summarise(hp.sd = round(sd(hp), 2))

ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) +
  geom_text(x = 200, y = 0.75, 
            aes(label = paste0("SD: ", hp.sd)), 
            data = df.sd.hp)
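
The fixed x = 200, y = 0.75 position only works because it happens to fall inside every panel here. As a rough sketch of one alternative (not something the original answer does), the label can be pinned to the top-right corner of each panel with x = Inf and y = Inf and nudged inward with hjust and vjust. The names df and sd_labels and the nudge values are my own choices for illustration, and base R's aggregate() stands in for the dplyr step:

```r
library(ggplot2)

# Work on a fresh copy so a modified mtcars in the workspace does not interfere
df <- datasets::mtcars

# One row per facet: the faceting variable (cyl) plus the label text
sd_labels <- aggregate(hp ~ cyl, data = df, FUN = sd)
sd_labels$label <- paste0("SD: ", round(sd_labels$hp, 2))

p <- ggplot(df, aes(x = hp)) +
  geom_dotplot(binwidth = 1) +
  geom_density() +
  facet_grid(. ~ cyl) +
  geom_text(
    data = sd_labels,
    aes(x = Inf, y = Inf, label = label),  # Inf snaps to the panel edge
    hjust = 1.1, vjust = 1.5,              # nudge the text back inside the panel
    inherit.aes = FALSE                    # do not inherit x = hp from the plot
  )

p
```

Because the position is relative to the panel edge, it should keep working if the axis ranges change or if the facets use free scales.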

JasonAizkalns answered Oct 04 '22 20:10


I prefer the appearance of the graph when the statistic appears within the facet label itself. I made the following script, which allows the choice of displaying the standard deviation, mean or count. Essentially it calculates the summary statistic then merges this with the name so that you have the format CATEGORY (SUMMARY STAT = VALUE).

#' Append a summary statistic to each category name.
#' Requires the plyr package (called via plyr::, so it need not be attached).
AddNameStat <- function(df, category, count_col, stat = c("sd", "mean", "count"), dp = 0) {

  # Resolve the default vector to a single choice ("sd" if stat is omitted)
  stat <- match.arg(stat)

  # Create temporary data frame for analysis
  temp <- data.frame(ref = df[[category]], comp = df[[count_col]])

  # Aggregate the variables and calculate statistics
  agg_stats <- plyr::ddply(temp, "ref", plyr::summarize,
                           sd = sd(comp),
                           mean = mean(comp),
                           count = length(comp))

  # Dictionary used to replace stat name with correct symbol for plot
  labelName <- plyr::mapvalues(stat, from = c("sd", "mean", "count"),
                               to = c("\u03C3", "x", "n"), warn_missing = FALSE)

  # Updates the name based on the selected variable
  agg_stats$join <- paste0(agg_stats$ref, " \n (", labelName, " = ",
                           round(agg_stats[[stat]], dp), ")")

  # Map the names back onto the original rows
  name_map <- setNames(agg_stats$join, as.character(agg_stats$ref))
  return(unname(name_map[as.character(df[[category]])]))
}

Using this script with your original question:

library(ggplot2)
library(datasets)
data(mtcars)

# Update the variable name
mtcars$cyl  <- AddNameStat(mtcars, "cyl", "hp", stat = "sd")

ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) + 
  theme_bw()


The script should be easy to alter to include other summary statistics. I am also sure it could be rewritten in parts to make it a bit cleaner!
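
One possible cleaner variant, offered as a sketch rather than a drop-in replacement: instead of overwriting the cyl column, build a lookup table of strip labels and hand it to facet_grid through ggplot2's as_labeller(). The data stays untouched and only the facet strips change. The names df, sd_by_cyl and strip_labels are made up for this example, and rounding to one decimal place is an arbitrary choice:

```r
library(ggplot2)

# Fresh copy of the data, in case mtcars$cyl was overwritten earlier
df <- datasets::mtcars

# Per-group standard deviations; the names are the cyl values "4", "6", "8"
sd_by_cyl <- tapply(df$hp, df$cyl, sd)

# Named character vector: facet value -> text shown in the strip
strip_labels <- setNames(
  paste0(names(sd_by_cyl), "\n(\u03C3 = ", round(sd_by_cyl, 1), ")"),
  names(sd_by_cyl)
)

p <- ggplot(df, aes(x = hp)) +
  geom_dotplot(binwidth = 1) +
  geom_density() +
  facet_grid(. ~ cyl, labeller = as_labeller(strip_labels)) +
  theme_bw()

p
```

Swapping sd for mean or length in the tapply() call gives the other statistics, and the facets keep their numeric order because cyl itself is never converted to text.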

Michael Harper answered Oct 04 '22 20:10