
Plotting means on histograms created with facet_wrap

I'm making several histograms using ggplot2 and facet_wrap and would like to plot the mean value on each panel. Below, I create a dummy data frame, find the mean of each facet, and then create the plots, adding the mean using geom_point.

# Load libraries 
library(tidyverse)

# Toy data frame
df <- data.frame(ID = sample(letters[1:3], 100, replace = TRUE), n = runif(100))

# Mean value of each group
df_mean <- df %>% group_by(ID) %>% summarise(mean = mean(n))

# Plot histograms
ggplot(df) + 
  geom_histogram(aes(n)) + 
  facet_wrap(~ID) +
  geom_point(data = df_mean, aes(x = mean, y = Inf))


I used y = Inf to place the point at the top of each facet, but – as you can see – it is cropped somewhat. I'd like to nudge it downwards so that it is completely visible. To my knowledge, geom_point doesn't have a nudge_y or vadj argument and 0.7 * Inf is obviously nonsensical. I also tried adding position = position_nudge(y = -5) as an argument to geom_point, but this doesn't appear to have any effect. As a workaround, I even tried using geom_text and specifying nudge_y, but – like the position_nudge solution – it did not have any noticeable effect. Is there an easy way of doing this whilst plotting or do I simply need to calculate the y value prior to plotting?

asked May 23 '18 11:05 by Lyngbakr


2 Answers

If you are OK with using geom_text() or geom_label(), you can use the vjust argument to do this:

ggplot(df) + 
    geom_histogram(aes(n)) + 
    facet_wrap(~ID) +
    geom_text(data = df_mean, aes(x = mean, y = Inf),
              label = "Mean", vjust = 1)


I use this all the time to float percent changes or p-values at the top of a panel. You don't have to calculate anything; ggplot handles the placement for you.
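As a rough sketch of the same idea, the label can carry the mean itself rather than the fixed string "Mean", and a vjust slightly above 1 pushes the text a little further below the top edge. The seed, the bins value, the label column, and the name p_text are assumptions made for illustration. This should work where nudging does not because a nudge is plain arithmetic on the y value (Inf - 5 is still Inf), whereas vjust shifts the text relative to its own height when it is drawn.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible

# Toy data frame, as in the question
df <- data.frame(ID = sample(letters[1:3], 100, replace = TRUE), n = runif(100))

# Mean value of each group, plus a label column (hypothetical name) to print
df_mean <- df %>%
  group_by(ID) %>%
  summarise(mean = mean(n)) %>%
  mutate(label = sprintf("mean = %.2f", mean))

# y = Inf pins the text to the top of each panel;
# vjust = 1.5 moves it down by half its own height beyond top alignment
p_text <- ggplot(df) +
  geom_histogram(aes(n), bins = 30) +
  facet_wrap(~ID) +
  geom_text(data = df_mean, aes(x = mean, y = Inf, label = label), vjust = 1.5)

p_text
```

Larger vjust values move the text further down; vjust = 1 aligns the top of the text with the top of the panel.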

answered Oct 30 '22 13:10 by Nate


# Load libraries 
library(tidyverse)

# Toy data frame
df <- data.frame(ID = sample(letters[1:3], 100, replace = TRUE), n = runif(100))

# Mean value of each group
df_mean <- df %>% group_by(ID) %>% summarise(mean = mean(n))

# Build the histograms first so the computed bin counts can be read back
p <- ggplot(df) + 
  geom_histogram(aes(n)) + 
  facet_wrap(~ID)

# Add the means at the height of the tallest bar (max count across all panels)
p + geom_point(data = df_mean, aes(x = mean, y = max(ggplot_build(p)$data[[1]]$count)))


The key here is to know the maximum count value, i.e. the height of the tallest bar across all panels. With the default fixed scales of facet_wrap, every panel shares the same y axis, so that value sits near the top of each one. You can get it from the ggplot_build() function, which returns the computed data for each layer, and use it to plot your points in the right place.

Of course, you can go a bit higher than the max count in case the point falls on one of the bars, like this: y = 0.2 + max(ggplot_build(p)$data[[1]]$count)
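A minimal sketch of that variant, computing the height once and storing it in the summary data frame instead of calling ggplot_build() inside aes(). The seed, the bins value, the 5% headroom factor, and the names max_count and y are assumptions chosen for illustration, not part of the original answer.

```r
library(ggplot2)
library(dplyr)

set.seed(42)  # assumed seed, only so the sketch is reproducible

# Toy data frame and per-group means, as in the question
df <- data.frame(ID = sample(letters[1:3], 100, replace = TRUE), n = runif(100))
df_mean <- df %>% group_by(ID) %>% summarise(mean = mean(n))

# Build the histograms without the points
p <- ggplot(df) +
  geom_histogram(aes(n), bins = 30) +
  facet_wrap(~ID)

# Tallest bar across all panels, read from the first layer's computed data
max_count <- max(ggplot_build(p)$data[[1]]$count)

# Place each point 5% above the tallest bar (assumed headroom)
df_mean$y <- 1.05 * max_count

p + geom_point(data = df_mean, aes(x = mean, y = y))
```

A multiplicative offset scales with the data, so the gap above the bars stays visible whether the counts are in the tens or the thousands.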

answered Oct 30 '22 14:10 by AntoniosK