Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot add Normal Distribution while using `facet_wrap` [duplicate]

Tags:

r

ggplot2

I'm looking to plot the following histograms:

library(palmerpenguins)
library(tidyverse)

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species)

enter image description here

For each histogram, I would like to add a Normal Distribution to each histogram with each species mean and standard deviation.

Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.

I have tried:

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species) + 
  stat_function(fun = dnorm)

But this only gives me a thin line at the bottom:

enter image description here

Any ideas? Thanks!

Edit I guess what I'm trying to recreate is this simple command from Stata:

hist bill_length_mm, by(species) normal

which gives me this: enter image description here

I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R

But I'm specifically looking for a short answer that does not require me creating a separate function.

like image 506
Moritz Schwarz Avatar asked Oct 30 '25 15:10

Moritz Schwarz


2 Answers

A while I ago I sort of automated this drawing of theoretical densities with a function that I put in the ggh4x package I wrote, which you might find convenient. You would just have to make sure that the histogram and theoretical density are at the same scale (for example counts per x-axis unit).

library(palmerpenguins)
library(tidyverse)
library(ggh4x)

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 1) + 
  stat_theodensity(aes(y = after_stat(count))) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 2) + 
  stat_theodensity(aes(y = after_stat(count)*2)) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

Created on 2021-01-27 by the reprex package (v0.3.0)

If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density))) + 
  stat_theodensity() +
  facet_wrap(~species)
like image 101
teunbrand Avatar answered Nov 01 '25 05:11

teunbrand


While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.

penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) + 
  facet_wrap(~species) + 
  geom_line(aes(y = dnorm(bill_length_mm,
                          mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
                          sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))

enter image description here

like image 36
Ian Campbell Avatar answered Nov 01 '25 03:11

Ian Campbell