After constructing a histogram I'd like to add an upper boundary/outline to my plot. I don't want to use geom_bar or geom_col because I don't want the vertical boundaries for each bin.
My attempts have included using geom_histogram and stat_bin(geom = "step"); however, the bins don't align.
I've adjusted parameters within each geom (bins, binwidth, center, boundary) and haven't been able to align these distributions. There have been similar questions on SO (Overlaying geom_points on a geom_histogram or stat_bin), but none seem to have the same problem as mine or offer a solution.
Here is a case where my geom layers don't align:
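set.seed(2019)
library(ggplot2)
library(ggthemes)  # theme_fivethirtyeight()
library(dplyr)     # provides the %>% pipe

# 100 standard-normal draws split evenly between two groups
df <- data.frame(x = rnorm(100),
                 y = rep(c("a", "b"), 50))

# Base histogram, one facet per group
p <- df %>%
  ggplot(aes(x, fill = y)) +
  geom_histogram() +
  facet_wrap(vars(y)) +
  theme_fivethirtyeight() +
  guides(fill = "none")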
Plot p is my base histogram. Adding a step layer on top of it gives steps that don't line up with the bars:
p + stat_bin(geom = "step")
I'd like a plot where these two geoms align. I've tested a variety of dummy data and this continues to be an issue. Why don't these geoms naturally align? How can I adjust either of these layers so that they align? Is there a better alternative to combining geom_histogram and stat_bin to achieve the plot I want?
The steps don't naturally align with the bars because stat_bin reports each bin at its midpoint (the x column in the data frame returned by layer_data(p)), and geom_step, with its default direction = "hv", changes level at each of those x values. Every step is therefore drawn half a bin to the right of its bar. To align them, use position_nudge to shift the step layer left by half the binwidth:
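library(tidyverse)
library(ggthemes)

p <- df %>%
  ggplot(aes(x, fill = y)) +
  geom_histogram(bins = 20) +
  facet_wrap(vars(y)) +
  theme_fivethirtyeight() +
  guides(fill = "none")

# Recover the bin width that geom_histogram() chose for bins = 20
binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

# Use the same bins = 20 so both layers share identical breaks, then shift
# the step layer left by half a bin so its change points fall on bar edges
p + stat_bin(geom = "step", bins = 20,
             position = position_nudge(x = -0.5 * binwidth))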
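The half-bin offset can be checked numerically with layer_data(). The sketch below uses data shaped like the question's (without the theme or facets, to keep it self-contained) and assumes that two layers given the same bins value on the same x scale receive identical breaks. It compares the histogram's bin geometry with the nudged step layer's change points.

```r
library(ggplot2)

# Data shaped like the question's example
set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

# Histogram alone, to read off the bin width
base <- ggplot(df, aes(x)) + geom_histogram(bins = 20)
bw <- median(layer_data(base)$xmax - layer_data(base)$xmin)

# Same histogram plus a step layer with identical binning, nudged left by
# half a bin as in the answer
p_check <- base +
  stat_bin(geom = "step", bins = 20, position = position_nudge(x = -bw / 2))

bars  <- layer_data(p_check, 1)  # one row per bin: x, xmin, xmax, count, ...
steps <- layer_data(p_check, 2)  # same bins, after the nudge

# stat_bin places each bin at its midpoint ...
print(all.equal(bars$x, (bars$xmin + bars$xmax) / 2))
# ... so after the nudge each change point sits on a bar's left edge,
# with the same counts as the bars
print(all.equal(steps$x, bars$xmin))
print(all.equal(steps$count, bars$count))
```

Without the nudge, steps$x would equal bars$x, i.e. each level change would happen in the middle of a bar, which is the misalignment seen in the question.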
Note, however, that in the resulting plot the step outline stops in the middle of the last bar in the left panel and doesn't bound the left edge of the first bar in the right panel. Below is a hack to get geom_step to bound all the histogram bars completely.
We add two rows of fake data outside the range of the real data, then set the x-range of the plot to include only the range of the real data. In this case, I've set the binwidth (rather than the number of bins), because extending the data range would increase the binwidth for any fixed number of bins. I've also added a center argument, which isn't necessary but can be used to ensure that the bins are centered at particular locations, if desired.
If this is something you want to do often, you can turn this into a function with some logic to automate expanding the data frame with fake data and setting the bins and the x-range of the plot appropriately.
binwidth <- 0.2  # fixed bin width shared by both layers

p <- df %>%
  # Two fake rows, one unit beyond each end of the real data
  add_row(x = range(df$x) + c(-1, 1), y = "a") %>%
  ggplot(aes(x, fill = y)) +
  geom_histogram(binwidth = binwidth, center = 0) +
  facet_wrap(vars(y)) +
  theme_fivethirtyeight() +
  guides(fill = "none")

p +
  stat_bin(geom = "step", binwidth = binwidth, center = 0,
           position = position_nudge(x = -0.5 * binwidth)) +
  # df itself never received the fake rows, so range(df$x) is the real range
  coord_cartesian(xlim = range(df$x) + c(-binwidth, binwidth))
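Following the suggestion above to wrap this in a function, here is a minimal sketch. outlined_histogram() is a hypothetical name, not an existing ggplot2 function. It assumes a single grouping column used for both fill and facets, places the two fake rows three bin widths beyond the data (instead of the fixed ±1 used above) so at least one empty bin separates them from the real bars, and uses coord_cartesian(expand = FALSE) so the default axis padding cannot bring the fake bars back into view. The theme is left out to keep it self-contained.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): automates the fake-row hack.
# x_col and group_col are column names given as strings.
outlined_histogram <- function(data, x_col, group_col, binwidth, center = 0) {
  real_range <- range(data[[x_col]])

  # Two fake rows copied from row 1 (so they belong to its group), moved
  # three bin widths beyond each end of the real data
  fake <- data[c(1, 1), , drop = FALSE]
  fake[[x_col]] <- real_range + c(-3, 3) * binwidth
  padded <- rbind(data, fake)

  ggplot(padded, aes(.data[[x_col]], fill = .data[[group_col]])) +
    geom_histogram(binwidth = binwidth, center = center) +
    # Same breaks as the histogram, nudged left by half a bin so every
    # change point sits on a bar edge
    stat_bin(geom = "step", binwidth = binwidth, center = center,
             position = position_nudge(x = -binwidth / 2)) +
    facet_wrap(vars(.data[[group_col]])) +
    guides(fill = "none") +
    # Crop to the real data plus one bin on each side; expand = FALSE stops
    # the default padding from reaching the fake bars
    coord_cartesian(xlim = real_range + c(-binwidth, binwidth), expand = FALSE)
}

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
p_helper <- outlined_histogram(df, "x", "y", binwidth = 0.2)
```

Printing p_helper draws the plot. The argument names x_col and group_col are chosen so they cannot collide with the data's own x column inside aes(), where a column name would otherwise take precedence over a function argument of the same name.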
Here's what the same plot looks like without the extra-rows hack:
binwidth <- 0.2  # fixed bin width shared by both layers

p <- df %>%
  ggplot(aes(x, fill = y)) +
  geom_histogram(binwidth = binwidth, center = 0) +
  facet_wrap(vars(y)) +
  theme_fivethirtyeight() +
  guides(fill = "none")

p +
  stat_bin(geom = "step", binwidth = binwidth, center = 0,
           position = position_nudge(x = -0.5 * binwidth))
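A possibly simpler route, not used in the answer above, is stat_bin's pad = TRUE argument, which adds an empty bin at each end of the binned range. Combined with the same half-bin nudge, the step should then drop to zero just outside the outermost bins, so neither fake rows nor coord_cartesian() are needed. This is a sketch assuming a ggplot2 version whose stat_bin() supports pad (documented as the option that makes frequency polygons touch zero); the question's theme is omitted to keep it self-contained.

```r
library(ggplot2)

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
binwidth <- 0.2

p_pad <- ggplot(df, aes(x, fill = y)) +
  geom_histogram(binwidth = binwidth, center = 0) +
  # pad = TRUE adds a zero-count bin before the first and after the last
  # bin; the nudge moves each change point from a bin's midpoint to its
  # left edge, so the final drop to zero lands on the last bar's right edge
  stat_bin(geom = "step", binwidth = binwidth, center = 0, pad = TRUE,
           position = position_nudge(x = -0.5 * binwidth)) +
  facet_wrap(vars(y)) +
  guides(fill = "none")
```

Only the step layer is padded; padding the histogram too would merely add invisible zero-height bars.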