
`geom_histogram` and `stat_bin()` don't align

Tags: r, ggplot2

After constructing a histogram I'd like to add an upper boundary/outline to my plot. I don't want to use geom_bar or geom_col because I don't want the vertical boundaries for each bin.
My attempts have included using geom_histogram and stat_bin(geom = "step"), but the bins don't align.

I've adjusted parameters within each geom (bins, binwidth, center, boundary) and haven't been able to align these distributions. There have been similar questions on SO (Overlaying geom_points on a geom_histogram or stat_bin) but none seem to have a similar problem to mine or offer a solution.

Here is a case where my geom layers don't align:

set.seed(2019)
library(ggplot2)
library(ggthemes)  # theme_fivethirtyeight()
library(dplyr)     # %>%
df <- data.frame(x = rnorm(100), 
                 y = rep(c("a", "b"), 50))

p <- df %>% 
    ggplot(aes(x, fill = y)) + 
    geom_histogram() + 
    facet_wrap(vars(y)) + 
    theme_fivethirtyeight() + 
    guides(fill = "none")

This is plot p, my base histogram: [image]

p + stat_bin(geom = "step")

[Image: plot p with the step outline from stat_bin, offset from the bars]

I desire a plot where these two geoms align. I've tested a variety of dummy data and this continues to be an issue. Why don't these geoms naturally align? How can I adjust either of these layers to align? Is there a better alternative than combining histogram and stat bin to achieve my desired plot?

asked Sep 19 '19 15:09 by OTStats


1 Answer

The layers don't naturally align because geom_step appears to use the middle of each histogram bar (the x column in the data frame returned by layer_data(p)) as the location of each change point. To align the steps, use position_nudge to shift geom_step left by half the binwidth:

library(tidyverse)

p <- df %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(bins=20) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
  guides(fill = "none")

# Bar width actually used by the histogram layer
binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

p + stat_bin(geom = "step", binwidth = binwidth, position = position_nudge(x = -0.5 * binwidth))

[Image: histogram with the nudged step outline]
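To check where geom_step gets its change points, you can look at the x, xmin and xmax columns of layer_data(p) directly. This is a minimal sketch that rebuilds the example data and leaves out the theme so that only ggplot2 is needed; the names bars and centred are just for illustration:

```r
library(ggplot2)

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

p <- ggplot(df, aes(x, fill = y)) +
  geom_histogram(bins = 20) +
  facet_wrap(vars(y))

# One row per bar: x is the bar's centre, xmin/xmax are its edges
bars <- layer_data(p)
head(bars[, c("PANEL", "x", "xmin", "xmax", "count")])

# TRUE if every x is the midpoint of its bar
centred <- isTRUE(all.equal(bars$x, (bars$xmin + bars$xmax) / 2))
centred
```

Since a step layer built on the same bins receives those same x values, each step starts at a bar centre unless it is nudged.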

Note, however, that in the plot above the step border stops in the middle of the last bar in the left panel and doesn't bound the left edge of the first bar in the right panel. Below is a hack to get geom_step to completely bound all the histogram bars.

We add two rows of fake data outside the range of the real data, then set the x-range of the plot to include only the range of the real data. In this case, I've set the binwidth (rather than the number of bins) because extending the data range increases the binwidth for any fixed number of bins. I've also added a center argument, which isn't necessary but can be used to ensure that the bins are centered at particular locations, if desired.
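The effect of the fake rows on the binwidth can be checked with a small sketch. The helper bar_width() is hypothetical (not a ggplot2 function), and rbind() stands in for add_row() so that only ggplot2 is needed:

```r
library(ggplot2)

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))

# Median bar width that geom_histogram() ends up using for 20 bins
bar_width <- function(data) {
  p <- ggplot(data, aes(x)) + geom_histogram(bins = 20)
  bars <- layer_data(p)
  median(bars$xmax - bars$xmin)
}

# Two fake rows, one unit beyond each end of the real data
padded <- rbind(df, data.frame(x = range(df$x) + c(-1, 1), y = "a"))

width_real   <- bar_width(df)
width_padded <- bar_width(padded)
c(real = width_real, padded = width_padded)
```

With the number of bins held fixed, the padded data should come out with wider bars, which is why the binwidth is set explicitly.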

If this is something you want to do often, you can turn this into a function with some logic to automate expanding of the data frame with fake data and setting the bins and the x-range of the plot appropriately.

p <- df %>% 
  add_row(x=range(df$x) + c(-1,1), y="a") %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(binwidth=0.2, center=0) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
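  guides(fill = "none")

binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

p + 
  stat_bin(geom = "step", binwidth = binwidth, position = position_nudge(x = -0.5 * binwidth)) +
  # df itself is unchanged (the fake rows only exist in the piped copy),
  # so range(df$x) is the range of the real data
  coord_cartesian(xlim = range(df$x) + c(-0.2, 0.2))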

[Image: histogram with the step outline bounding all bars]
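As a rough sketch of the function idea mentioned above: hist_with_outline() and its rule for choosing pad are assumptions made for illustration, not part of ggplot2, and the function assumes columns named x and y as in the example data. The theme is left out so that only ggplot2 is needed.

```r
library(ggplot2)

# Hypothetical helper, not part of ggplot2. Assumes `data` has a numeric
# column `x` and a grouping column `y`, as in the example data.
hist_with_outline <- function(data, binwidth = 0.2, center = 0) {
  real_range <- range(data$x)
  # Put the fake rows far enough out that their bars fall outside the view
  pad <- 3 * binwidth + 0.1 * diff(real_range)
  fake <- data.frame(x = real_range + c(-pad, pad), y = data$y[1])
  padded <- rbind(data[, c("x", "y")], fake)

  ggplot(padded, aes(x, fill = y)) +
    geom_histogram(binwidth = binwidth, center = center) +
    # Same bins as the bars, shifted from bar centre to left edge
    stat_bin(geom = "step", binwidth = binwidth, center = center,
             position = position_nudge(x = -0.5 * binwidth)) +
    facet_wrap(vars(y)) +
    # Zoom back in on the real data without dropping the fake rows
    coord_cartesian(xlim = real_range + c(-binwidth, binwidth)) +
    guides(fill = "none")
}

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
p_outline <- hist_with_outline(df)
p_outline
```

Passing the same binwidth and center to both layers avoids having to read the binwidth back out of layer_data().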

Here's what the same plot looks like without the extra-rows hack:

p <- df %>% 
  ggplot(aes(x, fill = y)) + 
  geom_histogram(binwidth=0.2, center=0) + 
  facet_wrap(vars(y)) + 
  theme_fivethirtyeight() + 
  guides(fill = "none")

binwidth <- layer_data(p) %>% mutate(w = xmax - xmin) %>% pull(w) %>% median()

p + 
  stat_bin(geom = "step", binwidth=binwidth, position=position_nudge(x=-0.5*binwidth))

[Image: the same histogram without the extra rows]
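A possible alternative to the fake-row approach, not part of the code above: stat_bin() has a pad argument that adds an empty bin at each end of x (geom_freqpoly relies on it so that frequency polygons touch zero). Assuming the installed ggplot2 version behaves this way, the outline should close on both sides without fake rows or coord_cartesian. The theme is left out so that only ggplot2 is needed:

```r
library(ggplot2)

set.seed(2019)
df <- data.frame(x = rnorm(100), y = rep(c("a", "b"), 50))
binwidth <- 0.2

p_pad <- ggplot(df, aes(x, fill = y)) +
  geom_histogram(binwidth = binwidth, center = 0) +
  # pad = TRUE adds an empty bin at each end, so the outline starts and
  # ends at zero; the nudge moves each step from bar centre to left edge
  stat_bin(geom = "step", binwidth = binwidth, center = 0, pad = TRUE,
           position = position_nudge(x = -0.5 * binwidth)) +
  facet_wrap(vars(y)) +
  guides(fill = "none")

p_pad
```

The padded bins widen the x-axis by one binwidth on each side, so the plot range differs slightly from the versions above.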

answered Oct 09 '22 08:10 by eipi10