
When using ggplot2, can I set the color of histogram bars without potentially obscuring low values?

Tags: r, ggplot2

When geom_histogram() is called with the color and fill arguments, ggplot2 draws a bar outline across the whole x-axis range, which makes it impossible to visually distinguish a low value from a zero value.

Running the following code:

ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5) +
theme_bw() + scale_x_continuous(breaks=seq(0,20), limits=c(0,20))

will result in

[Image: a histogram without color attributes]

This is visually very unappealing. To fix that, I'd like to instead use

ggplot(esubset, aes(x=exectime)) + geom_histogram(binwidth = 0.5,
colour='black', fill='gray') + theme_bw() +
scale_x_continuous(breaks=seq(0,20), limits=c(0,20))

which would result in

[Image: a histogram with color attributes]

The problem is that I'll have no way of distinguishing whether exectime contains values past 10, as a few occurrences of 12, for example, would be hidden behind the horizontal line spanning the whole x-axis.

asked Jun 10 '16 23:06 by yuppity


1 Answer

Use coord_cartesian instead of scale_x_continuous to set the axis limits. Limits set in scale_x_continuous are applied before the histogram is binned, so bins are laid out across the whole limit range and the empty ones are drawn as zero-height outlines. coord_cartesian sets the visible axis range without affecting how the data are binned or plotted. Even with coord_cartesian, you can still use scale_x_continuous to set the breaks.

In the fake data below, note that I've added data for a few very small bars.

library(ggplot2)

set.seed(4958)
dat = data.frame(value=c(rnorm(5000, 10, 1), rep(15:20,1:6)))

ggplot(dat, aes(value)) +
  geom_histogram(binwidth=0.5, color="black", fill="grey") + 
  theme_bw() +
  scale_x_continuous(limits=c(5,25), breaks=5:25) + 
  ggtitle("scale_x_continuous")

ggplot(dat, aes(value)) +
  geom_histogram(binwidth=0.5, color="black", fill="grey") + 
  theme_bw() +
  coord_cartesian(xlim=c(5,25)) + 
  scale_x_continuous(breaks=5:25) +
  ggtitle("coord_cartesian")

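To check where the zero-height outlines come from without relying on the rendered images, the computed bins of both plots can be inspected. This is a minimal sketch that assumes ggplot2's layer_data() helper and the same fake data as above; the names base, bins_scale and bins_coord are only illustrative.

```r
library(ggplot2)

set.seed(4958)
dat <- data.frame(value = c(rnorm(5000, 10, 1), rep(15:20, 1:6)))

base <- ggplot(dat, aes(value)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "grey")

# Limits set on the scale: bins are laid out across the whole 5..25 range
bins_scale <- layer_data(base + scale_x_continuous(limits = c(5, 25)))

# Limits set on the coordinate system: bins cover only the range of the data
bins_coord <- layer_data(base + coord_cartesian(xlim = c(5, 25)))

# Count the empty bins; each one is still drawn as a zero-height outline
n_empty_scale <- sum(bins_scale$count == 0)
n_empty_coord <- sum(bins_coord$count == 0)
```

Comparing n_empty_scale with n_empty_coord should show that the scale-limited plot carries extra empty bins beyond the data range, while the coord_cartesian plot only has the empty bins that fall in gaps within the data.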

As you can see in the plots above, if there are bins with count=0 within the data range, ggplot will add a zero-line, even with coord_cartesian. This makes it difficult to see the bar at 15 of height=1. You can make the border thinner with the lwd argument ("linewidth") so that smaller bars will be less obscured:

ggplot(dat, aes(value)) +
  geom_histogram(binwidth=0.5, color="black", fill="grey", lwd=0.3) + 
  theme_bw() +
  coord_cartesian(xlim=c(5,25)) + 
  scale_x_continuous(breaks=5:25) +
  ggtitle("coord_cartesian")

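If thinner borders are not enough, another possibility is to bin the data yourself and drop the empty bins before drawing, so that no zero-height outline exists at all. The sketch below is an assumption on my part rather than something tested against the original exectime data: it uses base R's hist(plot = FALSE) for the counts and geom_col for the bars, and the names breaks, binned and nonempty are only illustrative.

```r
library(ggplot2)

set.seed(4958)
dat <- data.frame(value = c(rnorm(5000, 10, 1), rep(15:20, 1:6)))

# Bin edges are offset by half a bin width so bins are centred on multiples of 0.5
binwidth <- 0.5
breaks <- seq(floor(min(dat$value)) - binwidth / 2,
              ceiling(max(dat$value)) + binwidth / 2,
              by = binwidth)

# hist(plot = FALSE) only computes the counts and the bin midpoints
h <- hist(dat$value, breaks = breaks, plot = FALSE)
binned <- data.frame(mid = h$mids, n = h$counts)

# Keep only non-empty bins so that no zero-height outline is drawn
nonempty <- binned[binned$n > 0, ]

p <- ggplot(nonempty, aes(mid, n)) +
  geom_col(width = binwidth, color = "black", fill = "grey") +
  theme_bw() +
  coord_cartesian(xlim = c(5, 25)) +
  scale_x_continuous(breaks = 5:25)
```

Printing p draws bordered bars only where the count is at least 1, so a bar of height 1 at 15 is no longer sitting on top of a line of the same color. The trade-off is that the binning is done outside ggplot2, so binwidth has to be changed in the breaks rather than in the geom.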

One other option is to pre-summarise the data and plot using geom_bar in order to get spaces between the bars and thereby avoid the need for border lines to mark bar edges:

library(dplyr)
library(tidyr)
library(zoo)

bins = seq(floor(min(dat$value)) - 1.75, ceiling(max(dat$value)) + 1.25, 0.5)

dat.binned = dat %>% 
  count(bin=cut(value, bins, right=FALSE)) %>%   # Bin the data
  complete(bin, fill=list(n=0)) %>%              # Restore empty bins and fill with zeros
  mutate(bin = rollmean(bins,2)[-length(bins)])  # Convert bin from factor to numeric with value = mean of bin range

ggplot(dat.binned, aes(bin, n)) +
  geom_bar(stat="identity", fill=hcl(240,100,30)) + 
  theme_bw() +
  scale_x_continuous(breaks=0:21)


answered Oct 07 '22 16:10 by eipi10