Correct usage of scale_fill_manual() to create multi-colored histogram bars in ggplot2?

Tags: r, ggplot2

I have a series of data files that I’d like to explore in R. From each one I plan to build a data frame with a column that, for purposes of this question, I’ll call foo. The values of foo lie in the interval [0, 7000]. As part of my data exploration, I’d like to create a 1D histogram of foo, with a twist: values of foo in the range (1000, 7000] are particularly “interesting” to me, so I’d like to color-code the individual histogram bars in that range using a palette of colors. (Later on I intend to reuse the same palette to map data from some other columns, which I have temporarily omitted from the data frame to keep the question from becoming overcomplicated.) Values of foo in the range [0, 1000] are less interesting, but I’d still like to see them in the histogram, colored gray, whenever any are present.

In the code sample below, I generate an artificial sample data frame and attempt to plot the histogram with ggplot2, selecting fill colors with scale_fill_manual(). I do get a multi-colored histogram, but it does not look as expected: ggplot2 appears to ignore my instructions for where to place the breaks between colors. The problem seems to be related to missing data: intervals that happen to contain no data do not get mapped onto a color, although I intended that they should be. As a result, gray ends up mapped onto the interval (1000, 1500] instead of [0, 1000] as I had intended.
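To make the "missing data" point concrete, here is a minimal base-R sketch that counts how many values fall into each interval. It reuses the sample values and breaks from this question; the names counts, empty_levels and used_levels are only illustrative.

```r
# Sample values and breaks, as used in this question
foo <- c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300, 3750,
         4200, 4300, 4750, 6200, 6300, 6750)
colcode <- cut(foo, breaks = c(0, seq(1000, 7000, 500)),
               include.lowest = TRUE)

# table() reports every factor level, including those with zero counts
counts <- table(colcode)
print(counts)

# Levels that exist in the factor but contain no observations
empty_levels <- names(counts)[counts == 0]
print(empty_levels)

# Levels that remain once the unused ones are dropped
used_levels <- levels(droplevels(colcode))
print(used_levels[1])
```

For this sample, three of the thirteen levels are empty, including the first one, so the first level that actually contains data is the one covering (1000, 1500].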

My question: how can I force ggplot2 to assign specific color fill codes to specific data ranges, even if some intervals are blank and have no data, and histogram bars corresponding to those intervals therefore aren't generated?

I’ve included an initial version of my code below, together with a dummy example data frame plus a hand-annotated version of the output that it produces.

library(ggplot2)

# Minimum and maximum values of interest (for other data sets, additional
# values that are of lesser interest may fall within the interval [0, 1000])
lolim<-1000
hilim<-7000
bwdth<-500
# Construct sample data frame
df<-data.frame(foo=c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300, 3750,
                     4200, 4300, 4750, 6200, 6300, 6750))
# Construct a discrete factor variable which can later be mapped onto
# discrete color codes
df$colcode<-cut(df$foo, breaks=c(0, seq(lolim, hilim, bwdth)),
                include.lowest=TRUE)

# Create the breaks and color codes to be used by scale_fill_manual()
brk<-levels(df$colcode)
ncol<-length(brk)
# My expectation is that "#808080FF" (gray) will map onto the range
# [0, 1000], while a palette consisting of 12 sequential shades of the
# rainbow will be mapped onto the range (1000, 7000], in intervals of 500
colors<-c("#808080FF", rainbow(ncol-1))

# Draw the histogram
print(ggplot(df, aes(foo)) +
        geom_histogram(aes(fill=colcode), binwidth=bwdth) +
        scale_fill_manual("", breaks=brk, values=colors))

Hand-annotated sample output

asked Mar 17 '14 22:03 by stachyra


1 Answer

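You can set the drop argument of scale_fill_manual() to FALSE. By default, unused factor levels are dropped from the scale, so the unnamed colors in values are assigned, in order, only to the levels that actually occur in the data. See ?discrete_scale: drop unused factor levels from the scale (TRUE or FALSE)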

ggplot(df, aes(foo)) +
  geom_histogram(aes(fill = colcode), binwidth = bwdth) +
  scale_fill_manual("", breaks = brk, values = colors, drop = FALSE)
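As a possible variation (a sketch, not part of the original answer), the colors can also be passed as a named vector, so that each factor level is paired with its color by name instead of by position. The vector pal is a name introduced here for illustration. The boundary = 0 argument is likewise my own addition: it asks geom_histogram() to place bin edges at multiples of the bin width so that the bars line up with the cut() intervals. show.legend = TRUE is included on the assumption that some newer ggplot2 releases leave unused levels out of the legend otherwise.

```r
library(ggplot2)

bwdth <- 500
df <- data.frame(foo = c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300, 3750,
                         4200, 4300, 4750, 6200, 6300, 6750))
df$colcode <- cut(df$foo, breaks = c(0, seq(1000, 7000, bwdth)),
                  include.lowest = TRUE)

brk <- levels(df$colcode)

# Name each color after the level it belongs to: gray for the first
# interval, rainbow shades for the remaining ones
pal <- setNames(c("#808080FF", rainbow(length(brk) - 1)), brk)

p <- ggplot(df, aes(foo)) +
  # boundary = 0 aligns bin edges with multiples of bwdth (assumption:
  # the bars are meant to coincide with the cut() intervals)
  geom_histogram(aes(fill = colcode), binwidth = bwdth, boundary = 0,
                 show.legend = TRUE) +
  # drop = FALSE keeps the empty levels in the scale
  scale_fill_manual("", values = pal, drop = FALSE)
print(p)
```

With the names in place, the gray entry stays attached to the first interval even though no data falls into it.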


answered Sep 24 '22 01:09 by Henrik