I have a series of data files that I’d like to explore in R. From each one I plan to generate a data frame with a column variable that I’ll label, for purposes of this question, as foo. The range of foo lies along the interval [0, 7000]. As part of my data exploration exercise, I’d like to create a 1D histogram of foo, but with a bit of a twist: values of foo in the range (1000, 7000] are particularly “interesting” to me, so I’d like to color-code the individual histogram bars in that range using a palette of colors (later on I intend to reuse the same palette to map data from some other columns, which I have temporarily omitted from the data frame to keep my question from becoming needlessly overcomplicated). Conversely, values of foo in the range [0, 1000] are not as interesting to me; I’d still like to see them in the histogram, colored gray, whenever any such values are present.
In my code sample below, I generated an artificial sample data frame and attempted to plot the histogram using ggplot2, selecting fill colors using scale_fill_manual(). I did get a multi-colored histogram, but it does not look as expected: ggplot2 appears to have ignored my instructions for where to place the breaks between colors. Specifically, the problem seems to be related to missing data: intervals that happen to have no data do not appear to get mapped onto a color, although I intended that they should be. This also means that gray ends up mapped onto the interval (1000, 1500] instead of [0, 1000] as I had intended.
My question: how can I force ggplot2 to assign specific fill colors to specific data ranges, even if some intervals are empty and the histogram bars corresponding to those intervals therefore aren't generated?
I’ve included an initial version of my code below, together with a dummy example data frame plus a hand-annotated version of the output that it produces.
library(ggplot2)
# Minimum and maximum values of interest (for other data sets, additional
# values that are of lesser interest may fall within the interval [0, 1000])
lolim<-1000
hilim<-7000
bwdth<-500
# Construct sample data frame
df<-data.frame(foo=c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300, 3750,
                     4200, 4300, 4750, 6200, 6300, 6750))
# Construct a discrete factor variable which can later be mapped onto
# discrete color codes
df$colcode<-cut(df$foo, breaks=c(0, seq(lolim, hilim, bwdth)),
                include.lowest=TRUE)
# Create the breaks and color codes to be used by scale_fill_manual()
brk<-levels(df$colcode)
ncol<-length(brk)
# My expectation is that "#808080FF" (gray) will map onto the range
# [0, 1000], while a palette consisting of 12 sequential shades of the
# rainbow will be mapped onto the range (1000, 7000], in intervals of 500
colors<-c("#808080FF", rainbow(ncol-1))
# Draw the histogram
print(ggplot(df, aes(foo)) +
      geom_histogram(aes(fill=colcode), binwidth=bwdth) +
      scale_fill_manual("", breaks=brk, values=colors))
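You can set the drop argument of scale_fill_manual() to FALSE, so that factor levels with no data are kept on the scale instead of being dropped (the default is TRUE). See ?discrete_scale:
drop: drop unused factor levels from the scale (TRUE or FALSE)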
ggplot(df, aes(foo)) +
  geom_histogram(aes(fill = colcode), binwidth = bwdth) +
  scale_fill_manual("", breaks = brk, values = colors, drop = FALSE)
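To see why the colors shifted, it can help to compare all the levels of colcode with the levels that actually contain data. The sketch below rebuilds the question's data frame and assumes that the unnamed colors were matched by position against only the used levels, which is consistent with the output described in the question but may vary across ggplot2 versions. The names all_levels, used_levels, shifted and named_colors are made up for this illustration. It also shows one possible complement to drop = FALSE: naming the palette with setNames(), so that each color is tied to its interval by name rather than by position.

```r
# Rebuild the question's sample data and factor (same values as in the question)
lolim <- 1000
hilim <- 7000
bwdth <- 500
df <- data.frame(foo = c(1200, 1300, 1750, 2200, 2300, 2750, 3200, 3300, 3750,
                         4200, 4300, 4750, 6200, 6300, 6750))
df$colcode <- cut(df$foo, breaks = c(0, seq(lolim, hilim, bwdth)),
                  include.lowest = TRUE)

# All intervals defined by cut(), including the empty ones
all_levels <- levels(df$colcode)
# Only the intervals that contain at least one value of foo
used_levels <- levels(droplevels(df$colcode))

colors <- c("#808080FF", rainbow(length(all_levels) - 1))

# Assumption: positional matching against the used levels only.
# Gray then lands on the first interval that has data, not on the first level.
shifted <- setNames(colors[seq_along(used_levels)], used_levels)

# Naming the palette ties every color to its interval explicitly
named_colors <- setNames(colors, all_levels)

# Plot only if ggplot2 is installed; drop = FALSE keeps the empty levels
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(foo)) +
    geom_histogram(aes(fill = colcode), binwidth = bwdth) +
    scale_fill_manual("", values = named_colors, drop = FALSE)
  print(p)
}
```

With the sample data, setdiff(all_levels, used_levels) lists the empty intervals, and names(shifted)[1] shows which interval would receive gray under positional matching.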