I have a ggplot2-based heatmap that renders counts of the occurrences of certain factors. However, different datasets sometimes don't have instances of some factors, which means that their respective heatmaps will look different. To make side-by-side comparison easier I'd like to add in missing levels. Unfortunately I've not been successful.
So, I have data that looks like this:
> head(numRules)
Job Generation NumRules
1 0 0 2
2 0 1 1
3 0 2 1
4 0 3 1
5 0 4 1
6 0 5 1
> levels(factor(numRules$NumRules))
[1] "1" "2" "3"
I use the following code to render a nice heatmap that counts the number of rules per generation for all jobs:
ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=factor(NumRules))) +
stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") +
ylab('Number of Rules')
[Figure: heat map of the count of the number of rules by generation for all jobs]
So the heat map is saying that most of the time the runs only have a single rule for a given generation, but sometimes you get two, and on rare occasions you'll get three.
Now an entirely different set of runs may actually have zero rules for a given generation. However, doing a side-by-side comparison would be a little confusing because the y axis of one heat map has the number of rules in [1,3], and the other might be in [0,2]. What I'd like to do is to standardize the heatmaps so that they all have factor levels in (0,1,2,3) regardless of the number of rules. E.g., I'd like to re-render the above heat map to include a row for zero rules even though there are no instances of that in that particular data frame.
I have battered this with various R incantations involving setting breaks and scales and whatnot to no avail. My intuition is that there is a simple solution to this, yet I'm unable to find it.
Update:
If I manually specify the levels in the call to factor I do get a row added for the zero rules:
ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=factor(NumRules, levels=c("0","1","2","3")))) +
stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") +
ylab('Number of Rules')
Which yields this.
Unfortunately, as you can see this new row isn't properly colored. Getting there!
If every NumRules value you're interested in is declared as a level of the factor, you can fix this by specifying drop=FALSE in scale_y_discrete(), which keeps unused levels on the axis:
numRules = read.table(text=" Job Generation NumRules
1 0 0 2
2 0 1 1
3 0 2 1
4 0 3 1
5 0 4 1
6 0 5 1", header=TRUE)
numRules$NumRules = factor(numRules$NumRules, levels=c(0, 1, 2, 3))
ggplot(subset(numRules, Generation < 21), aes(x=Generation, y=NumRules)) +
scale_y_discrete(drop=FALSE) +
stat_bin(aes(fill=..count..), geom="tile", binwidth=1, position="identity") +
ylab('Number of Rules')
Result:
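The y axis keeps every declared level, but a level with no observations may still show up as an empty, uncolored row, as in the question's update, because no count is produced for cells that have no data. A minimal sketch of one way to fill those cells, assuming a count of zero is the right value for them: tabulate with base R's table(), which reports every combination of factor levels including the empty ones, and draw the result with geom_tile(). The names recent and counts are made up for this illustration, and the levels 0:3 come from the question rather than from the data.

```r
# Sample rows from the question; only Generation and NumRules are used below.
numRules <- read.table(text = "Job Generation NumRules
0 0 2
0 1 1
0 2 1
0 3 1
0 4 1
0 5 1", header = TRUE)

# Declare every level that should appear, whether or not it occurs in the data.
numRules$NumRules <- factor(numRules$NumRules, levels = 0:3)

# table() counts every Generation x NumRules combination, including empty ones,
# so the resulting data frame carries an explicit Freq of 0 for missing cells.
recent <- subset(numRules, Generation < 21)
counts <- as.data.frame(table(Generation = recent$Generation,
                              NumRules = recent$NumRules))

# Draw one tile per combination; skipped when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = Generation, y = NumRules, fill = Freq)) +
    geom_tile() +
    scale_y_discrete(drop = FALSE) +
    ylab("Number of Rules")
  print(p)
}
```

In this sketch Generation becomes a factor as well (a side effect of as.data.frame() on a table), so the x axis is discrete; convert it back with as.integer(as.character(counts$Generation)) if a continuous axis is preferred.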
