I have a simple problem: how do I plot a histogram with ggplot2 that has a fixed binwidth and is filled with rainbow colors (or any other palette)?
Let's say I have data like this:
myData <- abs(rnorm(1000))
I want to plot a histogram using, e.g., binwidth=.1. That, however, produces a different number of bins depending on the data:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1)
If I knew the number of bins (e.g. n=15), I'd use something like:
ggplot() + geom_histogram(aes(x = myData), binwidth=.1, fill=rainbow(n))
But because the number of bins changes with the data, I'm stuck on this simple problem.
If you really want the number of bins to be flexible, here is my little workaround:
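library(ggplot2)
# Build the plot once to find out how many bins binwidth = .1 produces
gg_b <- ggplot_build(
  ggplot() + geom_histogram(aes(x = myData), binwidth = .1)
)
nu_bins <- nrow(gg_b$data[[1]])
# Draw the plot again with one rainbow color per bin
ggplot() + geom_histogram(aes(x = myData), binwidth = .1, fill = rainbow(nu_bins))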
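The two steps above can be wrapped in a small helper. This is only a sketch, assuming ggplot2 is installed; rainbow_histogram() and its palette argument are hypothetical names introduced for illustration, not part of ggplot2. Any function that takes a count and returns that many colors can be passed as the palette, which covers the "any other palette" part of the question.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): builds the histogram once to
# count the bins, then rebuilds it with one color per bin.
rainbow_histogram <- function(x, binwidth, palette = rainbow) {
  base_plot <- ggplot() + geom_histogram(aes(x = x), binwidth = binwidth)
  # The first layer's computed data has one row per bin
  n_bins <- nrow(ggplot_build(base_plot)$data[[1]])
  ggplot() +
    geom_histogram(aes(x = x), binwidth = binwidth, fill = palette(n_bins))
}

set.seed(1L)
myData <- abs(rnorm(1000))
p <- rainbow_histogram(myData, binwidth = 0.1)
# Same helper with a different base R palette
p_heat <- rainbow_histogram(myData, binwidth = 0.1, palette = heat.colors)
```

Printing p or p_heat draws the plot. The histogram is still computed twice, so this only tidies up the workaround rather than removing its cost.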
If the binwidth is fixed, here is an alternative solution that uses the internal function ggplot2:::bin_breaks_width() to get the number of bins before creating the graph. It's still a workaround, but it avoids calling geom_histogram() twice as in the other solution:
# create sample data
set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1
# create plot
library(ggplot2) # CRAN version 2.2.1 used
n_bins <- length(ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks) - 1L
ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth, fill = rainbow(n_bins))
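Because bin_breaks_width() is internal, its name or signature may differ between ggplot2 releases. A quick sanity check, sketched here under the assumption that the function still exists with this signature in your installed version, is to compare its bin count with the one ggplot_build() reports for the same plot:

```r
library(ggplot2)

set.seed(1L)
myData <- abs(rnorm(1000))
binwidth <- 0.1

# Bin count from the internal helper (assumed to exist with this signature)
n_internal <- length(
  ggplot2:::bin_breaks_width(range(myData), width = binwidth)$breaks
) - 1L

# Bin count that ggplot2 actually produces when the plot is built
built <- ggplot_build(
  ggplot() + geom_histogram(aes(x = myData), binwidth = binwidth)
)
n_built <- nrow(built$data[[1]])

# TRUE means rainbow(n_internal) supplies exactly one color per bin
n_internal == n_built
```

If the comparison is not TRUE, falling back to the ggplot_build() count is the safer choice, since that is the number of bars actually drawn.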
As a third alternative, the aggregation can be done outside of ggplot2. Then, geom_col() can be used instead of geom_histogram():
# start binning on multiple of binwidth
start_bin <- binwidth * floor(min(myData) / binwidth)
# compute breaks and bin the data
breaks <- seq(start_bin, max(myData) + binwidth, by = binwidth)
# cut() has no `by` argument, so only the breaks are passed
myData2 <- cut(sort(myData), breaks = breaks)
ggplot() + geom_col(aes(x = head(breaks, -1L),
                        y = as.integer(table(myData2)),
                        fill = levels(myData2))) +
  ylab("count") + xlab("myData")
Note that breaks is plotted on the x-axis instead of levels(myData2) to keep the x-axis continuous. Otherwise, each factor label would be plotted, which would clutter the x-axis. Also note that the built-in ggplot2 color palette is used instead of rainbow().
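To see what the binning step produces, and how rainbow() could be brought back with this approach, here is a small sketch. The toy vector x is hypothetical and chosen so the counts are easy to check by hand; scale_fill_manual() is swapped in as one possible way to override the default palette, not something the answer above uses.

```r
library(ggplot2)

# Hypothetical toy data; no value sits exactly on a break
x <- c(0.05, 0.12, 0.18, 0.31)
binwidth <- 0.1

# Same binning steps as above: start on a multiple of binwidth, then cut
start_bin <- binwidth * floor(min(x) / binwidth)
breaks <- seq(start_bin, max(x) + binwidth, by = binwidth)
bins <- cut(x, breaks = breaks)
counts <- as.integer(table(bins))
counts  # 1 2 0 1: one value in (0,0.1], two in (0.1,0.2], none in (0.2,0.3], one in (0.3,0.4]

# Keep the bins in their numeric order when they are mapped to fill
bin_label <- factor(levels(bins), levels = levels(bins))

ggplot() +
  geom_col(aes(x = head(breaks, -1L), y = counts, fill = bin_label)) +
  scale_fill_manual(values = rainbow(nlevels(bins)), guide = "none") +
  ylab("count") + xlab("x")
```

Since nlevels(bins) is known before plotting, the palette length always matches the number of bins, and guide = "none" hides a legend that would otherwise list every bin label.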