I have a pre-binned frequency table for a rather large dataset. That is, a single column vector of bins and a single column vector of counts associated with those bins. I'd like R to plot a histogram of this data by doing further binning and summing the existing counts. For example, if in the pre-binned data I have something like [(0.01, 5000), (0.02, 231), (0.03, 948)], where the first number is the bin and the second is the count, and I choose 0.04 as the new bin width, I'd expect to get [(0.04, 6179)]. What's the fastest and or easiest way to do this in R?
Looks like ggplot2 has the answer.
library(ggplot2)
qplot(bin, data=cbind(bins,counts), weight=counts, geom="histogram")
The new HistogramTools package on CRAN has a number of useful functions for doing exactly this. In your example, if you want to merge three adjacent buckets together at each point in the histogram to produce a new histogram with 1/3rd as many buckets, you could use the MergeBuckets
function.
install.packages("HistogramTools")
library(HistogramTools)
h <- hist(rexp(1000), breaks=60)
plot(MergeBuckets(h, adj.buckets=3))
Alternatively, you can also specify a list of the new breakpoints you want explicitly, rather than telling MergeBuckets() to always merge the same number of adjacent buckets.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With