
How to label histogram bars with data values or percents in R

I'd like to label each bar of a histogram with either the number of counts in that bin or the percent of total counts that are in that bin. I'm sure there must be a way to do this, but I haven't been able to find it. This page has a couple of pictures of SAS histograms that do basically what I'm trying to do (but the site doesn't seem to have R versions): http://www.ats.ucla.edu/stat/sas/faq/histogram_anno.htm

If possible, it would also be nice to have the flexibility to put the labels above or somewhere inside the bars, as desired.

I'm trying to do this with the base R plotting facilities, but I'd be interested in methods to do this in ggplot2 and lattice as well.

eipi10 asked Feb 16 '12 19:02



2 Answers

To include the number of counts, you can just set labels=TRUE.

The example below is just slightly adapted from one on the hist() help page:

hist(islands, col="gray", labels = TRUE, ylim=c(0, 45))


Getting percentages is a bit more involved. The only way I know to do that is to directly manipulate the object returned by a call to hist(), as described in a bit more detail in my answer to this similar question:

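histPercent <- function(x, ...) {
   # Compute the bins without drawing anything
   H <- hist(x, plot = FALSE)
   # Rescale density to percent of observations per bin (assumes equal-width bins)
   H$density <- with(H, 100 * density * diff(breaks)[1])
   labs <- paste(round(H$density), "%", sep = "")
   # Plot the rescaled heights, leaving headroom for the labels above the bars
   plot(H, freq = FALSE, labels = labs, ylim = c(0, 1.08 * max(H$density)), ...)
}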

histPercent(islands, col="gray")
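If you would rather have the labels inside the bars, one possible approach is to draw them yourself with text() at the bin midpoints. This is only a sketch: it takes the percentages straight from the counts, the variable names (pct, labs) are illustrative, and labels on very short bars will spill outside them.

```r
# Compute the bins without drawing, then derive percentages from the counts.
H <- hist(islands, plot = FALSE)
pct <- 100 * H$counts / sum(H$counts)
labs <- paste0(round(pct), "%")

# Draw the histogram of counts.
plot(H, col = "gray", ylim = c(0, 1.08 * max(H$counts)))

# Center each label vertically inside its bar (x = bin midpoint, y = half the bar height).
# To put the labels above the bars instead, use y = H$counts together with pos = 3.
text(H$mids, H$counts / 2, labels = labs, cex = 0.8)
```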

Josh O'Brien answered Oct 19 '22 04:10


Adding numbers at the tops of the bars in barplots or histograms distorts the visual interpretation of the bars. Even putting the labels inside the bars near the top creates a fuzzy-top effect that makes it harder for the viewer to interpret the graph properly. If the numbers are of interest, then this creates a poorly laid out table, so why not just create a proper table?

If you really feel the need to add the numbers then it is better to put them below the bars or along the top margin so that they line up better for easier comparison and don't interfere with the visual interpretation of the graph. Labels can be added to base graphs using the text or mtext functions and the x locations can be found in the return value from the hist function. Heights for plotting can be computed using the grconvertY function.
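As a rough sketch of that idea (one possible layout, not the only one), the code below writes the counts in the top margin with mtext() and then uses grconvertY() to place a row of percentages at a fixed height near the top of the plot region. The ylim value, the 0.95 fraction, and the variable names are assumptions chosen for the islands data.

```r
# hist() returns the bin information invisibly; main = "" frees the top margin,
# and the enlarged ylim leaves room above the tallest bar.
H <- hist(islands, col = "gray", main = "", ylim = c(0, 50))

# Counts in the top margin: 'at' is given in user x-coordinates (bin midpoints).
mtext(as.character(H$counts), side = 3, line = 0.5, at = H$mids, cex = 0.8)

# Percentages in one aligned row: convert 95% of the plot-region height
# ("npc" units) into user y-coordinates, then draw with text().
y_row <- grconvertY(0.95, from = "npc", to = "user")
pct <- paste0(round(100 * H$counts / sum(H$counts)), "%")
text(H$mids, y_row, labels = pct, cex = 0.8)
```

Because every label sits at the same height, the numbers line up for comparison and stay clear of the bar tops.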

Greg Snow answered Oct 19 '22 06:10