 

ggplot2 find number of counts in histogram maximum

Tags: r, ggplot2

I have a short R script that plots a few histograms using ggplot2. How can I automatically set the upper y limit based on the maximum frequency in the histogram (plus 10%), i.e.

scale_y_continuous(limits = c(0, ymax * 1.1))

plot = ggplot(data, aes(myo_activity)) +
  geom_histogram(binwidth=0.5, aes(fill=..count..))
plot + scale_x_continuous(expand = c(0,0), limits = c(30,90)) + 
  scale_y_continuous(expand = c(0,0), limits = c(0,140))
moadeep asked Jan 29 '13 13:01


2 Answers

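Since no sample data were provided, this example uses the movies dataset (shipped with ggplot2 at the time of writing; it now lives in the ggplot2movies package).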

The function ggplot_build() returns a list containing everything used to draw the plot. The computed data for the first layer are in the list element data[[1]], and the count column of that data frame holds the bar heights of the histogram. You can use the maximum of this column to set the limits for your plot.

plot = ggplot(movies, aes(rating)) + geom_histogram(binwidth=0.5, aes(fill=..count..))
ggplot_build(plot)$data[[1]]
      fill    y count     x     ndensity       ncount      density PANEL group ymin ymax xmin xmax
1  #132B43    0     0  0.75 0.0000000000 0.0000000000 0.0000000000     1     1    0    0  0.5  1.0
2  #142E48  272   272  1.25 0.0323232323 0.0323232323 0.0092535892     1     1    0  272  1.0  1.5
3  #16314B  454   454  1.75 0.0539512775 0.0539512775 0.0154453290     1     1    0  454  1.5  2.0
4  #17344F  668   668  2.25 0.0793820559 0.0793820559 0.0227257263     1     1    0  668  2.0  2.5
5  #1B3A58 1133  1133  2.75 0.1346405229 0.1346405229 0.0385452813     1     1    0 1133  2.5  3.0

plot + scale_y_continuous(expand = c(0,0),
         limits=c(0,max(ggplot_build(plot)$data[[1]]$count)*1.1))
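The same idea can be wrapped in a small helper so the limit always follows the data. This is only a minimal sketch, assuming ggplot2 is installed: the data frame df, its simulated values, and the helper name hist_ymax are made up for illustration and are not part of the original question, and the fill aesthetic is left out to keep it short.

```r
library(ggplot2)

# Simulated stand-in for the real data (assumption: not from the original post)
set.seed(42)
df <- data.frame(myo_activity = rnorm(500, mean = 60, sd = 8))

p <- ggplot(df, aes(myo_activity)) +
  geom_histogram(binwidth = 0.5)

# Hypothetical helper: height of the tallest bar in the first layer
hist_ymax <- function(plot) {
  max(ggplot_build(plot)$data[[1]]$count)
}

ymax <- hist_ymax(p)

# Upper y limit is the tallest bar plus 10%
p_limited <- p +
  scale_y_continuous(expand = c(0, 0), limits = c(0, ymax * 1.1))
```

Printing p_limited draws the histogram with the adjusted axis. Note that the helper builds the plot once just to read the counts, so the plot is computed again when it is finally drawn.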


Didzis Elferts answered Sep 28 '22 18:09


Personally, I find the 'hist' function to be the most useful for these sorts of calculations. The 'hist' function is super fast and can provide your frequency counts. For your case, you could do something like this:

max(hist(data$myo_activity, breaks=seq(range_Min, range_Max, by=bin_Width), plot=FALSE)$counts)

Here range_Min is the bottom of your theoretical range (e.g. 0), range_Max is an upper limit at or above the top of your theoretical range, and bin_Width is the width of each bin.

This expression gives you the maximum count you need to set the plot range. Note that ggplot2 bins the data itself (in stat_bin) rather than calling 'hist', so the two agree only when the breaks passed to 'hist' match the bin edges ggplot2 uses. I still prefer to call 'hist' directly when I only want the data.
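A self-contained sketch of this approach, using only base R. The simulated vector and the way range_Min and range_Max are derived here are assumptions for illustration. The breaks must cover every data value, otherwise hist() stops with an error, so the range is padded outward to whole multiples of the bin width.

```r
# Simulated stand-in for data$myo_activity (assumption: not from the original post)
set.seed(42)
myo_activity <- rnorm(500, mean = 60, sd = 8)

bin_Width <- 0.5

# Pad the observed range outward to multiples of bin_Width so all values are covered
range_Min <- floor(min(myo_activity) / bin_Width) * bin_Width
range_Max <- ceiling(max(myo_activity) / bin_Width) * bin_Width

# plot = FALSE returns the binning result without drawing anything
h <- hist(myo_activity,
          breaks = seq(range_Min, range_Max, by = bin_Width),
          plot = FALSE)

ymax <- max(h$counts)

# Upper y limit is the tallest bin plus 10%
y_limits <- c(0, ymax * 1.1)
```

y_limits can then be passed on, for example as scale_y_continuous(limits = y_limits).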

Dinre answered Sep 28 '22 18:09