 

How to add the mean and mode to a ggplot histogram?

I need to add a mean line and the value of the mode to this kind of plot:

I use this to calculate the bin width (the Freedman-Diaconis rule):

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
bw <- 2 * IQR(cars$lenght) / length(cars$lenght)^(1/3)
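A note on that formula: the Freedman-Diaconis rule gives a bin width of 2 * IQR / n^(1/3), and dividing the data range by that width gives a number of bins. geom_histogram() takes the first through binwidth and the second through bins, so the two should not be mixed up. A minimal sketch, assuming the values of cars$lenght listed further down are the complete column (re-typed by hand here), comparing the hand-computed bin count with base R's nclass.FD():

```r
# Values of cars$lenght as posted in the question (re-typed by hand for this sketch)
lenght <- c(168.8, 168.8, 171.2, 176.6, 176.6, 177.3, 192.7, 192.7, 192.7, 178.2,
            176.8, 176.8, 176.8, 176.8, 189.0, 189.0, 193.8, 197.0, 141.1, 155.9,
            158.8, 157.3, 157.3, 157.3, 157.3, 157.3, 157.3, 157.3, 174.6, 173.2)

# Freedman-Diaconis bin WIDTH: suitable for geom_histogram(binwidth = ...)
fd_width <- 2 * IQR(lenght) / length(lenght)^(1/3)

# Range divided by that width is the NUMBER of bins: suitable for geom_histogram(bins = ...)
fd_bins <- ceiling(diff(range(lenght)) / fd_width)

# grDevices::nclass.FD() computes the Freedman-Diaconis bin count directly
fd_bins_builtin <- nclass.FD(lenght)

cat("width:", round(fd_width, 2), " bins:", fd_bins, " nclass.FD:", fd_bins_builtin, "\n")
```

Either value can then be handed to the histogram, as binwidth = fd_width or as bins = fd_bins, but not one in place of the other.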

And the plot:

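library(ggplot2)

# after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..
ggplot(data = cars, aes(x = lenght)) +
  geom_histogram(aes(y = after_stat(density)),
                 col = "red",
                 binwidth = bw,
                 fill = "green",
                 alpha = 1) +
  geom_density(col = 4) +
  labs(title = 'Length Plot', x = 'Length', y = 'Density')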

cars$lenght

168.8 168.8 171.2 176.6 176.6 177.3 192.7 192.7 192.7 178.2 176.8 176.8 176.8 176.8 189.0 189.0 193.8 197.0 141.1 155.9 158.8 157.3 157.3 157.3 157.3 157.3 157.3 157.3 174.6 173.2
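With those values the question can be reproduced end to end. A minimal sketch, assuming ggplot2 >= 3.3.0 (for after_stat()) and taking "mode" to mean the most frequent raw value; the data frame is re-typed from the values above and keeps the question's column name lenght:

```r
library(ggplot2)

# Rebuild the question's data (column name `lenght` kept as posted)
cars <- data.frame(lenght = c(168.8, 168.8, 171.2, 176.6, 176.6, 177.3, 192.7, 192.7,
                              192.7, 178.2, 176.8, 176.8, 176.8, 176.8, 189.0, 189.0,
                              193.8, 197.0, 141.1, 155.9, 158.8, 157.3, 157.3, 157.3,
                              157.3, 157.3, 157.3, 157.3, 174.6, 173.2))

# Mean, and the most frequent raw value (on ties, the smallest tied value wins,
# because table() sorts its names and which.max() returns the first maximum)
len_mean <- mean(cars$lenght)
len_mode <- as.numeric(names(which.max(table(cars$lenght))))

# Freedman-Diaconis bin width
bw <- 2 * IQR(cars$lenght) / nrow(cars)^(1/3)

p2 <- ggplot(cars, aes(x = lenght)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                 col = "red", fill = "green") +
  geom_density(col = 4) +
  geom_vline(xintercept = len_mean, linetype = "dashed") +   # mean
  geom_vline(xintercept = len_mode, col = "blue") +          # mode
  annotate("text", x = len_mode, y = 0, vjust = -1, colour = "blue",
           label = paste("mode =", len_mode)) +
  labs(title = "Length Plot", x = "Length", y = "Density")

print(p2)
```

The mean gets a dashed line and the mode a blue line with its value printed just above the x axis; a different definition of the mode only requires changing how len_mode is computed.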

Thanks in advance.

Borja_042 asked Oct 29 '17 at 13:10





1 Answer

I'm not sure how to replicate your data, so I used the speed column of R's built-in cars dataset instead.

geom_vline() places vertical lines wherever you want, and you can calculate the mean and mode of the raw data on the fly. If instead you want the mode to be the histogram bin with the highest frequency, you can extract it from the built plot with ggplot_build().
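To see that the two definitions can disagree, here is a minimal sketch on the built-in cars$speed: the raw-data mode comes from table(), and the modal bin from base R's hist(..., plot = FALSE), which returns bin counts and midpoints without drawing anything. This swaps in hist() for ggplot_build(), so the bin edges used here (starting at the minimum, with the Freedman-Diaconis width) are an assumption and need not match the bins ggplot2 draws:

```r
speed <- datasets::cars$speed

# Most frequent individual value of the raw data
speed_mode <- as.numeric(names(which.max(table(speed))))

# Freedman-Diaconis bin width, with breaks starting at the minimum
# (an assumption; ggplot2 may place its bin edges differently)
bw <- 2 * IQR(speed) / length(speed)^(1/3)
brks <- seq(min(speed), max(speed) + bw, by = bw)

# Bin counts and midpoints, computed without drawing a plot
h <- hist(speed, breaks = brks, plot = FALSE)
modal_bin_mid <- h$mids[which.max(h$counts)]

cat("raw mode:", speed_mode, " modal bin midpoint:", round(modal_bin_mid, 2), "\n")
```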

I'm not sure exactly how you want to define the mode, so I plotted several different approaches.
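One of those definitions, the peak of the density curve, can also be computed without building the plot. A minimal sketch using stats::density(): its defaults (Gaussian kernel, bw = "nrd0") match geom_density()'s, but by default density() evaluates the curve on a grid that extends past the data range, while ggplot2 restricts it to the range, so expect a value close to, not necessarily identical with, the peak read back from the plot:

```r
speed <- datasets::cars$speed

# Kernel density estimate with R's defaults (Gaussian kernel, bandwidth "nrd0")
d <- density(speed)

# Density-based mode: the grid point where the estimated density is highest
dens_mode <- d$x[which.max(d$y)]

# Restricting the grid to the data range, as geom_density() does
d_range <- density(speed, from = min(speed), to = max(speed))
dens_mode_range <- d_range$x[which.max(d_range$y)]

cat("density peak:", round(dens_mode, 2), " (data-range grid:", round(dens_mode_range, 2), ")\n")
```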

library(ggplot2)
library(dplyr)

# function to calculate the mode: the most frequent raw value
fun.mode <- function(x) { as.numeric(names(sort(-table(x)))[1]) }

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
bw <- 2 * IQR(cars$speed) / length(cars$speed)^(1/3)
p <- ggplot(data = cars, aes(x = speed)) +
  geom_histogram(aes(y = after_stat(density)),
                 col = "red",
                 binwidth = bw,
                 fill = "green",
                 alpha = 1) +
  geom_density(col = 4) +
  labs(title = 'Length Plot', x = 'Length', y = 'Density')

# Extract the computed layer data for the histogram and density peaks
data <- ggplot_build(p)$data
hist_peak <- data[[1]] %>% filter(y == max(y)) %>% pull(x)
dens_peak <- data[[2]] %>% filter(y == max(y)) %>% pull(x)

# plot mean (red), mode (blue), histogram peak (orange) and density peak (purple)
# linewidth needs ggplot2 >= 3.4.0; older versions use size
p +
  geom_vline(xintercept = mean(cars$speed), col = 'red', linewidth = 2) +
  geom_vline(xintercept = fun.mode(cars$speed), col = 'blue', linewidth = 2) +
  geom_vline(xintercept = hist_peak, col = 'orange', linewidth = 2) +
  geom_vline(xintercept = dens_peak, col = 'purple', linewidth = 2) +
  annotate("text", x = hist_peak, y = 0, label = round(hist_peak, 1),
           vjust = -1, colour = 'orange', size = 5)

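If four plain lines are hard to tell apart, one option is to put the statistics in a small data frame and map colour inside aes(), so ggplot2 draws a legend for them. A minimal sketch for the mean and the raw-data mode only, assuming ggplot2 >= 3.4.0 (where linewidth replaced size for lines); speed_df and stats_df are names made up for this example:

```r
library(ggplot2)

speed_df <- data.frame(speed = datasets::cars$speed)

# Most frequent raw value (same idea as fun.mode in the answer)
fun.mode <- function(x) as.numeric(names(sort(-table(x)))[1])

# One row per statistic (hypothetical names chosen for this sketch)
stats_df <- data.frame(
  statistic = c("mean", "mode"),
  value     = c(mean(speed_df$speed), fun.mode(speed_df$speed))
)

bw <- 2 * IQR(speed_df$speed) / nrow(speed_df)^(1/3)

p3 <- ggplot(speed_df, aes(x = speed)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = bw,
                 col = "red", fill = "green") +
  geom_density(col = 4) +
  # Mapping colour inside aes() makes ggplot2 add a legend keyed by `statistic`;
  # geom_vline() does not inherit the plot's x mapping, so stats_df needs no speed column
  geom_vline(data = stats_df, aes(xintercept = value, colour = statistic),
             linewidth = 1) +
  labs(x = "Speed", y = "Density", colour = NULL)

print(p3)
```

More rows (for example the histogram and density peaks) can be added to stats_df and will get their own legend entries.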

dule arnaux answered Oct 19 '22 at 18:10
