Adding a density line to a histogram with count data in ggplot2

Tags: r, ggplot2, histogram, density-plot

I want to add a density line (a normal density actually) to a histogram.

Suppose I have the following data. I can plot the histogram with ggplot2:

library(ggplot2)

set.seed(123)
df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))

ggplot(df, aes(x = x)) + geom_histogram(colour = "black", fill = "white", 
                                        binwidth = 0.01)

I can add a density line using:

ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y = ..density..),colour = "black", fill = "white", 
                 binwidth = 0.01) + 
  stat_function(fun = dnorm, args = list(mean = mean(df$x), sd = sd(df$x)))

But this is not what I actually want: I want the density line to be fitted to the count data.

I found a similar post (HERE) that offered a solution to this problem, but it did not work in my case. I need an arbitrary expansion factor to get what I want, and this is not generalizable at all:

ef <- 100 # Expansion factor

ggplot(df, aes(x = x)) + 
  geom_histogram(colour = "black", fill = "white", binwidth = 0.01) + 
  stat_function(fun = function(x, mean, sd, n){ 
    n * dnorm(x = x, mean = mean, sd = sd)}, 
    args = list(mean = mean(df$x), sd = sd(df$x), n = ef))

Any clues that I can use to generalize this would be very helpful:

first to the normal distribution,
then to any other bin size,
and lastly to any other distribution.

asked Dec 26 '14 20:12

HBat

1 Answer

Fitting a distribution function does not happen by magic. You have to do it explicitly. One way is using fitdistr(...) in the MASS package.

library(MASS)    # for fitdistr(...)
# excellent fit (of course...)
ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+
  stat_function(fun=dbeta,args=fitdistr(df$x,"beta",start=list(shape1=1,shape2=1))$estimate)

# horrible fit - no surprise here
ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+
  stat_function(fun=dnorm,args=fitdistr(df$x,"normal")$estimate)

# mediocre fit - also not surprising...
ggplot(df, aes(x = x)) + 
  geom_histogram(aes(y=..density..),colour = "black", fill = "white", binwidth = 0.01)+
  stat_function(fun=dgamma,args=fitdistr(df$x,"gamma")$estimate)
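
For reference, the three calls above hand fitdistr(...)$estimate directly to stat_function. Here is a minimal sketch of what that object looks like for the normal fit, using the same simulated data. The names fit, norm_args and mle_sd are only for illustration, and the as.list() step is an optional precaution rather than something the code above needs.

```r
library(MASS)  # fitdistr(); MASS is normally bundled with R

set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

fit <- fitdistr(x, "normal")
fit$estimate         # named numeric vector of fitted parameters
names(fit$estimate)  # "mean" and "sd" for the normal distribution

# stat_function(args = ...) is documented to take a list, so converting
# the named vector explicitly makes the intent clearer.
norm_args <- as.list(fit$estimate)

# The normal maximum-likelihood fit uses the n (not n - 1) denominator,
# so the fitted "sd" is slightly smaller than sd(x).
n      <- length(x)
mle_sd <- sd(x) * sqrt((n - 1) / n)
```

With 10000 observations the difference between the fitted sd and sd(x) is negligible, which is why passing mean(df$x) and sd(df$x) by hand gives practically the same curve.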

EDIT: Response to OP's comment.

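The scale factor is binwidth × sample size. A density integrates to 1, whereas the bars of a count histogram sum to the sample size, so their total area is binwidth × sample size.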

bw <- 0.01    # bin width; must match the binwidth passed to geom_histogram
ggplot(df, aes(x = x)) + 
  geom_histogram(colour = "black", fill = "white", binwidth = bw)+
  stat_function(fun=function(x,shape1,shape2)bw*nrow(df)*dbeta(x,shape1,shape2),
                args=fitdistr(df$x,"beta",start=list(shape1=1,shape2=1))$estimate)
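
The same scaling can be wrapped up so that it works for any bin width and any density function. The sketch below is one way to do that, not the only one. scaled_density is a hypothetical helper written for this example (it is not part of ggplot2 or MASS), and the bin width of 0.02 is an arbitrary choice to show that nothing depends on 0.01.

```r
library(MASS)  # fitdistr()

set.seed(123)
x  <- rbeta(10000, shape1 = 2, shape2 = 4)
bw <- 0.02  # any bin width; reused for the histogram and the curve

# Hypothetical helper (not part of ggplot2 or MASS): wraps a density function
# so that it returns expected counts per bin, n * binwidth * f(x).
scaled_density <- function(dfun, params, n, binwidth) {
  function(x) n * binwidth * do.call(dfun, c(list(x), as.list(params)))
}

# Fit the beta distribution as above (fitdistr may warn about NaNs while
# optimising), then build the count-scaled curve.
est       <- fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1))$estimate
curve_fun <- scaled_density(dbeta, est, n = length(x), binwidth = bw)

# Sanity check: the area under the scaled curve should be close to n * bw.
area <- integrate(curve_fun, lower = 0, upper = 1)$value

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x = x), aes(x = x)) +
    geom_histogram(colour = "black", fill = "white", binwidth = bw) +
    stat_function(fun = curve_fun)
  print(p)
}
```

For the normal case, the same helper would take dnorm together with fitdistr(x, "normal")$estimate. The only requirement is that the binwidth given to the helper matches the one given to geom_histogram.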

answered Sep 19 '22 15:09

jlhoward
