I'm using Canada's census data, with Wages on the x-axis and density on the y-axis. I'm trying to overlay the log-normal density dlnorm on the histogram I've created, but I'm not sure what to use for the meanlog and sdlog parameters. I've tried mean(data$Wages) and sd(data$Wages), as well as the natural logarithm of both, etc. Nothing gives me a curve remotely similar to the density histogram I have generated.
Is this because my data is not log-normal? How can I find the correct meanlog and sdlog parameters?
This is my code:
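```r
library(dplyr)    # for %>%
library(ggplot2)

inc_plot <- data_adults %>%
  ggplot(aes(x = Wages)) +
  geom_histogram(aes(y = after_stat(density)), bins = 100,
                 fill = "transparent", colour = "black") +
  scale_x_continuous(labels = scales::comma) +
  # meanlog/sdlog below are mean() and sd() of the raw Wages,
  # not of log(Wages)
  stat_function(fun = dlnorm,
                args = list(meanlog = 48637.91, sdlog = 62459.15),
                col = "red")
inc_plot
```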
The current parameter values come from the aforementioned mean() and sd() calls on the raw Wages.

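meanlog and sdlog are the mean and standard deviation of the data on the log scale, so set meanlog = mean(log(your_data)) and sdlog = sd(log(your_data)). The density curve should then follow the histogram, provided the data are roughly log-normal.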
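As a worked reminder of why this is the right choice: `dlnorm()`'s `meanlog` and `sdlog` are the mean $\mu$ and standard deviation $\sigma$ of $\log X$, not of $X$. The density is

$$
f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\log x - \mu)^2}{2\sigma^2}\right), \qquad x > 0,
$$

so the natural estimates are the sample mean and standard deviation of the logged values:

$$
\hat\mu = \frac{1}{n}\sum_{i=1}^{n} \log x_i, \qquad
\hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(\log x_i - \hat\mu\right)^2}.
$$

Since the median of the distribution is $e^{\mu}$, plugging the raw mean of about 48,638 into `meanlog` places the curve's median at $e^{48638}$, which is why it never resembles the wage histogram.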
```r
library(ggplot2)
df <- data.frame(x = rlnorm(1e4))
ggplot(df, aes(x)) +
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 100, fill = "transparent", colour = "black"
  ) +
  stat_function(
    fun = dlnorm,
    args = list(meanlog = mean(log(df$x)), sdlog = sd(log(df$x))),
    colour = "red"
  )
```
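Two practical points when applying this to real wage data, shown in a minimal sketch on simulated data (the zero and the `rlnorm()` parameters are assumptions standing in for the census values). First, any zero wage makes `log()` return `-Inf`, so `mean(log(x))` breaks; one option is to keep only strictly positive values. Second, `MASS::fitdistr(x, "lognormal")` gives the maximum-likelihood fit, which matches `mean(log(x))` for `meanlog` and differs from `sd(log(x))` only by the divisor (n instead of n - 1) for `sdlog`.

```r
library(MASS)  # fitdistr()

set.seed(1)
# Simulated stand-in for the wage data, including a zero wage.
# log(0) is -Inf, so keep strictly positive values only.
x <- c(0, rlnorm(1e4, meanlog = 10, sdlog = 1))
x_pos <- x[x > 0]

# Log-scale summaries, as in the answer's stat_function() call
ml <- mean(log(x_pos))
sl <- sd(log(x_pos))

# Maximum-likelihood fit (fitdistr() stops with an error on values <= 0).
# Its meanlog equals mean(log(x)); its sdlog uses the 1/n divisor.
fit <- fitdistr(x_pos, "lognormal")
fit$estimate
c(meanlog = ml, sdlog = sl)
```

For large samples the two sets of estimates are practically identical, so either can be passed to `stat_function(fun = dlnorm, args = ...)`.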

Created on 2021-08-23 by the reprex package (v2.0.1)
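If only the raw mean and standard deviation are at hand, as with the values in the question, they can still be converted to log-scale parameters by inverting the moment formulas E[X] = exp(meanlog + sdlog^2 / 2) and Var[X] = (exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2). This is a minimal method-of-moments sketch; `lnorm_from_moments` is a hypothetical helper name, and its estimates will usually differ from mean(log(x)) and sd(log(x)) because the raw sd is sensitive to the long right tail of wage data.

```r
# Hypothetical helper: convert the raw-scale mean m and standard deviation s
# of a log-normal variable into dlnorm()'s meanlog and sdlog (method of moments).
lnorm_from_moments <- function(m, s) {
  # Var[X] / E[X]^2 = exp(sdlog^2) - 1
  sdlog <- sqrt(log(1 + (s / m)^2))
  # E[X] = exp(meanlog + sdlog^2 / 2)
  meanlog <- log(m) - sdlog^2 / 2
  c(meanlog = meanlog, sdlog = sdlog)
}

# Raw mean(Wages) and sd(Wages) quoted in the question
params <- lnorm_from_moments(48637.91, 62459.15)
params
```

The named vector can be handed to the plot as `args = as.list(params)` in `stat_function()`.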
An alternative would be to use ggh4x::stat_theodensity(distri = "lnorm", colour = "red"). (disclaimer: I'm the author of ggh4x)