I have some right-skewed data, and I'd like to visually compare distribution fits to the data on the regular scale and the log-scale using ggplot. However, when I use the scale_x_continuous() or scale_x_log10() to convert the distribution curves, the transformation does not translate correctly.
x <- rlnorm(1000, meanlog = -4, sdlog = 1)
ggplot(data.frame(x)) +
  geom_histogram(aes(x, y = ..density.. * 25)) +
  scale_x_log10() +
  stat_function(fun = "dlnorm",
                args = list(meanlog = -4,
                            sdlog = 1))

Notice how the mean of the lognormal curve does not match the mean of the histogram. Why not? Is there a way to get them to match?
In a different but related post, a suggested answer was to include the argument inherit.aes = FALSE, but that does not help here.
I am using R version 3.4.3 and ggplot2 version 2.2.1.
First, when working with the log normal distribution, recall that the default is the natural logarithm, not the base 10 logarithm. Part of the problem with the graphic above comes from mixing logarithm bases.
Let's work through this by first generating the example observations of a log normal random variable with meanlog -4 and sdlog 1, that is,
library(ggplot2)
library(gridExtra)
set.seed(42)
dat <- data.frame(x = rlnorm(1000, meanlog = -4, sdlog = 1))
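Before plotting, the point about logarithm bases can be checked numerically. The following is a minimal sketch in base R that regenerates the same sample; the variable name x is only for illustration. It compares the sample mean of the natural log with the sample mean of the base 10 log, which is what scale_x_log10() displays.

```r
set.seed(42)
x <- rlnorm(1000, meanlog = -4, sdlog = 1)

# On the natural log scale the sample mean is close to meanlog = -4
mean(log(x))

# On the base 10 log scale the centre moves to about meanlog / log(10)
mean(log10(x))
-4 / log(10)   # about -1.737
```

So a curve parameterised with meanlog = -4 cannot be centred correctly on a base 10 axis unless the parameters are rescaled by log(10).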
We will start by plotting the density on the standard x-axis. I'll use geom_histogram with stat = "density" so that the bars are scaled and there is no need for the aesthetic y = ..density.. in the mapping. This is very similar to your original plot, just with no attempt to scale the x-axis.
# Original scale: density-scaled bars with the log normal density overlaid
ggplot(dat) +
  geom_histogram(mapping = aes(x = x), stat = "density") +
  stat_function(fun = "dlnorm",
                args = list(meanlog = -4, sdlog = 1),
                n = 501,
                color = "red")

Now, recall that if
$X \sim \text{Lognormal}(\mu, \sigma^2)$
then
$\log(X) \sim N(\mu, \sigma^2)$,
where the log is the natural logarithm.
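This relationship also explains why dlnorm does not line up with the bars on a log axis: the density of log(X) is the density of X multiplied by x, the Jacobian of the change of variables. A small base R check of that identity, using an arbitrary grid of x values chosen only for illustration:

```r
# Grid of positive x values on the original scale
x <- exp(seq(-8, 0, length.out = 9))

# Density of log(X), evaluated at log(x)
on_log_scale <- dnorm(log(x), mean = -4, sd = 1)

# Density of X at x, times the Jacobian x of the change of variables
from_dlnorm <- dlnorm(x, meanlog = -4, sdlog = 1) * x

# The two agree up to floating point tolerance
all.equal(on_log_scale, from_dlnorm)
```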
One way to plot the generated data example on the log scale is as follows. Note that the log transform is explicit in the mapping for geom_histogram and that the stat_function is using dnorm, not dlnorm.
# Log scale via an explicit log(x) mapping: the normal density is the right overlay
ggplot(dat) +
  geom_histogram(mapping = aes(x = log(x)), stat = "density") +
  stat_function(fun = "dnorm",
                args = list(mean = -4, sd = 1),
                n = 501,
                color = "red")

Now, to transform the x-axis itself, use ggplot2::scale_x_continuous with the trans = "log" argument. When this transform is applied, the statistics, including the density behind the histogram, are computed on log(x) rather than on x, while stat_function still passes the original x values to fun. The curve therefore has to be the density of log(x), so define the function as dnorm(log(x)), as shown below:
# Log-transformed axis: fun receives x on the original scale, so take log(x) inside
ggplot(dat) +
  geom_histogram(mapping = aes(x = x), stat = "density") +
  stat_function(fun = function(x, ...) { dnorm(log(x), ...) },
                args = list(mean = -4, sd = 1),
                n = 501,
                color = "red") +
  scale_x_continuous(trans = "log",
                     breaks = exp(seq(-6, 0, by = 2)),
                     labels = paste("exp(", seq(-6, 0, by = 2), ")"))
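The same idea should carry over to the base 10 axis from the question. Since log10(X) = log(X) / log(10), the base 10 log of a log normal variable is normal with mean meanlog / log(10) and standard deviation sdlog / log(10). The sketch below defines a hypothetical helper, dlnorm_on_log10_axis (my own name, not a ggplot2 or base R function), and checks it against dlnorm with the base 10 Jacobian x * log(10).

```r
# Hypothetical helper (not part of ggplot2 or base R): density of log10(X)
# for log normal X, evaluated at x given on the original scale.
dlnorm_on_log10_axis <- function(x, meanlog = 0, sdlog = 1) {
  dnorm(log10(x), mean = meanlog / log(10), sd = sdlog / log(10))
}

# Check against the change of variables u = log10(x): f_U(u) = f_X(x) * x * log(10)
x <- 10^seq(-4, 0, length.out = 9)
lhs <- dlnorm_on_log10_axis(x, meanlog = -4, sdlog = 1)
rhs <- dlnorm(x, meanlog = -4, sdlog = 1) * x * log(10)
all.equal(lhs, rhs)
```

Passing this helper as fun to stat_function, with args = list(meanlog = -4, sdlog = 1), together with scale_x_log10() and a density-scaled histogram, should give a curve that lines up with the bars without the ad hoc factor of 25.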

It is worth noting that the labels for the x-axis ticks in the second plot are integer values and the x-axis label is log(x), whereas in the third graphic the x-axis ticks are expressions and the label is plain "x". Make sure you use descriptive tick and axis labels.