
ggplot scale transformation inaccurate for stat_function

Tags: r, ggplot2

I have some right-skewed data, and I'd like to visually compare distribution fits to the data on the regular scale and the log scale using ggplot. However, when I use scale_x_continuous() or scale_x_log10() to transform the axis, the distribution curves are not transformed correctly.

x <- rlnorm(1000, meanlog = -4, sdlog = 1)
ggplot(data.frame(x)) +
  geom_histogram(aes(x, y = ..density.. * 25)) +
  scale_x_log10() +
  stat_function(fun = "dlnorm",
                args = list(meanlog = -4,
                            sdlog = 1))

[Figure: histogram on log scale]

Notice how the mean of the lognormal curve does not match the mean of the histogram. Why not? Is there a way to get them to match?

In a different but related post, a suggested answer was to include the argument inherit.aes = FALSE, but that does not help here.

I am using R version 3.4.3 and ggplot2 version 2.2.1.

K Bro asked Apr 08 '26 03:04


1 Answer

First, when working with the log normal distribution recall that the default is to work with the natural logarithm, not the base 10 logarithm. Part of the issue with the graphic above is due to the mixing of logarithm bases.
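To make the base issue concrete, here is a small base-R check (a sketch, using only the parameters from the question). If log(X) is normal with mean meanlog and standard deviation sdlog, then log10(X) = log(X) / ln(10) is normal with both parameters divided by ln(10). The variable names mean10 and sd10 are just illustrative.

```r
# Parameters from the question (natural-log scale)
meanlog <- -4
sdlog   <- 1

# Equivalent parameters on the base-10 scale: divide by ln(10)
mean10 <- meanlog / log(10)   # about -1.737
sd10   <- sdlog   / log(10)   # about  0.434

# Quantiles of log10(X) computed two ways should agree
p <- c(0.1, 0.5, 0.9)
q_from_lnorm <- log10(qlnorm(p, meanlog = meanlog, sdlog = sdlog))
q_from_norm  <- qnorm(p, mean = mean10, sd = sd10)
all.equal(q_from_lnorm, q_from_norm)
```

So on a log10 axis the histogram should be centred near -1.74 (that is, x of roughly 0.018), not near -4.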

Let's work through this by first generating the example observations of a log normal random variable X with meanlog -4 and sdlog 1, that is,

$$X \sim \operatorname{Lognormal}(\mu = -4,\ \sigma = 1)$$

library(ggplot2)
library(gridExtra)

set.seed(42)

dat <- data.frame(x = rlnorm(1000, meanlog = -4, sdlog = 1))

We will start by plotting the density on the standard x-axis. I'll use geom_histogram with stat = "density" so that the bars are scaled to a density and there is no need to map y = ..density.. explicitly. This is very similar to your original plot, just with no attempt to rescale the x-axis.

ggplot(dat) +
  geom_histogram(mapping = aes(x = x), stat = "density")  +
  stat_function(fun = "dlnorm",
                args = list(meanlog = -4, sdlog = 1),
                n = 501,
                color = "red")


Now, recall that if

$$X \sim \operatorname{Lognormal}(\mu = -4,\ \sigma = 1)$$

then

$$\log(X) \sim N(\mu = -4,\ \sigma = 1)$$

where the log is the natural logarithm.
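This relationship also explains why dlnorm drawn over a log axis does not line up with a histogram of logged values: the two densities differ by a Jacobian factor of 1/x. A quick base-R check (a sketch, with arbitrarily chosen evaluation points):

```r
# A few arbitrary positive evaluation points
x <- c(0.001, 0.01, 0.05, 0.2)

# Density of X versus density of log(X) evaluated at log(x):
# dlnorm(x) = dnorm(log(x)) / x
lhs <- dlnorm(x, meanlog = -4, sdlog = 1)
rhs <- dnorm(log(x), mean = -4, sd = 1) / x
all.equal(lhs, rhs)

# The CDFs, by contrast, agree with no extra factor
all.equal(plnorm(x, meanlog = -4, sdlog = 1),
          pnorm(log(x), mean = -4, sd = 1))
```

A histogram of log(x) estimates the density of log(X), which is dnorm(log(x)), so the matching curve must drop the 1/x factor that dlnorm carries.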

One way to plot the generated data on the log scale is as follows. Note that the log transform is explicit in the mapping for geom_histogram, and that the stat_function uses dnorm, not dlnorm.

ggplot(dat) +
  geom_histogram(mapping = aes(x = log(x)), stat = "density")  +
  stat_function(fun = "dnorm",
                args = list(mean = -4, sd = 1),
                n = 501,
                color = "red")


Now, to transform the x-axis you will want to use ggplot2::scale_x_continuous with the trans = "log" argument. When this transform is applied, the histogram's density is computed on the transformed values log(x), so the bars estimate the density of log(X), which is normal. stat_function, on the other hand, still passes the original (untransformed) x values to the function and only draws the result against the log axis. Thus, you'll need to define the function to use dnorm(log(x)) as shown below:

ggplot(dat) +
  geom_histogram(mapping = aes(x = x), stat = "density")  +
  stat_function(fun = function(x, ...) {dnorm(log(x), ...) },
                args = list(mean = -4, sd = 1),
                n = 501,
                color = "red") +
  scale_x_continuous(trans = "log",
                     breaks = exp(seq(-6, 0, by = 2)),
                     labels = paste("exp(", seq(-6, 0, by = 2), ")"))


It is worth noting that the labels for the x-axis ticks in the second plot are integer values and the x-axis label is log(x), whereas in the third graphic the x-axis ticks are expressions and the label is plain "x". Make sure you are using descriptive tick and axis labels.
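The same idea carries over to scale_x_log10() from the question. The following is a sketch rather than the only way to do it: dens_log10 is a hypothetical helper name, bins = 30 is an arbitrary choice, and after_stat() assumes ggplot2 3.3.0 or later (on older versions such as 2.2.1, y = ..density.. plays the same role).

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(x = rlnorm(1000, meanlog = -4, sdlog = 1))

# Density of log10(X), written as a function of the original x.
# log10(X) ~ N(meanlog / ln(10), sdlog / ln(10)).
dens_log10 <- function(x, meanlog, sdlog) {
  dnorm(log10(x), mean = meanlog / log(10), sd = sdlog / log(10))
}

p <- ggplot(dat, aes(x = x)) +
  # Bins are formed on log10(x), so the bar heights estimate the
  # density of log10(X); no manual rescaling factor is needed.
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dens_log10,
                args = list(meanlog = -4, sdlog = 1),
                n = 501,
                color = "red") +
  scale_x_log10()

p
```

Because both the bars and the curve now describe the density of log10(X), there is no need for the multiplier of 25 used in the question.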

Peter answered Apr 19 '26 20:04


