ggplot scale transformation acts differently on points and functions

I'm trying to plot a distribution's CDF using R and ggplot2. However, I am having difficulty plotting the CDF function after I transform the Y axis to obtain a straight line. This kind of plot is frequently used in Gumbel paper plots, but here I'll use the normal distribution as an example.

I generate the data and plot its empirical cumulative distribution function along with the theoretical function. They fit well. However, when I apply a Y-axis transformation, they no longer fit.

sim <- rnorm(100) #Simulate some data
sim <- sort(sim)  #Sort it

cdf <- seq(0,1,length.out=length(sim)) #Compute data CDF

df <- data.frame(x=sim, y=cdf) #Build data.frame

library(scales)
library(ggplot2)

#Now plot!
gg <- ggplot(df, aes(x=x, y=y)) +
        geom_point() +
        stat_function(fun = pnorm, colour="red")
gg

The output should look something like this: [image omitted]. Good!

Now I try to transform the Y axis according to the distribution used.

#Apply transformation
gg + scale_y_continuous(trans=probability_trans("norm"))

And the result is: [image omitted]

The points are transformed correctly (they lie on a straight line), but the function is not!
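To make the expectation concrete, here is a minimal sketch of what the transformation does to values on the theoretical CDF. It assumes only that probability_trans("norm") uses qnorm as its forward transformation and pnorm as its inverse; the names tr, x, p and back are illustrative and not part of the original code.

```r
library(scales)

# Transformation object for the normal distribution:
# forward transform = qnorm, inverse = pnorm.
tr <- probability_trans("norm")

x <- c(-2, -1, 0, 1, 2)   # a few illustrative x positions
p <- pnorm(x)             # theoretical CDF values at those positions

# What the y scale does to CDF values: it maps them back to quantiles.
back <- tr$transform(p)

# Compare the transformed CDF values with the original x positions.
all.equal(back, x)
```

Since qnorm undoes pnorm, a pnorm curve that really passed through the scale would land on the identity line y = x, which is not what the transformed plot above shows.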

However, everything seems to work fine if I do it like this, computing the CDF with ggplot:

ggplot(data.frame(x=sim), aes(x=x)) +
  stat_ecdf(geom = "point") +
  stat_function(fun="pnorm", colour="red") +
  scale_y_continuous(trans=probability_trans("norm"))

The result is OK: [image omitted]

Why is this happening? Why doesn't calculating the CDF manually work with scale transformations?

asked May 15 '16 09:05 by AF7



1 Answer

This works:

gg <- ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  stat_function(fun ="pnorm", colour="red", inherit.aes = FALSE) +
  scale_y_continuous(trans=probability_trans("norm"))
gg


Possible explanation:

The documentation states: "inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders."

My guess: since scale_y_continuous changes the aesthetics of the main plot, we need to turn off the default inherit.aes=TRUE. With inherit.aes=TRUE, stat_function seems to pick up its aesthetics from the plot's default mapping, so the scale transformation is not applied to the curve unless you explicitly opt out.
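If you would rather not depend on how stat_function handles inherited aesthetics, one alternative is to evaluate the theoretical CDF yourself and draw it with geom_line, so that the curve goes through the y scale as ordinary data, just like the points. This is only a sketch under stated assumptions: the names grid, theory and gg2, the seed, and the use of ppoints() are illustrative choices, not taken from the question.

```r
library(ggplot2)
library(scales)

set.seed(1)                         # arbitrary seed, only for reproducibility
sim <- sort(rnorm(100))

# Empirical CDF positions. ppoints() is swapped in for seq(0, 1, ...) here
# (an assumption of this sketch) so that no point sits at exactly 0 or 1,
# which the normal probability transformation maps to -Inf / Inf.
df <- data.frame(x = sim, y = ppoints(length(sim)))

# Theoretical CDF evaluated by hand on a grid (hypothetical helper data).
grid <- seq(min(sim), max(sim), length.out = 200)
theory <- data.frame(x = grid, y = pnorm(grid))

# Both layers carry plain y data, so the scale transforms them the same way.
gg2 <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = theory, colour = "red") +
  scale_y_continuous(trans = probability_trans("norm"))
gg2
```

Because the curve's y values are supplied as data rather than computed by a stat, they should be transformed together with the points, and the red line should come out straight.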

answered Sep 17 '22 21:09 by Divi