
Using stat_function to draw partially shaded normal curve in ggplot2

I'm a complete beginner in R and am very confused about how ggplot uses the variable "x" when creating normal curves.

My situation is this. I'm trying to plot normal curves given specific means and standard deviations. In the absence of data, the most common way I've seen to do this is as follows:

score = 1800
m = 1500
std = 300

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))

I wanted to shade a specific area of the curve so using the Internet I created a function as follows:

funcShaded <- function(x) {
    y = dnorm(x, mean = m, sd = std)
    y[x < score] <- NA
    return(y)
}

I then added a layer to my curve (saved as p) with p + stat_function(fun = funcShaded, geom="area", fill="#84CA72", alpha=.2)

This produces the graph I want. However, I have two questions about it. First, when I break the code down,

data.frame(x = c(300, 2700))

creates a two-row data frame, as you would expect. So how can it be used to create the x-axis values and, further, be passed to the function and used appropriately (i.e. as if it were a full vector of values)?

Second, I now want to reuse this function to fill in another area under the curve based on a different score (e.g. score2 = 1630). I was thinking I could add another parameter to funcShaded for the score (i.e. funcShaded <- function(x, score)) and then call stat_function as follows: p + stat_function(fun = funcShaded(x, score2), ...) but:

  1. I'm not sure this syntax will work
  2. The x variable never seems to be explicitly "created" by this code: it doesn't show up in my Environment, and when I try the call above I get Error: object 'x' not found

So I guess I'm just curious as to how 'x' is working in this situation and if I should be creating it differently given what I want to do.

chainhomelow asked Feb 12 '18 18:02



1 Answer

The function given to stat_function must be passed uncalled, i.e. as the function itself rather than the result of calling it, because stat_function needs to call it with a vector of x values. The one exception is a call that itself returns a function; an adverb like purrr::partial or the like is another approach here.
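To make the called/uncalled distinction concrete, here is a minimal base-R sketch (no ggplot2 needed). It assumes a fresh session in which no object named x exists, which is the situation described in the question; the variable msg is only there for illustration.

```r
m <- 1500
std <- 300

funcShaded <- function(x, lower_bound) {
    y <- dnorm(x, mean = m, sd = std)
    y[x < lower_bound] <- NA
    y
}

# Uncalled: this is just a function object. Nothing is evaluated yet, so
# whoever receives it (e.g. stat_function) can supply x later.
is.function(funcShaded)

# Called with a real vector: works, and returns NA below the bound.
funcShaded(c(1500, 1700), lower_bound = 1630)

# Called with a bare `x`: R tries to evaluate it immediately, and `x` has to
# exist in the calling environment at that moment. With no x defined, this
# fails before stat_function would ever see the result.
msg <- tryCatch(
    funcShaded(x, 1630),
    error = function(e) conditionMessage(e)
)
msg
```

So fun = funcShaded(x, score2) fails for the same reason the last call does: the error comes from R evaluating the argument, not from anything ggplot2 does with it.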

You've already done with dnorm what you need to do with funcShaded, namely pass additional fixed parameters through args:

library(ggplot2)

score = 1800
m = 1500
std = 300

funcShaded <- function(x, lower_bound) {
    y = dnorm(x, mean = m, sd = std)
    y[x < lower_bound] <- NA
    return(y)
}

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) + 
    stat_function(fun = funcShaded, args = list(lower_bound = score), 
                  geom = "area", fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))

Alternately, without writing your own function, you can do the same thing with stat_function's xlim parameter:

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std), xlim = c(score, 2700),
                  geom = "area", fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
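The "unless it returns another function" case mentioned at the top can be sketched as a function factory. This is only one possible way to do it: make_shader is a hypothetical helper name that doesn't appear anywhere above, and the sketch assumes m and std are defined globally as in the question.

```r
library(ggplot2)

m <- 1500
std <- 300

# Hypothetical helper (not part of ggplot2): takes the bound and returns a
# function of x alone, with lower_bound captured in the closure.
make_shader <- function(lower_bound) {
    force(lower_bound)  # evaluate the argument now, not lazily later
    function(x) {
        y <- dnorm(x, mean = m, sd = std)
        y[x < lower_bound] <- NA
        y
    }
}

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
    # make_shader(1630) is called here, but what it returns is itself a
    # function, so stat_function can still feed it a vector of x values.
    stat_function(fun = make_shader(1630), geom = "area",
                  fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
```

Printing p draws the plot. Compared with args, this mostly pays off when the same bound is reused across several plots; for a one-off, args or xlim is less code.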

As for how stat_function uses the values passed to its x aesthetic: it treats them as limits and generates a grid of evenly spaced values between them, then calls the function on that grid. The number of grid points is set by its n parameter, which defaults to 101. That is decidedly different from how most stats use their data, but it's a very useful function.
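A rough base-R imitation of that behaviour, for intuition only: this is not stat_function's actual source, and x_limits, x_grid and y_grid are names made up for the sketch.

```r
m <- 1500
std <- 300
score <- 1800

funcShaded <- function(x, lower_bound) {
    y <- dnorm(x, mean = m, sd = std)
    y[x < lower_bound] <- NA
    y
}

# The two values in data.frame(x = c(300, 2700)) only fix the x range.
x_limits <- range(c(300, 2700))

# An evenly spaced grid of n = 101 points across that range.
x_grid <- seq(x_limits[1], x_limits[2], length.out = 101)

# The function is called once with the whole vector, which is why
# funcShaded sees x "as if it were a list of values".
y_grid <- funcShaded(x_grid, lower_bound = score)

head(data.frame(x = x_grid, y = y_grid))
```

With these limits the grid points are 24 apart, so 1800 itself is not on the grid and the first shaded point is 1812. If the edge of the shaded area looks slightly off, raising n in stat_function should tighten it.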

alistaire answered Nov 14 '22 23:11
