I'm a big beginner in R and am very confused as to how ggplot is using variable "x" when creating normal curves.
My situation is this: I'm trying to plot normal curves given specific means and standard deviations. In the absence of data, the most common way I've seen to do this is as follows:
score = 1800
m = 1500
std = 300
p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
I wanted to shade a specific area of the curve so using the Internet I created a function as follows:
funcShaded <- function(x) {
  y = dnorm(x, mean = m, sd = std)
  y[x < score] <- NA
  return(y)
}
And then added a layer to my curve with
p + stat_function(fun = funcShaded, geom="area", fill="#84CA72", alpha=.2)
This works to create the graph I desire. However, I have two questions about this. First, when I break the code down, data.frame(x = c(300, 2700)) creates a two-row data frame, as you would expect. So how can it be used to create the x-axis values and, further, be passed to the function as if it were a whole vector of values?
Second, I now want to re-use this function later to fill in another area under the curve based on a different score (e.g. score2 = 1630). I was thinking I could just add another parameter to funcShaded to pass the score (i.e. funcShaded <- function(x, score)) and then call my stat_function as follows: p + stat_function(fun = funcShaded(x, score2), ...). But the x variable is never explicitly "created" by this code (it doesn't show up in my Environment), and when I try this code I get Error: object 'x' not found
So I guess I'm just curious as to how 'x' is working in this situation and if I should be creating it differently given what I want to do.
In order to create a normal curve, we create a ggplot base layer that has an x-axis range from -4 to 4 (or whatever range you want!) and map the x aesthetic to this range ( aes(x = x) ). We then add a stat_function layer and pass dnorm as its fun argument to draw a normal curve.
The easiest-to-find method for shading under a normal density is to use the polygon() command; it is the first hit on Google for “Shading Under a Normal Curve in R.” It works (like a charm), but it is not the most intuitive way to let users produce plots of normal densities.
The function passed to stat_function must be uncalled (unless it returns another function; an adverb like purrr::partial or the like is another approach here), because stat_function needs to pass it a vector of x values.
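To make the "unless it returns another function" case concrete, here is a minimal sketch of a function factory in base R. The names make_shader and shade_1630 are hypothetical and not from the original post; the plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper: calling it returns a one-argument function of x,
# which is the kind of object stat_function expects for `fun`.
make_shader <- function(lower_bound, mean, sd) {
  # Evaluate the arguments now so the returned function keeps these values
  force(lower_bound); force(mean); force(sd)
  function(x) {
    y <- dnorm(x, mean = mean, sd = sd)
    y[x < lower_bound] <- NA  # blank out the density left of the bound
    y
  }
}

shade_1630 <- make_shader(lower_bound = 1630, mean = 1500, sd = 300)
shade_1630(c(1500, 1630, 2000))  # NA for 1500, densities for the other two

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = 1500, sd = 300)) +
    # shade_1630 is passed uncalled; stat_function supplies the x values
    stat_function(fun = shade_1630, geom = "area", fill = "#84CA72", alpha = .2)
}
```

Writing fun = make_shader(1630, 1500, 300) directly would also work, because that call returns a function rather than a vector of densities.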
You've already done with dnorm what you need to do with funcShaded: pass additional fixed parameters through args:
library(ggplot2)

score = 1800
m = 1500
std = 300

funcShaded <- function(x, lower_bound) {
  y = dnorm(x, mean = m, sd = std)
  y[x < lower_bound] <- NA
  return(y)
}

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  stat_function(fun = funcShaded, args = list(lower_bound = score),
                geom = "area", fill = "#84CA72", alpha = .2) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
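Re-using the function for a second score, as asked in the question, then only takes another layer with a different args list. The following is a sketch under that assumption: score2 = 1630 comes from the question, while the second fill colour is an arbitrary choice for illustration. The plot is only built if ggplot2 is installed.

```r
m <- 1500
std <- 300
score <- 1800
score2 <- 1630

funcShaded <- function(x, lower_bound) {
  y <- dnorm(x, mean = m, sd = std)
  y[x < lower_bound] <- NA  # keep only the part of the curve at or above the bound
  y
}

# The same function gives different shaded regions depending on lower_bound
funcShaded(c(1700, 1900), lower_bound = score)   # 1700 is below 1800, so NA first
funcShaded(c(1700, 1900), lower_bound = score2)  # both values are above 1630

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
    # One layer per score; only the args list changes
    stat_function(fun = funcShaded, args = list(lower_bound = score2),
                  geom = "area", fill = "#7294CA", alpha = .2) +
    stat_function(fun = funcShaded, args = list(lower_bound = score),
                  geom = "area", fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
}
```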
Alternately, without writing your own function, you can do the same thing with stat_function's xlim parameter:
ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std), xlim = c(score, 2700),
                geom = "area", fill = "#84CA72", alpha = .2) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
As for how stat_function uses the values passed into its x aesthetic: it treats them as limits between which to interpolate a grid of values, the number of which is set by its n parameter (101 by default). It's decidedly a different usage than most stats, but it's a very useful function.
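The grid described above can be approximated by hand in base R. This is a rough sketch of the idea, not ggplot2's actual internals; the names x_limits, x_grid and curve_df are made up for illustration.

```r
m <- 1500
std <- 300
n <- 101                   # the default number of grid points
x_limits <- c(300, 2700)   # the two values supplied in the data frame

# Evenly spaced x values between the two limits
x_grid <- seq(x_limits[1], x_limits[2], length.out = n)

# The function is then evaluated once on the whole vector
curve_df <- data.frame(x = x_grid, y = dnorm(x_grid, mean = m, sd = std))
head(curve_df)
```

This is why the two-row data frame is enough: its values only fix the endpoints, and the function itself receives the full vector of grid points.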