Using stat_function to draw partially shaded normal curve in ggplot2

Q: How to draw normal distribution curve in r ggplot?

To create a normal curve, we create a ggplot base layer whose x-axis range runs from -4 to 4 (or whatever range you want) and map the x aesthetic to that range (aes(x = x)). We then add a stat_function() layer with fun = dnorm so that the curve drawn is the normal density.
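A minimal sketch of that recipe, assuming a standard normal (mean 0, sd 1, which are dnorm()'s defaults). The name p_std is only illustrative:

```r
library(ggplot2)

# The data frame only supplies the x range (-4 to 4); stat_function()
# evaluates dnorm() on its own grid of points between those limits.
p_std <- ggplot(data.frame(x = c(-4, 4)), aes(x = x)) +
  stat_function(fun = dnorm) +
  labs(x = "z", y = "Density")

p_std
```

To draw a non-standard normal, pass the parameters through args, e.g. stat_function(fun = dnorm, args = list(mean = 1500, sd = 300)), and widen the x range to match.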

Q: How do you shade normal distribution in R?

The easiest-to-find method for shading under a normal density is base R's polygon() command; it was the first hit on Google for “Shading Under a Normal Curve in R.” It works (like a charm), but it is not the most intuitive way to let users produce plots of normal densities.
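For comparison with the ggplot2 approach below, here is a minimal base R sketch of the polygon() technique. It assumes the mean (1500), sd (300) and cutoff (1800) used in the question, and shades everything above the cutoff:

```r
# Values taken from the question: mean 1500, sd 300,
# shade everything above a score of 1800.
m <- 1500
std <- 300
score <- 1800

# Draw the density curve over the range 300-2700.
xs <- seq(300, 2700, length.out = 200)
plot(xs, dnorm(xs, mean = m, sd = std), type = "l",
     xlab = "Score", ylab = "Density")

# Build a closed polygon: start on the x-axis at `score`, follow the
# curve up to 2700, then drop back down to the x-axis.
shade_x <- seq(score, 2700, length.out = 100)
poly_x <- c(score, shade_x, 2700)
poly_y <- c(0, dnorm(shade_x, mean = m, sd = std), 0)
polygon(poly_x, poly_y,
        col = adjustcolor("#84CA72", alpha.f = 0.2), border = NA)
```

The awkward part is building the polygon's vertices by hand, including the two points on the x-axis that close the shape.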

Tags: function, dataframe, r, ggplot2

I'm a complete beginner in R and am very confused about how ggplot uses the variable "x" when creating normal curves.

My situation is this: I'm trying to plot normal curves with specific means and standard deviations, and in the absence of data, the most common way I've seen to do this is as follows:

library(ggplot2)

score = 1800
m = 1500
std = 300

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
p

I wanted to shade a specific area under the curve, so, using examples from the Internet, I created a function as follows:

funcShaded <- function(x) {
    y = dnorm(x, mean = m, sd = std)
    y[x < score] <- NA
    return(y)
}

I then added a layer to my curve with:

p + stat_function(fun = funcShaded, geom = "area", fill = "#84CA72", alpha = .2)

This works to create the graph I want. However, I have two questions about this. First, when I break the code down,

data.frame(x = c(300, 2700))

creates a two-row data frame, as you would expect. So how can it be used to create the x-axis values and, further, be passed to the function and used appropriately (i.e. as if it were a longer vector of values)?

Second, I now want to reuse this function later to fill in another area under the curve based on a different score (e.g. score2 = 1630). I was thinking I could just add another argument to funcShaded to pass the score (i.e. funcShaded <- function(x, score)) and then call stat_function as follows: p + stat_function(fun = funcShaded(x, score2), ...). However:

1. I'm not sure this syntax will work.
2. It seems like the x variable is never explicitly "created" by this code: it doesn't show up in my Environment, and when I try this code I get Error: object 'x' not found.

So I guess I'm just curious as to how 'x' is working in this situation and if I should be creating it differently given what I want to do.

asked Feb 12 '18 18:02

chainhomelow

1 Answer

The function passed to stat_function must be passed uncalled, i.e. fun = funcShaded rather than fun = funcShaded(x, score2) (unless the call returns another function; an adverb like purrr::partial or the like is another approach here), because stat_function needs to pass it a vector of x values itself.
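A minimal sketch of the "call that returns another function" route, using a base R closure instead of purrr. make_shaded() is a hypothetical helper name: it fixes the lower bound, mean and sd up front and returns a function of x alone, which is what stat_function() expects as fun. By contrast, fun = funcShaded(x, score2) calls funcShaded immediately, before any x exists, which is where "object 'x' not found" comes from.

```r
library(ggplot2)

m <- 1500
std <- 300
score2 <- 1630

# Hypothetical function factory: returns a function of x only, with the
# lower bound, mean and sd captured from the enclosing call.
make_shaded <- function(lower_bound, mean, sd) {
  force(lower_bound); force(mean); force(sd)  # evaluate now, not lazily later
  function(x) {
    y <- dnorm(x, mean = mean, sd = sd)
    y[x < lower_bound] <- NA  # NA means no area is drawn for that x
    y
  }
}

shade_from_score2 <- make_shaded(score2, mean = m, sd = std)

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  # Pass the returned function itself; stat_function() supplies the x grid.
  stat_function(fun = shade_from_score2,
                geom = "area", fill = "#84CA72", alpha = .2) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
p
```

With purrr installed, purrr::partial(funcShaded, lower_bound = score2), applied to the two-argument funcShaded below, would play the same role as make_shaded().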

You've already done with dnorm what you need to do with funcShaded: pass additional fixed parameters through args:

library(ggplot2)

score = 1800
m = 1500
std = 300

funcShaded <- function(x, lower_bound) {
    y = dnorm(x, mean = m, sd = std)
    y[x < lower_bound] <- NA
    return(y)
}

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) + 
    stat_function(fun = funcShaded, args = list(lower_bound = score), 
                  geom = "area", fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
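Applied to the second part of the question, the same two-argument funcShaded can be reused with a different bound just by changing args. A sketch, assuming score2 = 1630 as in the question; the second fill colour ("steelblue") is an arbitrary choice:

```r
library(ggplot2)

m <- 1500
std <- 300
score <- 1800
score2 <- 1630

funcShaded <- function(x, lower_bound) {
  y <- dnorm(x, mean = m, sd = std)
  y[x < lower_bound] <- NA  # NA means no area is drawn for that x
  y
}

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  # Same function, two different lower bounds, supplied through args.
  stat_function(fun = funcShaded, args = list(lower_bound = score2),
                geom = "area", fill = "steelblue", alpha = .2) +
  stat_function(fun = funcShaded, args = list(lower_bound = score),
                geom = "area", fill = "#84CA72", alpha = .2) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
p
```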

Alternatively, without writing your own function, you can do the same thing with stat_function's xlim parameter:

ggplot(data.frame(x = c(300, 2700)), aes(x = x)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std)) + 
    stat_function(fun = dnorm, args = list(mean = m, sd = std), xlim = c(score, 2700),
                  geom = "area", fill = "#84CA72", alpha = .2) +
    scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
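The xlim route also covers a band between two scores. A sketch, assuming you want the area between score2 = 1630 (from the question) and score = 1800; xlim here restricts only where this one layer evaluates dnorm, not the plot's axis:

```r
library(ggplot2)

m <- 1500
std <- 300
score <- 1800
score2 <- 1630

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std)) +
  # Evaluate dnorm only between score2 and score, and fill under it.
  stat_function(fun = dnorm, args = list(mean = m, sd = std),
                xlim = c(score2, score),
                geom = "area", fill = "#84CA72", alpha = .2) +
  scale_x_continuous(name = "Score", breaks = seq(300, 2700, std))
p
```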

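As for how stat_function uses the values passed into its x aesthetic: it uses them only as limits, and evaluates fun on an evenly spaced grid of values between them. The number of grid points is set by its n parameter, which defaults to 101. So funcShaded is called with a vector of x values that stat_function creates internally, which is why no x ever appears in your global environment. It's decidedly a different usage from most stats, but it's a very useful function.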
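One way to see this for yourself is ggplot2's layer_data(), which returns the data a layer computed. A sketch using the question's mean and sd: the two input rows become 101 evenly spaced x values (a step of 24 here), and those are the values passed to dnorm (or funcShaded). Setting n changes the grid size.

```r
library(ggplot2)

m <- 1500
std <- 300

p <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std))

# The computed layer: one row per grid point, not per row of the input data.
curve_data <- layer_data(p, 1)
nrow(curve_data)        # 101, the default n
head(curve_data$x, 3)   # 300 324 348

# The same grid, built by hand.
grid <- seq(300, 2700, length.out = 101)

# A much coarser grid via the n parameter.
p_coarse <- ggplot(data.frame(x = c(300, 2700)), aes(x = x)) +
  stat_function(fun = dnorm, args = list(mean = m, sd = std), n = 5)
layer_data(p_coarse, 1)$x   # 300 900 1500 2100 2700
```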

answered Nov 14 '22 23:11

alistaire
