Consider the following toy data and computations:
library(dplyr)
df <- tibble(x = 1)
stats::sd(df$x)
dplyr::summarise(df, sd_x = sd(x))
The first calculation returns NA, whereas the second, where the calculation is wrapped in the dplyr function summarise, returns NaN. I would expect both calculations to give the same result. Why do they differ?
In R, NaN stands for Not a Number. NaN values typically arise when a calculation has an undefined result, such as 0/0. Note that NaN values are different from NA values, which represent missing values.
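The distinction can be checked directly in base R. This is a minimal sketch; the variable names nan_val and na_val are illustrative only. Note that is.na() treats NaN as missing, while is.nan() does not treat NA as NaN.

```r
# NaN comes from an undefined arithmetic result; NA marks a missing value.
nan_val <- 0 / 0
na_val <- NA_real_

is.nan(nan_val)  # TRUE
is.na(nan_val)   # TRUE: is.na() also flags NaN
is.nan(na_val)   # FALSE: a missing value is not NaN
is.na(na_val)    # TRUE
```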
NaN is also described as undefined or unrepresentable. It belongs to the numeric data type and marks results that cannot be expressed as a number, especially in floating-point arithmetic. To remove rows that contain NaN from a data frame in R, we can use the function na.omit, which also drops rows that contain NA.
You can replace NA values with zero (0) in numeric columns of an R data frame by using the is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_at(), dplyr::mutate_if(), and tidyr::replace_na() functions.
The is.nan() function in R checks whether the elements of a vector are NaN (Not a Number). It returns a logical value for each element of the vector.
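The helpers mentioned above can be tried on a small vector. This is a sketch in base R only; the names x, dat, clean and x_zero are made up for illustration.

```r
x <- c(1, NA, NaN, 4)

# Element-wise check: only the third element is NaN.
is.nan(x)  # FALSE FALSE TRUE FALSE

# na.omit() drops rows containing NA or NaN, leaving rows 1 and 4.
dat <- data.frame(id = 1:4, x = x)
clean <- na.omit(dat)

# Replace missing values (NA and NaN alike) with zero.
x_zero <- replace(x, is.na(x), 0)
```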
It is calling a different function. I'm not clear what the function is, but it is not the stats one.
dplyr::summarise(df, sd_x = stats::sd(x))
# A tibble: 1 x 1
sd_x
<dbl>
1 NA
debugonce(sd) # debug to see when sd is called
Not called here:
dplyr::summarise(df, sd_x = sd(x))
# A tibble: 1 x 1
sd_x
<dbl>
1 NaN
But called here:
dplyr::summarise(df, sd_x = stats::sd(x))
debugging in: stats::sd(1)
debug: sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
na.rm = na.rm))
...
Update
It appears that the sd within summarise gets calculated outside of R, as hinted at in this header file: https://github.com/tidyverse/dplyr/blob/master/inst/include/dplyr/Result/Sd.h
A number of functions seem to be redefined by dplyr. Given that var gives the same result in both cases, I think the sd behaviour is a bug.
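One plausible way a reimplemented sd could end up with NaN for a single observation is sketched below. The function naive_sd is hypothetical and is not taken from dplyr's source. It only illustrates that applying the textbook formula with n = 1 divides 0 by 0, whereas stats::sd() returns NA in that case.

```r
x <- 1

# Reference: stats::sd() is sqrt(var(x)), and var() of one observation is NA.
ref_sd <- stats::sd(x)

# Hypothetical hand-rolled version (an assumption, NOT dplyr's actual code):
# with n = 1 the sum of squares is 0 and n - 1 is 0, so 0 / 0 gives NaN.
naive_sd <- function(v) sqrt(sum((v - mean(v))^2) / (length(v) - 1))
naive <- naive_sd(x)

is.nan(ref_sd)  # FALSE: the result is NA, not NaN
is.nan(naive)   # TRUE
```

If a result matching base R is wanted inside summarise, calling stats::sd(x) explicitly, as shown earlier, sidesteps the question of which implementation is used.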