Consider the following matrix:
x <- matrix(c(1,1,1,3),2)
x
     [,1] [,2]
[1,]    1    1
[2,]    1    3
When calling scale on this matrix, NaN values are returned for the first column, which has zero variance:
scale(x)
     [,1]       [,2]
[1,]  NaN -0.7071068
[2,]  NaN  0.7071068
attr(,"scaled:center")
[1] 1 2
attr(,"scaled:scale")
[1] 0.000000 1.414214
However, I would expect it to return 0. Is this a bug, or am I misunderstanding what scale does and what it should return?
The workaround for what I want is:
y <- scale(x)
y[is.nan(y)] <- 0
But this involves the use of an extra variable. Is there a more elegant solution?
You could use the following workaround:
apply(x, 2, function(y) (y - mean(y)) / sd(y) ^ as.logical(sd(y)))
     [,1]       [,2]
[1,]    0 -0.7071068
[2,]    0  0.7071068
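In case the exponent looks odd: as.logical(sd(y)) is FALSE when the standard deviation is 0 and TRUE otherwise, and R evaluates 0^0 as 1, so a constant column ends up being divided by 1 instead of 0. A quick check of that, using the two columns from the question (s0 and s1 are just illustrative names):

```r
# as.logical() maps 0 to FALSE and any non-zero number to TRUE;
# in arithmetic FALSE counts as 0 and TRUE as 1
s0 <- sd(c(1, 1))   # standard deviation of the constant column
s1 <- sd(c(1, 3))   # standard deviation of the second column

d0 <- s0 ^ as.logical(s0)  # 0^0, which R defines as 1
d1 <- s1 ^ as.logical(s1)  # s1^1, i.e. the usual divisor

print(c(d0, d1))
```

Note that apply returns a plain matrix, so the "scaled:center" and "scaled:scale" attributes that scale attaches are not present in the result.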
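Because scale divides each centered column by its standard deviation, a zero-variance column gives 0/0, which is NaN. So this is the expected behavior, not a bug.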
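To see where the NaN comes from, here is a rough sketch of what the default scale(x) computes (center each column, then divide by its standard deviation), followed by a small wrapper that avoids the problem. scale_zero_safe is a made-up name for illustration, not something in base R, and replacing a zero standard deviation with 1 is an assumption about what you want for constant columns:

```r
x <- matrix(c(1, 1, 1, 3), 2)

# The default computation written out by hand
centered <- sweep(x, 2, colMeans(x))      # subtract the column means
sds      <- apply(x, 2, sd)               # column standard deviations
manual   <- sweep(centered, 2, sds, "/")  # first column is 0/0

# Hypothetical helper: pass scale() a divisor of 1 for constant columns,
# so they come out as 0 instead of NaN
scale_zero_safe <- function(m) {
  sds <- apply(m, 2, sd)
  sds[sds == 0] <- 1
  scale(m, center = TRUE, scale = sds)
}

res <- scale_zero_safe(x)
print(res)
```

Unlike the apply version, this keeps the "scaled:center" and "scaled:scale" attributes, though the stored scale for a constant column is then 1 rather than 0.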
Continuous variables really aren't supposed to have ties, much less zero variance, and it is not appropriate to scale a discrete or categorical variable.