Why does scale return NaN for zero variance columns?

Question

Consider the following matrix:

x <- matrix(c(1,1,1,3),2)
x
     [,1] [,2]
[1,]    1    1
[2,]    1    3

When calling scale with this, NaN values are returned for the first column, which has zero variance:

scale(x)
     [,1]       [,2]
[1,]  NaN -0.7071068
[2,]  NaN  0.7071068
attr(,"scaled:center")
[1] 1 2
attr(,"scaled:scale")
[1] 0.000000 1.414214
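
To see where the NaN values come from, here is a minimal sketch that reproduces the computation for the first column by hand. By default scale() centers each column and then divides it by its standard deviation. For a constant column both the centered values and the standard deviation are 0:

```r
x <- matrix(c(1, 1, 1, 3), 2)

# Center the first column by hand, as scale() does by default
centered <- x[, 1] - mean(x[, 1])   # c(0, 0)
s <- sd(x[, 1])                      # 0 for a constant column

# Zero divided by zero is undefined, which R represents as NaN
centered / s
#> [1] NaN NaN
```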

However, I would expect it to return 0. Is this a bug, or am I misunderstanding what scale does and should return?

The workaround for what I want is:

y <- scale(x)
y[is.nan(y)] <- 0

But this involves an extra variable. Is there a more elegant solution?
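
One way to keep the call site to a single expression is to pass scale() its own vector of divisors, with zero standard deviations replaced by 1. A constant column is then centered (giving 0) but divided by 1 instead of 0. This is a sketch only: scale_zero_safe is a hypothetical helper, not part of base R. It relies on the documented behavior that the scale argument of scale() may be a numeric vector with one entry per column.

```r
# Hypothetical helper (not part of base R): like scale(), but constant
# columns are divided by 1 instead of 0, so they come out as 0
scale_zero_safe <- function(x) {
  s <- apply(x, 2, sd)   # column standard deviations
  s[s == 0] <- 1         # avoid 0/0 for zero-variance columns
  scale(x, center = TRUE, scale = s)
}

x <- matrix(c(1, 1, 1, 3), 2)
scale_zero_safe(x)
```

The result keeps the "scaled:center" and "scaled:scale" attributes. Note that "scaled:scale" records 1 rather than 0 for the constant column.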

Sven Hohenstein · Accepted Answer

You could use the following workaround:

apply(x, 2, function(y) (y - mean(y)) / sd(y) ^ as.logical(sd(y)))

     [,1]       [,2]
[1,]    0 -0.7071068
[2,]    0  0.7071068
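
The trick works because `^` binds more tightly than `/`, and a logical exponent is coerced to 0 or 1. When the standard deviation is 0, the divisor becomes 0^0, which R evaluates as 1, so the centered zeros are left unchanged. Otherwise the divisor is the standard deviation itself. A small sketch of both cases, plus one side effect: unlike scale(), apply() returns a plain matrix without the "scaled:center" and "scaled:scale" attributes.

```r
# sd is 0: exponent is FALSE (0), and 0^0 is 1 in R
0 ^ as.logical(0)
#> [1] 1

# sd > 0: exponent is TRUE (1), so the sd is used unchanged
sqrt(2) ^ as.logical(sqrt(2))
#> [1] 1.414214

# apply() builds a new matrix, so scale()'s attributes are not attached
x <- matrix(c(1, 1, 1, 3), 2)
res <- apply(x, 2, function(y) (y - mean(y)) / sd(y) ^ as.logical(sd(y)))
attr(res, "scaled:center")   # NULL
```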

Matthew Lundberg · Answer

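Because scale divides each centered column by its standard deviation, it must do this: for a zero-variance column the standard deviation is 0, so every centered value becomes 0/0, which is NaN.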

Continuous variables really aren't supposed to have ties, much less zero variance, and it is not appropriate to scale a discrete or categorical variable.
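
A quick way to confirm that the divisor is the standard deviation and not the variance is to compare the "scaled:scale" attribute with both. This is a sketch using the matrix from the question. Per ?scale, the centered columns are divided by their standard deviations when center = TRUE.

```r
x <- matrix(c(1, 1, 1, 3), 2)
s <- attr(scale(x), "scaled:scale")

s                 # [1] 0.000000 1.414214
apply(x, 2, sd)   # same values: the column standard deviations
apply(x, 2, var)  # [1] 0 2: the variances, which scale() did not use
```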

Tags:

r

James

2 Answers

Sven Hohenstein

Matthew Lundberg