I am somewhat confused as to why the sd
function in R returns an array for matrix input (I suppose to maintain backwards compatibility, it always will). This is very odd behaviour to me:
#3d input, same same
print(length(mean(array(rnorm(60),dim=c(3,4,5)))))
print(length(sd(array(rnorm(60),dim=c(3,4,5)))))
#1d input, same same
print(length(mean(array(rnorm(60),dim=c(60)))))
print(length(sd(array(rnorm(60),dim=c(60)))))
#2d input, different!
print(length(mean(array(rnorm(60),dim=c(12,5)))))
print(length(sd(array(rnorm(60),dim=c(12,5)))))
I get
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 5
That is sd
behaves differently from mean
when the input is a 2-d array (and apparently only in that case!) Consider then, this failed function to rescale each column of a k-dimensional array by the standard deviation:
re.scale <- function(x) {
#rescale by the standard deviation of each column
scales <- apply(x,2,sd)
ret.val <- sweep(x,2,scales,"/")
}
#this works just fine
x <- array(rnorm(60),dim=c(12,5))
y <- re.scale(x)
#this throws a warning
x <- array(rnorm(60),dim=c(3,4,5))
y <- re.scale(x)
Is there some other function to replace sd
without this weird behavior? How would one write re.scale
properly? Or a Z-score-by-column function?
It is behaving as document in sd's help page. At the very top it announces:
"If x is a matrix or a data frame, a vector of the standard deviation of the columns is returned."
Note it does not say that the arrays are included, so only arrays with two dimensions are included. If you want to stop this behavior, then just make a vector out of it with c():
sd( c(array(rnorm(60),dim=c(12,5))) )
# [1] 0.9505643
I see that you added a request for column z scores. Try this for matrices:
colMeans(x)/sd(x)
And this for arrays (although the definition of a "column" may need clarification:
apply(x, 2:3, mean)/apply(x, 2:3, sd) # will generalize to higher dimensions
> set.seed(1)
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160
Description
This function computes the standard deviation of the values in x. If na.rm is TRUE then missing values are removed before computation proceeds. If x is a matrix or a data frame, a vector of the standard deviation of the columns is returned.
> set.seed(1)
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8107276 1.1234795 0.7925743 0.6186082 0.9464160
WARNING:
sd(<matrix>) is deprecated.
Use apply(*, 2, sd) instead.
Details
Prior to R 2.14.0, sd(dfrm) worked directly for a data.frame dfrm. This is now deprecated and you are expected to use sapply(dfrm, sd).
> sd(array(rnorm(60),dim=c(12,5)))
[1] 0.8551688
>
(no WARNIG)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With