I have a data file consisting of 57 variables. I want to transform about 12 of them into z-scores due to their uneven level of measurement. I looked up internet resources and help files. One internet resource advised that I need the package Rbasic (which does not exist). I used scale(), which only seemed to center the variables. I tried V5-mean/st.dev., which gave me very strange scores. Can somebody please give me practical advice?
scale() is the correct choice here:
> x <- 1:10
> scale(x)
[,1]
[1,] -1.4863011
[2,] -1.1560120
[3,] -0.8257228
[4,] -0.4954337
[5,] -0.1651446
[6,] 0.1651446
[7,] 0.4954337
[8,] 0.8257228
[9,] 1.1560120
[10,] 1.4863011
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765
> (x - mean(x)) / sd(x)
[1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446
[6] 0.1651446 0.4954337 0.8257228 1.1560120 1.4863011
> mean(x)
[1] 5.5
> sd(x)
[1] 3.02765
Notice how the attributes in the object returned from scale() are the mean and SD of the input data.
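Since the question is about standardising only some of the columns in a larger data set, here is a minimal sketch of one way to do that. The data frame `dat` and the column names in `zvars` are made up for illustration; substitute your own data and variable names.

```r
# Hypothetical data frame standing in for the real 57-variable file
set.seed(1)
dat <- data.frame(V1 = rnorm(20, mean = 50, sd = 10),
                  V2 = runif(20, min = 0, max = 5),
                  V3 = rpois(20, lambda = 3),
                  V4 = rnorm(20))

# Names of the columns to standardise (assumed; replace with your own)
zvars <- c("V1", "V2", "V3")

# scale() centers and scales each column separately; assigning the result
# back to dat[zvars] leaves the other columns (here V4) untouched
dat[zvars] <- scale(dat[zvars])

# Check: each standardised column should have mean ~0 and SD 1
colMeans(dat[zvars])
apply(dat[zvars], 2, sd)
```

If scale() appears to only center your variables, it is worth checking that the scale argument has not been set to FALSE, since by default both centering and scaling are applied.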
You don't show the actual code you used to compute "V5-mean/st.dev", but if you typed it exactly like that, operator precedence may have caught you out: division is evaluated before subtraction, so only the mean gets divided by the SD. This, for example, doesn't return the correct z-scores:
> x - mean(x) / sd(x)
[1] -0.8165902 0.1834098 1.1834098 2.1834098 3.1834098
[6] 4.1834098 5.1834098 6.1834098 7.1834098 8.1834098
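To make the precedence point concrete, the short check below compares the unparenthesised expression with its explicitly grouped equivalent, and the parenthesised version with scale(). It is only a sketch using the same toy vector as above; both comparisons should come back TRUE.

```r
x <- 1:10

# Division binds tighter than subtraction, so these two are the same thing
wrong    <- x - mean(x) / sd(x)
explicit <- x - (mean(x) / sd(x))
all.equal(wrong, explicit)

# With the subtraction parenthesised, the result matches scale()
z <- (x - mean(x)) / sd(x)
all.equal(z, as.vector(scale(x)))
```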
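# Assumes myRow is a numeric vector
mu <- mean(myRow)
sigma <- sqrt(var(myRow))  # standard deviation, same as sd(myRow)
myRow <- (myRow - mu) / sigma  # divide by sigma itself, not sqrt(sigma)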