creating z-scores

Tags:

r

I have a data file consisting of 57 variables. I want to transform about 12 of them into z-scores because of their uneven levels of measurement. I looked through internet resources and the help files. One internet resource advised that I need the package Rbasic (which does not exist). I used scale(), which only seemed to center the variables. I tried V5-mean/st.dev., which gave me very strange scores. Can somebody please give me practical advice?

Rose Blasche asked May 27 '11 05:05


2 Answers

scale() is the correct choice here:

> x <- 1:10
> scale(x)
            [,1]
 [1,] -1.4863011
 [2,] -1.1560120
 [3,] -0.8257228
 [4,] -0.4954337
 [5,] -0.1651446
 [6,]  0.1651446
 [7,]  0.4954337
 [8,]  0.8257228
 [9,]  1.1560120
[10,]  1.4863011
attr(,"scaled:center")
[1] 5.5
attr(,"scaled:scale")
[1] 3.02765
> (x - mean(x)) / sd(x)
 [1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446
 [6]  0.1651446  0.4954337  0.8257228  1.1560120  1.4863011
> mean(x)
[1] 5.5
> sd(x)
[1] 3.02765

Notice how the attributes in the object returned from scale() are the mean and SD of the input data.
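
The same call works on several columns at once, which is closer to the situation in the question. The following is only a sketch: the data frame dat, its column names, and the vector z_cols are made up for illustration and stand in for the real 57-variable file.

```r
# Hypothetical data frame standing in for the real data file
set.seed(1)
dat <- data.frame(
  id = 1:20,
  V5 = rnorm(20, mean = 50, sd = 10),
  V6 = runif(20, min = 0, max = 1000),
  V7 = rpois(20, lambda = 3)
)

# Names of the columns to standardise (an assumption for this example)
z_cols <- c("V5", "V6", "V7")

# scale() centers and scales each column separately
z <- scale(dat[z_cols])

# The attributes now hold one mean and one SD per column
attr(z, "scaled:center")
attr(z, "scaled:scale")

# Write the z-scores back; columns not listed in z_cols are left untouched
dat[z_cols] <- as.data.frame(z)
```

Keeping the column names in one vector means the same selection is used for both reading and writing, so the remaining variables cannot be overwritten by accident.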

You don't provide the actual code you used to compute "V5-mean/st.dev", but if you typed it exactly like that, operator precedence might have caught you out: division binds more tightly than subtraction, so the expression is evaluated as V5 - (mean/st.dev). This, for example, doesn't return the correct z-scores:

> x - mean(x) / sd(x)
 [1] -0.8165902  0.1834098  1.1834098  2.1834098  3.1834098
 [6]  4.1834098  5.1834098  6.1834098  7.1834098  8.1834098
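
If it is unclear whether a transformation really produced z-scores, one quick check is that the result should have mean 0 and standard deviation 1. A minimal sketch using the same x as above (the names z_ok and z_bad are chosen here for illustration):

```r
x <- 1:10

z_ok  <- (x - mean(x)) / sd(x)  # parentheses force the subtraction first
z_bad <- x - mean(x) / sd(x)    # parsed as x - (mean(x) / sd(x))

# Proper z-scores: mean is (numerically) 0 and SD is 1
abs(mean(z_ok)) < 1e-12
all.equal(sd(z_ok), 1)

# The unparenthesised version only shifts x by a constant,
# so its SD is still the SD of the raw data
all.equal(sd(z_bad), sd(x))
```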
Gavin Simpson answered Oct 26 '22 23:10


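mu <- mean(myRow)
sigma <- sqrt(var(myRow))  # standard deviation, same as sd(myRow)
# Divide by sigma itself; sigma is already the square root of the variance
myRow <- (myRow - mu) / sigma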
francesco answered Oct 27 '22 01:10