
How to get the center and scale after using the scale function in R

Tags: r, scale, stat

This may seem like a silly question, but I have searched online and still have not found a satisfactory answer.

Suppose we have a matrix M and apply the scale() function to it. How can we extract the center and scale of each column with a line of code? I know the centers and scales are displayed when the result is printed, but my matrix has many columns, so reading them off manually is cumbersome.

Any ideas? Many thanks!

asked Jul 08 '18 05:07 by 陈见聪




1 Answer

You are looking for the attributes() function:

 set.seed(1)
 mat = matrix(rnorm(1000), ncol = 10) # suppose you have 10 columns (100 rows)
 s = scale(mat) # center and scale each column
 attributes(s) # lists the column means and standard deviations used by scale()
$`dim`
[1] 100  10

$`scaled:center`
 [1]  0.1088873669 -0.0378080766  0.0296735350  0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
 [8]  0.0002549694  0.0100772648  0.0040650015

$`scaled:scale`
 [1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091

These values can also be obtained as:

 colMeans(mat)
 [1]  0.1088873669 -0.0378080766  0.0296735350  0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
 [8]  0.0002549694  0.0100772648  0.0040650015
 sqrt(diag(var(mat)))
 [1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091
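
As a quick check, here is a minimal sketch confirming that the values stored on the scaled matrix match the directly computed ones. It rebuilds the same `mat` and `s` as above; the names `centers` and `scales` and the use of `apply(mat, 2, sd)` as another way to write the column standard deviations are assumptions for illustration, not part of the original answer.

```r
set.seed(1)
mat <- matrix(rnorm(1000), ncol = 10)  # 100 rows, 10 columns
s <- scale(mat)

# attributes(s) is a list, so the stored values can be pulled out by name
centers <- attributes(s)[["scaled:center"]]
scales  <- attributes(s)[["scaled:scale"]]

# Compare against the column means and column standard deviations
centers_match <- isTRUE(all.equal(centers, colMeans(mat)))
scales_match  <- isTRUE(all.equal(scales, apply(mat, 2, sd)))
print(c(centers_match, scales_match))
```

With the default arguments, both comparisons should come out TRUE, since scale() centers by the column means and divides by the column standard deviations.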

attributes(s) returns a list, so you can subset it by name however you want. Alternatively, you can extract each attribute directly with attr():

attr(s,"scaled:center")
 [1]  0.1088873669 -0.0378080766  0.0296735350  0.0516018586 -0.0391342406 -0.0445193567 -0.1995797418
 [8]  0.0002549694  0.0100772648  0.0040650015

attr(s,"scaled:scale")
 [1] 0.8981994 0.9578791 1.0342655 0.9916751 1.1696122 0.9661804 1.0808358 1.0973012 1.0883612 1.0548091
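
Once extracted, the two vectors can be reused. The sketch below is one possible illustration, assuming the same `mat` and `s` as above: it undoes the scaling with `sweep()` and applies the same centers and scales to another matrix via the `center` and `scale` arguments of `scale()`. The names `centers`, `scales`, `restored`, `new_mat` and `new_scaled` are hypothetical.

```r
set.seed(1)
mat <- matrix(rnorm(1000), ncol = 10)
s <- scale(mat)

centers <- attr(s, "scaled:center")
scales  <- attr(s, "scaled:scale")

# Undo the scaling: multiply each column by its scale, then add back its center
restored <- sweep(sweep(s, 2, scales, "*"), 2, centers, "+")
round_trip_ok <- isTRUE(all.equal(restored, mat, check.attributes = FALSE))

# Apply the same transformation to other data with the same number of columns
new_mat <- matrix(rnorm(50), ncol = 10)  # hypothetical new observations
new_scaled <- scale(new_mat, center = centers, scale = scales)

print(round_trip_ok)
print(dim(new_scaled))
```

Passing numeric vectors to `center` and `scale` makes scale() subtract and divide by those values instead of recomputing them from the new data, which is usually what is wanted when a transformation fitted on one data set has to be applied to another.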

answered Oct 17 '22 23:10 by KU99