
Scaling data in R gives spurious Error "length of 'center' must equal the number of columns of 'x'"

Tags: r, scale

I am trying to rescale the values of a data.frame to the range 0 to 1 using the following code:

for(i in 1:nrow(data))
{
  x <- data[i, ]
  data[i, ] <- scale(x, min(x), max(x)-min(x))
}

Data:
 x1   x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12  x13  x14  x15  x16  x17 
 15   6   6   0   9   3   1   4   5    1    1   13    0    0   20    5   28
 2  24  14   7   0  15   7   0  11   3    3    4   15    7    0   30    0  344
 3  10   5   2   0   6   2   0   5   0    0    2    7    1    0   11    0  399
 4   9   4   2   0   5   2   0   4   0    0    2    6    1    0   10    0   28
 5   6   2   1   0   3   1   0   2   0    0    1    3    1    0    6    0   82
 6   9   4   2   0   5   2   0   4   0    0    2    6    1    0   10    0   42

But I am getting the following error message:

Error in scale.default(x, min(x), max(x) - min(x)) (from #4) : 
  length of 'center' must equal the number of columns of 'x'
Shahzad asked Mar 29 '13 06:03




2 Answers

Using this data, your example works for me:

data <- matrix(sample(1:1000,17*6), ncol=17,nrow=6)
for(i in 1:nrow(data)){
  x <- data[i, ]
  data[i, ] <- scale(x, min(x), max(x)-min(x))
}

Here is another option using scale, without a loop. You just need to provide center and scale vectors whose length equals the number of columns of your matrix.

maxs <- apply(data, 2, max)    
mins <- apply(data, 2, min)
scale(data, center = mins, scale = maxs - mins)
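Note that the apply(data, 2, ...) version above rescales each column, whereas the loop in the question rescales each row. A minimal sketch of both, using a small made-up matrix m (hypothetical values, not the question's data):

```r
# Hypothetical 2 x 3 matrix standing in for the real data
m <- matrix(c(1, 5, 9,
              2, 4, 10), nrow = 2, byrow = TRUE)

# Column-wise min-max scaling, as in the answer above
mins <- apply(m, 2, min)
maxs <- apply(m, 2, max)
by_col <- scale(m, center = mins, scale = maxs - mins)

# Row-wise min-max scaling (what the question's loop attempts).
# apply() over rows returns each result as a column, so transpose back.
by_row <- t(apply(m, 1, function(r) (r - min(r)) / (max(r) - min(r))))
```

In either direction, a constant row or column (max equal to min) has a zero range, so the division yields NaN for it.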

EDIT: how to access the result.

scale returns a matrix carrying two extra attributes, "scaled:center" and "scaled:scale". To get a data.frame, just coerce the result with as.data.frame.

dat.scale <- scale(data, center = mins, scale = maxs - mins)
dat.scale <- as.data.frame(dat.scale)
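As a small illustration of the two attributes, here is a sketch on a made-up matrix (m and its values are assumptions for the example only). The values used for centering and scaling can be read back with attr():

```r
# Hypothetical 3 x 2 matrix
m <- matrix(c(1, 2, 3, 10, 20, 40), ncol = 2)
mins <- apply(m, 2, min)
maxs <- apply(m, 2, max)

s <- scale(m, center = mins, scale = maxs - mins)

centers <- attr(s, "scaled:center")  # the per-column minimums
ranges  <- attr(s, "scaled:scale")   # the per-column ranges (max - min)

# Coerce to a data.frame for further work
s_df <- as.data.frame(s)
```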
agstudy answered Oct 02 '22 14:10


The center and scale arguments to scale must have length equal to the number of columns in x. It looks like data is a data.frame, so x <- data[i, ] is a one-row data.frame with as many columns as data, while min(x) is a single number; hence the conflict. You can get past this snag in three ways:

  • drop the row into an atomic vector before passing to scale (which will treat it as a single column): scale(as.numeric(x), ...)
  • convert data into a matrix, which drops row extractions into atomic vectors automatically.
  • use @agstudy's apply suggestion, which would work whether it's a data.frame or a matrix and is arguably the "right" way to do this in R.
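The first two options can be sketched as follows. The data.frame df and its values are made up for illustration; only the calls mirror the question's loop:

```r
# Hypothetical small data.frame
df <- data.frame(x1 = c(15, 24), x2 = c(6, 14), x3 = c(0, 7))

# Reproduce the problem: a row of a data.frame is still a data.frame
# with 3 columns, but min(x) has length 1
x <- df[1, ]
err <- tryCatch(scale(x, min(x), max(x) - min(x)),
                error = function(e) conditionMessage(e))

# Option 1: drop each row to an atomic vector before calling scale()
df_fixed <- df
for (i in seq_len(nrow(df_fixed))) {
  v <- as.numeric(df_fixed[i, ])
  df_fixed[i, ] <- as.numeric(scale(v, min(v), max(v) - min(v)))
}

# Option 2: convert to a matrix first; m[i, ] is then already a vector
m <- as.matrix(df)
for (i in seq_len(nrow(m))) {
  x <- m[i, ]
  m[i, ] <- scale(x, min(x), max(x) - min(x))
}
```

Both loops rescale each row to the range 0 to 1, and err holds the error message from the original call.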
Matthew Plourde answered Oct 02 '22 15:10