
sklearn StandardScaler returns all zeros

I have a sklearn StandardScaler saved from a previous model and am trying to apply it to new data:

scaler = myOldStandardScaler
print("ORIG:", X)
print("CLASS:", X.__class__)
X = scaler.fit_transform(X)
print("SCALED:", X)

I have three observations each with 2000 features. If I run each observation separately I get an output of all zeros.

ORIG: [[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[ 0.  0.  0. ...,  0.  0.  0.]]

But if I append all three observations into one array, I get the results I want:

ORIG: [[  0.00000000e+00   8.69737728e-08   7.53361877e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  9.49627142e-04   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[-1.07174217  1.41421356  1.37153077 ...,  0.          0.          0.        ]
[ 1.33494964 -0.70710678 -0.98439142 ...,  0.          0.          0.        ]
[-0.26320747 -0.70710678 -0.38713935 ...,  0.          0.          0.        ]]

I've seen these two questions:

  • Sklearn's MinMaxScaler only returns zeros
  • Unexpected StandardScaler fit_transform output

neither of which has an accepted answer.

I've tried:

  • reshaping from (1,n) to (n,1) (this gives incorrect results)
  • converting the array to np.float32 and np.float64 (still all zero)
  • creating an array of an array (again, all zero)
  • creating a np.matrix (again, all zeros)

What am I missing? The input to fit_transform is getting the same type, just a different size.

How do I get StandardScaler to work with a single observation?

Sal asked Oct 04 '17 01:10


2 Answers

When you call the fit_transform method of a StandardScaler on an array of shape (1, n), you get all zeros, because each feature column then contains a single value: the mean of that column equals the value itself, so the value minus the mean is zero (and the standard deviation is zero as well). If you want to scale the values of one observation against each other, convert the array to shape (n, 1). You can do it this way:

import numpy as np

X = np.array([1, -4, 5, 6, -8, 5]) # here should be your X in np.array format
X_transformed = scaler.fit_transform(X[:, np.newaxis])

In this case you get standard scaling of one object across its own features, which is not what you're looking for.
If you want to scale one feature across 3 objects, pass fit_transform an array of shape (3, 1) containing that feature's value for each object.

X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])
X_transformed = scaler.fit_transform(X[:, np.newaxis])
# X_transformed is array([[-1.07174217], [1.33494964], [-0.26320747]]),
# the first column of the output you're looking for
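To make the all-zeros result concrete, here is a minimal sketch that refits a fresh scaler on a single row. The variable names (`row`, `scaled_row`) are illustrative and the three values are just the visible ones from the question's output, not the full 2000-feature vector.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One observation with three features: shape (1, 3).
row = np.array([[3.19029839e-04, 0.0, 1.90985485e-06]])

scaler = StandardScaler()
scaled_row = scaler.fit_transform(row)  # fit on this single row, then scale it

# Each column holds exactly one value, so the fitted column mean equals
# that value and (x - mean) is zero for every feature.
print(scaler.mean_)  # the same values as the row
print(scaled_row)    # zeros in every position
```

Because fitting happens on the row itself, any single observation passed to fit_transform ends up as zeros, whatever its values are.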

If you want to use an already fitted StandardScaler, don't call fit_transform, because it refits the scaler on the new data and discards the saved statistics. Use the transform method instead, which works with a single observation:

X = np.array([1, -4, 5, 6, -8, 5]) # here should be your X in np.array format
X_transformed = scaler.transform(X.reshape(1, -1))
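As a rough end-to-end illustration of the fit-once, transform-later pattern, the sketch below fits a scaler on a small training set and then scales one new observation. The data and the names `X_train` and `new_obs` are made up for the example; in the question, the fitted scaler would be the one saved from the previous model.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 3 observations, 4 features.
X_train = np.array([
    [1.0, 10.0, 0.0, 5.0],
    [2.0, 20.0, 0.0, 7.0],
    [3.0, 30.0, 0.0, 9.0],
])

# Learn the per-feature mean and standard deviation once.
scaler = StandardScaler().fit(X_train)

# A single new observation must be passed as a 2D array of shape (1, n_features).
new_obs = np.array([2.0, 40.0, 0.0, 5.0])
scaled = scaler.transform(new_obs.reshape(1, -1))

print(scaled)        # scaled with the training statistics, not all zeros
print(scaler.mean_)  # unchanged by transform
```

Calling transform leaves the fitted statistics untouched, so the same scaler can be applied to as many single observations as needed.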
Eduard Ilyasov answered Oct 17 '22 03:10


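I had the same problem. Another (simpler) solution for an array of shape (1, n) is to transpose it so that it has shape (n, 1).

# X must be 2D here: transposing a 1D array leaves it unchanged
X = np.array([[0.00000000e+00, 9.49627142e-04, 3.19029839e-04]])  # shape (1, 3)
# X.T has shape (3, 1), so scaler must have been fitted on a single feature
X_transformed = scaler.transform(X.T)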
DRFeinberg answered Oct 17 '22 04:10