Scikit-learn: preprocessing.scale() vs preprocessing.StandardScaler()

I understand that scaling means centering the data (mean = 0) and scaling it to unit variance (variance = 1).

But what is the difference between preprocessing.scale(x) and preprocessing.StandardScaler() in scikit-learn?

asked Sep 16 '17 by learncode


1 Answer

Both do exactly the same thing, but:

  • preprocessing.scale(x) is just a function that transforms some data
  • preprocessing.StandardScaler() is a class supporting the Transformer API

I would always use the latter, even if I did not need inverse_transform and the other conveniences supported by StandardScaler().
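To make the difference concrete, here is a minimal sketch (the array values are made up for illustration): the function gives you the scaled data and nothing else, while the class keeps the fitted statistics, so you can transform new data or invert the transformation later.

```python
import numpy as np
from sklearn import preprocessing

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Function: one-shot transformation, no state is kept.
x_scaled_fn = preprocessing.scale(x)

# Class: fit() learns the per-column mean and std, transform() applies them.
scaler = preprocessing.StandardScaler()
x_scaled_cls = scaler.fit_transform(x)

# On the same data both produce identical results.
print(np.allclose(x_scaled_fn, x_scaled_cls))  # True

# But only the class remembers the statistics ...
print(scaler.mean_)   # per-column means learned from x
# ... and can undo the scaling.
x_restored = scaler.inverse_transform(x_scaled_cls)
print(np.allclose(x_restored, x))  # True
```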

Excerpt from the docs:

The function scale provides a quick and easy way to perform this operation on a single array-like dataset

The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline
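The train/test reuse the docs describe can be sketched like this (toy arrays are made up for illustration): the scaler is fitted on the training data only, and the same learned statistics are then reapplied to the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[0.0], [2.0], [4.0]])
x_test = np.array([[1.0], [3.0]])

# Learn mean and std from the training data only.
scaler = StandardScaler().fit(x_train)

# Reapply the *same* statistics to the test data;
# the test set is never used to compute mean/std.
x_test_scaled = scaler.transform(x_test)
```

Because StandardScaler implements the Transformer API, it can also be dropped straight into a sklearn.pipeline.Pipeline, where fit/transform are called with this train/test discipline automatically.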

answered Sep 18 '22 by sascha