I understand that scaling means centering the data (mean = 0) and giving it unit variance (variance = 1).
But what is the difference between preprocessing.scale(x) and preprocessing.StandardScaler() in scikit-learn?
From the scikit-learn 1.0 documentation, section 6.3, "Preprocessing data": The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators.
Use StandardScaler if you want each feature to have zero mean and unit standard deviation. If you want more normally distributed data and are okay with transforming your data, check out scikit-learn's QuantileTransformer(output_distribution='normal').
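To make the contrast concrete, here is a minimal sketch comparing the two on skewed data. The exponential sample, the n_quantiles value, and the skewness helper are assumptions made up for illustration, not part of the original snippet.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Made-up, strongly right-skewed data: 500 samples, 1 feature.
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=(500, 1))

# StandardScaler only shifts and rescales; the shape of the distribution stays skewed.
standardized = StandardScaler().fit_transform(skewed)

# QuantileTransformer maps the empirical quantiles onto a normal distribution.
# n_quantiles=100 is an arbitrary choice here (it must not exceed the sample count).
gaussianized = QuantileTransformer(
    n_quantiles=100, output_distribution="normal", random_state=0
).fit_transform(skewed)


def skewness(a):
    """Sample skewness (third standardized moment); hypothetical helper."""
    a = a.ravel()
    return float(np.mean(((a - a.mean()) / a.std()) ** 3))


print("skewness after StandardScaler:     ", skewness(standardized))
print("skewness after QuantileTransformer:", skewness(gaussianized))
```

The standardized data should keep roughly the skewness of the input, while the quantile-transformed data should come out close to symmetric.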
Standardization, or mean removal and variance scaling: Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn; they might behave badly if the individual features do not more or less look like standard normally distributed data: Gaussian with zero mean and unit variance.
In general, learning algorithms benefit from standardization of the data set. If some outliers are present in the set, robust scalers or ...
Both do exactly the same thing, but:
preprocessing.scale(x) is just a function, which transforms some data
preprocessing.StandardScaler() is a class supporting the Transformer API
I would always use the latter, even if I did not need inverse_transform and the other methods supported by StandardScaler().
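A quick way to check this yourself is the minimal sketch below. The toy array x is made up for illustration; the calls are the standard scikit-learn ones.

```python
import numpy as np
from sklearn import preprocessing

# Toy data, made up for illustration: 4 samples, 2 features.
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# The function: scales the array in one shot and keeps no state.
scaled_by_function = preprocessing.scale(x)

# The class: learns the mean and standard deviation in fit(), applies them in transform().
scaler = preprocessing.StandardScaler()
scaled_by_class = scaler.fit_transform(x)

# Both give the same numbers on the same input.
same_result = np.allclose(scaled_by_function, scaled_by_class)
print("same result:", same_result)

# Only the class remembers what it learned, so it can undo the scaling.
print("learned means:", scaler.mean_)
print("learned scales:", scaler.scale_)
restored = scaler.inverse_transform(scaled_by_class)
```

So the numerical result is identical on a single array; the difference is that the StandardScaler object keeps mean_ and scale_ around for later use.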
Excerpt from the docs:
The function scale provides a quick and easy way to perform this operation on a single array-like dataset
The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline
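What the excerpt describes could look roughly like the sketch below. The synthetic data, the split sizes, and the choice of LogisticRegression as the downstream estimator are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, made up for illustration: 200 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = (X[:, 0] > 50.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training set only, then reapply the same transformation to the test set.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)  # uses the training mean and std

# The same idea inside a Pipeline: the scaler is fitted whenever the pipeline is fitted.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

With preprocessing.scale(x) you could not do this: calling it on the test set would scale it with the test set's own mean and standard deviation instead of the ones learned from the training set.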