I saw both transformer and estimator were mentioned in the sklearn documentation.
Is there any difference between these two words?
Transformer: A Transformer is an algorithm which can transform one DataFrame into another DataFrame. E.g., an ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions. Estimator: An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer. (These DataFrame-based definitions are the ones used by Spark ML; scikit-learn uses the terms differently.)
Transformers are classes that enable data transformations while preprocessing the data for machine learning. Examples of transformers in Scikit-Learn are SimpleImputer, MinMaxScaler, OrdinalEncoder, PowerTransformer, to name a few.
Estimator objects. Fitting data: the main API implemented by scikit-learn is that of the estimator. An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data.
An estimator is a predictor found from a regression algorithm. A classifier is a predictor found from a classification algorithm. A model can be either an estimator or a classifier.
The basic difference is that a Transformer transforms the input data (X) in some way, while an Estimator predicts a new value (or values) (y) by using the input data (X). Both the Transformer and the Estimator should have a fit() method which can be used to train them (they learn some characteristics of the data). The signature is:
fit(X, y)
fit() stores the learnt data inside the object and returns the object itself (self), so calls can be chained.
Here X represents the samples (feature vectors) and y is the target vector (which may have single or multiple values per corresponding sample in X). Note that y can be optional in some transformers where it's not needed, but it's mandatory for most estimators (supervised estimators). Look at StandardScaler, for example. It needs the initial data X to find the mean and standard deviation of each feature (it learns the characteristics of X; y is not needed).
Each Transformer should have a transform(X) method which, like fit(), takes the input X and returns a new, transformed version of X (which generally has the same number of samples but may or may not have the same number of features).
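As a rough illustration of "same samples, possibly different features", here is a sketch using PolynomialFeatures. The choice of this transformer and the toy data are mine, not something the question asked about.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data (made up for illustration): 3 samples, 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

poly = PolynomialFeatures(degree=2)
poly.fit(X)                # learns how many input features there are
X_new = poly.transform(X)  # adds a bias column, squares and the cross term

# The number of rows (samples) is unchanged; the number of columns grows.
print(X.shape, X_new.shape)
```

A scaler such as StandardScaler would instead keep the number of features the same, so both outcomes are normal.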
On the other hand, an Estimator should have a predict(X) method which should output the predicted value of y from the given X.
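A small sketch of the fit-then-predict flow, using LinearRegression as an example of a supervised estimator. The data is made up and follows y = 2x + 1 exactly, so the prediction is easy to check by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up for illustration) lying exactly on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)  # a supervised estimator needs both X and y

# predict() takes only X and returns the estimated y.
y_pred = model.predict(np.array([[4.0]]))
print(y_pred)  # approximately 9.0
```

Note that predict() never receives y; the target is only used while fitting.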
Some classes in scikit-learn implement both transform() and predict(), like KMeans; in that case, carefully reading the documentation should resolve your doubts.
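For KMeans specifically, the two methods return different things: predict() gives the index of the closest cluster for each sample, while transform() gives the distance from each sample to every cluster center. A minimal sketch with made-up data (n_init and random_state are set only to keep the run reproducible):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up for illustration): two well-separated groups.
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [10.0, 10.0],
              [10.0, 11.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)  # unsupervised: no y needed

labels = km.predict(X)       # one cluster index per sample
distances = km.transform(X)  # one distance per sample per cluster center

print(labels.shape)     # (4,)
print(distances.shape)  # (4, 2)
```

The two outputs are related: the predicted label of a sample is the column with the smallest distance in its row of the transform() output.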