
what is the difference between transformer and estimator in sklearn?

Tags:

scikit-learn

I saw both transformer and estimator were mentioned in the sklearn documentation.

Is there any difference between these two words?

Son asked Feb 27 '19


People also ask

What is a transformer and an estimator?

Transformer : A Transformer is an algorithm which can transform one DataFrame into another DataFrame . E.g., an ML model is a Transformer which transforms DataFrame with features into a DataFrame with predictions. Estimator : An Estimator is an algorithm which can be fit on a DataFrame to produce a Transformer .

What is a transformer in sklearn?

Transformers are classes that enable data transformations while preprocessing the data for machine learning. Examples of transformers in Scikit-Learn are SimpleImputer, MinMaxScaler, OrdinalEncoder, PowerTransformer, to name a few.

What is estimators in sklearn?

Estimators objects Fitting data: the main API implemented by scikit-learn is that of the estimator . An estimator is any object that learns from data; it may be a classification, regression or clustering algorithm or a transformer that extracts/filters useful features from raw data.

What is the difference between a classifier and an estimator?

An estimator is a predictor produced by a regression algorithm. A classifier is a predictor produced by a classification algorithm. A model can be either an estimator or a classifier.


1 Answer

The basic difference is that a:

  • Transformer transforms the input data (X) in some way.
  • Estimator predicts a new value (or values) (y) by using the input data (X).

Both the Transformer and Estimator should have a fit() method which can be used to train them (they learn some characteristics of the data). The signature is:

fit(X, y)

fit() stores the learnt attributes inside the object and returns the object itself (self), which is why calls like StandardScaler().fit(X) can be chained.

Here X represents the samples (feature vectors) and y is the target vector (which may have single or multiple values per corresponding sample in X). Note that y can be optional in some transformers where it's not needed, but it's mandatory for most estimators (supervised estimators). Look at StandardScaler for example. It needs the initial data X for finding the mean and std of the data (it learns the characteristics of X; y is not needed).
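A minimal sketch of the StandardScaler case described above, using made-up toy data; it shows that fit() works without y, stores the learnt statistics on the object, and returns the object itself:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data purely for illustration: 3 samples, 1 feature
X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler()
result = scaler.fit(X)        # y is not needed for this transformer

print(scaler.mean_)           # learnt mean of each feature: [2.]
print(result is scaler)       # fit() returns the object itself: True
```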

Each Transformer should have a transform(X) method which takes the input X and returns a new transformed version of X (which generally has the same number of samples but may or may not have the same number of features).
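Continuing with the same toy data, a sketch of fit() followed by transform(): the output keeps the same number of samples and, for StandardScaler, the same number of features, now with zero mean:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])
scaler = StandardScaler().fit(X)   # chaining works because fit() returns self

X_scaled = scaler.transform(X)     # returns a new, transformed version of X
print(X_scaled.shape)              # same shape as X: (3, 1)
print(round(X_scaled.mean(), 10))  # centred around zero: 0.0
```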

On the other hand, Estimator should have a predict(X) method which should output the predicted value of y from the given X.
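A sketch of the estimator side, again with made-up data where y = 2x, so the fitted line predicts new values exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy supervised data: y is mandatory here, unlike for StandardScaler
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

model = LinearRegression().fit(X, y)

pred = model.predict(np.array([[4.0]]))  # predicted y for an unseen X
print(pred)                              # approximately [8.]
```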

Some classes in scikit-learn implement both transform() and predict(), KMeans for example; in those cases, carefully reading the documentation should clear up which role each method plays.
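The KMeans example above can be sketched with two well-separated toy clusters: predict() assigns each sample a cluster index, while transform() maps each sample to its distances from the cluster centres (so the number of output features changes):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two obvious clusters around 0 and around 10
X = np.array([[0.0], [0.1], [10.0], [10.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = km.predict(X)   # estimator-style: a cluster index per sample
dists = km.transform(X)  # transformer-style: distance to each of the 2 centres
print(dists.shape)       # (4, 2): 4 samples, 2 new "features"
```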

Vivek Kumar answered Sep 18 '22