I saw both transformer and estimator were mentioned in the sklearn documentation. Is there any difference between these two words?

What is the difference between transformer and estimator in sklearn?

1 Answer

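The basic difference is this:

A Transformer transforms the input data (X) in some way.
An Estimator predicts a new value (or values) (y) from the input data (X).

Strictly speaking, scikit-learn's glossary calls any object with a fit() method an estimator, so a transformer is also an estimator, and the predicting kind is called a predictor. This answer uses "Estimator" in that narrower, predictor sense.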

Both a Transformer and an Estimator should have a fit() method, which is used to train them (they learn some characteristics of the data). The signature is:

fit(X, y)

fit() stores what it learns inside the object (by convention in attributes whose names end with an underscore, such as mean_) and returns the fitted object itself, so calls can be chained.

Here X represents the samples (feature vectors) and y is the target vector (which may hold one or several values for each sample in X). Note that y is optional in transformers that do not need it, but it is mandatory for supervised estimators, which make up most of them. Take StandardScaler as an example: it needs only the initial data X to find the mean and standard deviation of each feature (it learns the characteristics of X; y is not needed).
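As a minimal sketch of the StandardScaler case (the toy array below is made up for illustration), fitting needs only X, and the learned statistics end up as attributes on the object:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
scaler.fit(X)  # no y passed: the scaler only studies X

# Per-feature statistics learned during fit()
print(scaler.mean_)   # mean of each column
print(scaler.scale_)  # standard deviation of each column
```

The trailing underscore on mean_ and scale_ marks them as values that exist only after fitting.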

Each Transformer should have a transform(X) method which, like fit(), takes the input X, and which returns a new, transformed version of X (generally with the same number of samples, but not necessarily the same number of features). In current scikit-learn versions transform() takes only X, not y.
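To illustrate the point about shapes, here is a small sketch using two transformers chosen for this example, PCA and PolynomialFeatures, on made-up data. The number of samples stays the same while the number of features shrinks or grows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import PolynomialFeatures

# Toy data, invented for this example: 4 samples, 2 features.
X = np.array([[0.0, 1.0],
              [1.0, 3.0],
              [2.0, 2.0],
              [3.0, 5.0]])

# PCA with one component reduces 2 features to 1.
pca = PCA(n_components=1)
pca.fit(X)
X_reduced = pca.transform(X)

# Degree-2 polynomial expansion grows 2 features to 6
# (bias, a, b, a^2, a*b, b^2).
poly = PolynomialFeatures(degree=2)
poly.fit(X)
X_expanded = poly.transform(X)

# Same number of rows everywhere, different number of columns.
print(X.shape, X_reduced.shape, X_expanded.shape)
```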

An Estimator, on the other hand, should have a predict(X) method, which outputs the predicted value of y for the given X.
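A minimal sketch of the predicting side, assuming LinearRegression as the example model and a made-up data set that follows y = 2*x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data, invented for this example: y = 2*x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)  # supervised: y is required here

# predict() takes new samples and returns one y value per sample.
X_new = np.array([[4.0], [5.0]])
y_pred = model.predict(X_new)
print(y_pred)  # close to [9, 11], up to floating-point error
```

Here fit() needs both X and y, and predict() needs only X.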

Some classes in scikit-learn implement both transform() and predict(), like KMeans. In that case, carefully reading the documentation should resolve your doubts.
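For KMeans specifically, the two methods answer different questions. The sketch below uses made-up points in two well-separated groups; the parameter values (n_clusters=2, n_init=10, random_state=0) are choices made for this example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data, invented for this example: two obvious groups of 3 points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [10.0, 10.0], [10.1, 10.2], [10.2, 10.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(X)  # unsupervised: no y needed

# predict(): index of the nearest cluster centre for each sample.
labels = kmeans.predict(X)

# transform(): distance from each sample to every cluster centre,
# so the output has one column per cluster.
distances = kmeans.transform(X)

print(labels)
print(distances.shape)
```

The predicted label of a sample is the column with the smallest distance in the transformed output.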

answered Sep 18 '22 07:09

Vivek Kumar

Tags:

scikit-learn

Son
