The difference between fit() and the two methods mentioned above is clear enough. fit() is present in every sklearn estimator and sets the object's internal state according to its class, whether that is a model class or a preprocessor.
The difference between transform() and predict(), however, seems to be a little vague. One general rule I have seen thrown around is that predict() belongs to supervised learning classes and transform() belongs to unsupervised ones.
Nevertheless, I have found exceptions to this general rule. K-Means (an unsupervised algorithm) and PLSRegression (a supervised one) are two classes that have both of these methods. I went through the K-Means documentation and understood what the two methods return, but the contracts of these two interfaces don't seem very concretely defined in the documentation.
Like fit(), transform() can mean different things depending on context. For a preprocessing class, fit() computes statistics from the dataset (the mean, for example) and transform() applies them. For a model class, fit() behaves differently: it trains the model.
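To make that concrete, here is a minimal sketch with StandardScaler (the data values are my own toy example; mean_ is the fitted attribute sklearn exposes after fit()):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

scaler = StandardScaler()
scaler.fit(X)                    # fit() learns the statistics from the data
print(scaler.mean_)              # [2.] -- the learned mean

X_scaled = scaler.transform(X)   # transform() applies those statistics
print(X_scaled.ravel())          # roughly [-1.2247, 0, 1.2247]
```

The two-step split is what makes it possible to fit on training data and apply the same transformation to test data later.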
predict() is still fairly self-explanatory; I have only included it in the question because the different usages of transform() seem explainable only with predict() as a point of comparison.
After going through a multitude of questions, parts of the documentation (including the development guide and the glossary), and small parts of the Hands-On ML book, this is the conclusion I have arrived at: these interfaces are not meant to have a clear-cut definition and are intended to stay flexible across the classes that implement them.
Is this correct or am I missing something?
predict methods take the input features and return something that is qualitatively different, i.e. labels y (in supervised settings) or cluster memberships (in most clustering settings).
transform, on the other hand, as its name already implies, returns something that is qualitatively similar to its input but expressed in a different form, i.e. scaled features, PCA features, etc. As a rule, transformers are usually accompanied by a corresponding inverse_transform method to return the input to its original form (while arguably we cannot imagine an inverse_predict method to go from labels or cluster memberships back to features).
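A quick sketch of that round trip (again with StandardScaler as the example transformer; the data is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)                # still features, just rescaled
X_back = scaler.inverse_transform(X_scaled)   # undoes the transformation

print(np.allclose(X, X_back))                 # True: transform is invertible
```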
This difference is readily apparent in cases such as the ones you mention, where both predict and transform methods are available; in k-means, for example, predict(X) will return cluster memberships, while transform(X) will return the input expressed in a different form, namely X_new in cluster-distance space (each sample's distance to each cluster center).
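A short sketch of the contrast (the four points and n_clusters=2 are an arbitrary toy setup):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = km.predict(X)    # qualitatively different: memberships, shape (4,)
X_new = km.transform(X)   # qualitatively similar: features in
                          # cluster-distance space, shape (4, 2)

print(labels.shape, X_new.shape)
```

Note that predict collapses each sample to a single label, while transform keeps one row of features per sample, just in a new coordinate system.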