
Which SKLearn interface defines .fit, .predict etc

Examining sklearn.base, more specifically BaseEstimator and the different mixins, it is clear that some of the mixins depend on the ability to call .fit or .predict.

For example, RegressorMixin relies on the .predict method: its score method calls .predict on the estimator.

My question is: why is there no interface or abstract class that enforces the implementation of these methods?

I'd expect something like a BaseRegressor with .predict() as an abstract method, and a BaseClassifier with .predict_proba() and .predict(), or something similar.
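
As a rough sketch of the kind of enforcement being asked about, the following uses only Python's standard abc module. BaseRegressor, MeanRegressor and IncompleteRegressor are hypothetical names made up for illustration; none of them exist in scikit-learn.

```python
from abc import ABC, abstractmethod


class BaseRegressor(ABC):
    """Hypothetical base class; scikit-learn does not define this."""

    @abstractmethod
    def fit(self, X, y):
        ...

    @abstractmethod
    def predict(self, X):
        ...


class MeanRegressor(BaseRegressor):
    """Implements both abstract methods, so it can be instantiated."""

    def fit(self, X, y):
        # Store the mean of the training targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every input row.
        return [self.mean_ for _ in X]


class IncompleteRegressor(BaseRegressor):
    """Forgets predict(), so instantiating it raises TypeError."""

    def fit(self, X, y):
        return self


model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
predictions = model.predict([[10], [20]])

try:
    IncompleteRegressor()
    instantiation_error = None
except TypeError as exc:
    # abc refuses to instantiate a class with unimplemented abstract methods.
    instantiation_error = exc
```

With this design the missing method is reported when the object is created, not when .predict() is first called.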

bluesummers asked Jun 18 '19 13:06



2 Answers

The common idiom in Python is 'duck typing': if it behaves like a duck, it's a duck. If an object implements fit or any other relevant method, it's a model as far as sklearn is concerned.

There's also the concept of abstract base classes, but its usage is less common.

See more here: https://en.wikipedia.org/wiki/Duck_typing
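
A minimal sketch of the duck-typing idea in plain Python follows. MajorityClassifier and train_and_predict are hypothetical names used only for illustration and are not scikit-learn code. The model inherits from nothing; it is accepted purely because it has the expected methods.

```python
class MajorityClassifier:
    """Hypothetical model: no base class, it just has fit and predict."""

    def fit(self, X, y):
        # Remember the most frequent label seen during training.
        self.majority_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def train_and_predict(model, X_train, y_train, X_test):
    """Hypothetical helper: accepts any object with fit and predict."""
    for name in ("fit", "predict"):
        # Check for behaviour (a callable attribute), not for a type.
        if not callable(getattr(model, name, None)):
            raise TypeError(f"model must implement {name}()")
    model.fit(X_train, y_train)
    return model.predict(X_test)


labels = train_and_predict(
    MajorityClassifier(),
    X_train=[[0], [1], [2]],
    y_train=["a", "b", "b"],
    X_test=[[3], [4]],
)
```

The helper never calls isinstance, so any third-party object with the same two methods would work just as well.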

Ophir Yoktan answered Sep 24 '22 16:09



A few things together probably make it clearer why a package like scikit-learn does things the way it does:

  • duck typing vs inheritance: you can find very long arguments about which one is the better approach. Both have their advantages and disadvantages, and at the end of the day it comes down to what people in a community are used to. As somebody who does a lot of Python these days, I love duck typing, and I'm very comfortable with it. At the same time, 15 years ago I loved abstract classes and OOP and whatnot, and I wouldn't have understood why you would do things any other way. What I'm trying to say is that people in Python like duck typing, and that's partly why you see the pattern so often in some of its core packages.

  • duck typing, contrib packages and extensions: when checking an input, we can either check its type or duck type it for a certain functionality. If we check the type, any input to that method has to inherit from those classes, whereas if we duck type it, the input only has to implement those methods. This is important because a developer writing an estimator outside scikit-learn, who wants it to be compatible with certain parts of scikit-learn, doesn't have to take scikit-learn as a dependency (which they would need in order to inherit a class from the package) and can simply implement those methods. If developers are constrained to keep their package and its dependencies lightweight, this becomes relevant (and we have seen these exact issues in scikit-learn).

  • Mixin classes: the idea behind the Mixin classes is not really that child classes should inherit from them and implement their methods; it's more about adding functionality to existing classes without having to copy/paste or reimplement any method. For instance, the TransformerMixin adds the fit_transform method to an object, assuming it already has fit and transform, without caring about whether the object is an estimator or a transformer. Again, you could argue that a certain OOP design pattern would be better here, but that's a never-ending argument, and this approach works, and the developers are comfortable with it.
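
To make the mixin point concrete, here is a small sketch in plain Python. FitTransformMixin, Centerer and NoTransform are hypothetical stand-ins for illustration, not scikit-learn's actual implementation. The mixin declares nothing abstract; it only adds one method that assumes fit and transform are already there.

```python
class FitTransformMixin:
    """Hypothetical stand-in for the idea behind TransformerMixin.

    It enforces nothing; it only adds fit_transform to any class that
    already provides fit (returning self) and transform.
    """

    def fit_transform(self, X):
        return self.fit(X).transform(X)


class Centerer(FitTransformMixin):
    """Hypothetical transformer that subtracts the mean from each value."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


centered = Centerer().fit_transform([1.0, 2.0, 3.0])


class NoTransform(FitTransformMixin):
    """Has fit but no transform; nothing complains until fit_transform runs."""

    def fit(self, X):
        return self


try:
    NoTransform().fit_transform([1.0])
    failure = None
except AttributeError as exc:
    # The missing method only surfaces at call time, as with duck typing.
    failure = exc
```

The trade-off is visible in NoTransform: the class can be defined and instantiated without error, and the missing transform method is only discovered when fit_transform is called.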

adrin answered Sep 24 '22 16:09
