Examining <code>sklearn.base</code>, more specifically <code>BaseEstimator</code> and the different mixins, it is clear that some of the mixins depend on the ability to call <code>.fit</code> or <code>.predict</code>. For example, <code>RegressorMixin</code> relies on the <code>.predict</code> method. My question is: why is there no interface or abstract class that enforces the implementation of these methods? I'd expect something like a <code>BaseRegressor</code> with <code>.predict()</code> as an abstract method, and a <code>BaseClassifier</code> with <code>.predict_proba()</code> and <code>.predict()</code>, or something similar.

The common idiom in Python is 'duck typing': if it behaves like a duck, it's a duck. If an object implements <code>fit</code> or any other relevant method, it's a model as far as sklearn is concerned. There's also the concept of abstract base classes, but its usage is less common. See more here: https://en.wikipedia.org/wiki/Duck_typing
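To make the contrast concrete, here is a minimal sketch in plain Python. None of these names come from scikit-learn: <code>MeanModel</code>, <code>looks_like_estimator</code>, <code>AbstractRegressor</code> and <code>Incomplete</code> are hypothetical and only illustrate the two styles.

```python
from abc import ABC, abstractmethod


class MeanModel:
    """Hypothetical estimator: inherits from nothing, just implements fit/predict."""

    def fit(self, X, y):
        # Remember the mean of the targets seen during fitting.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every input row.
        return [self.mean_ for _ in X]


def looks_like_estimator(obj):
    """Duck-typing check: only asks whether fit and predict exist and are callable."""
    return all(callable(getattr(obj, name, None)) for name in ("fit", "predict"))


class AbstractRegressor(ABC):
    """Hypothetical abstract base class of the kind the question asks about."""

    @abstractmethod
    def predict(self, X):
        ...


class Incomplete(AbstractRegressor):
    """Subclass that forgets to implement predict; it cannot be instantiated."""
```

With duck typing, <code>MeanModel</code> is accepted because it has the right methods, even though it is not an instance of <code>AbstractRegressor</code>. With the abstract class, calling <code>Incomplete()</code> raises a <code>TypeError</code>, but only classes that inherit from it can pass an <code>isinstance</code> check.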

Several things together make it clearer why a package like <code>scikit-learn</code> does things the way it does: <ul> <li>Duck typing vs. inheritance: you can find very long arguments about which approach is better, and while both have advantages and disadvantages, at the end of the day it comes down to what people in a community are used to. As somebody who does a lot of Python these days, I love duck typing and I'm very comfortable with it. At the same time, 15 years ago I loved abstract classes and OOP and whatnot, and I wouldn't have understood why you would do things any other way. What I'm trying to say is that people in Python like duck typing, and that's partly why you see the pattern so often in some of its core packages.</li> <li>Duck typing, contrib packages and extensions: when checking an input, we can either check its type or duck type it for a certain functionality. If we check the type, any input to that method must actually inherit from those classes, whereas if we duck type it, the input only needs to implement those methods. This matters because a developer writing an estimator outside <code>scikit-learn</code> who wants it to be compatible with certain parts of <code>scikit-learn</code> doesn't have to take <code>scikit-learn</code> as a dependency (which they would need in order to inherit a class from the package); they can simply implement those methods.
If developers are constrained to keep their package and its dependencies lightweight, this becomes relevant (and we have seen these exact issues in <code>scikit-learn</code>).</li> <li><code>Mixin</code> classes: the idea behind the <code>Mixin</code> classes is not really that child classes should inherit from them and implement their methods; it's more about adding functionality to existing classes without having to copy/paste or reimplement any method. For instance, <code>TransformerMixin</code> adds the <code>fit_transform</code> method to an object, assuming it already has <code>fit</code> and <code>transform</code>, without caring whether the object is an estimator or a transformer. Again, you could argue that some OOP design pattern would be better here, but that's a never-ending argument; this approach works, and the developers are comfortable with it.</li> </ul>
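The mixin idea can be sketched without importing scikit-learn at all. The <code>FitTransformMixin</code> below is a hypothetical stand-in written for illustration, not scikit-learn's actual <code>TransformerMixin</code> code; <code>Centerer</code> and <code>Broken</code> are likewise made-up classes.

```python
class FitTransformMixin:
    """Hypothetical stand-in for the idea behind TransformerMixin."""

    def fit_transform(self, X, y=None):
        # Assumes the host class provides fit (returning self) and transform.
        return self.fit(X, y).transform(X)


class Centerer(FitTransformMixin):
    """Hypothetical transformer that subtracts the mean from a list of numbers."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Broken(FitTransformMixin):
    """Inherits the mixin but defines neither fit nor transform."""


print(Centerer().fit_transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

The mixin contributes <code>fit_transform</code> for free, but nothing enforces the contract up front: <code>Broken()</code> can be instantiated, and the missing methods only surface as an <code>AttributeError</code> when <code>fit_transform</code> is called. That is the trade-off the question is pointing at.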

Which SKLearn interface defines .fit, .predict etc

2 Answers

answered Sep 24 '22 16:09

Ophir Yoktan

answered Sep 24 '22 16:09

adrin

Tags:

python

scikit-learn

bluesummers
