
What type is a sklearn model?

I'm writing some code which evaluates different sklearn models against some data. I am using type hints, both for my own education and to help other people who will eventually have to read my code.

My question is how do I specify the type of a sklearn predictor (such as LinearRegression())?

For example:

def model_tester(model: Predictor,
                 parameter: int
                 ) -> np.ndarray:
    """An example function with type hints."""

    # do stuff to model

    return values

I see the typing library can make new types or I can use TypeVar to do:

Predictor = TypeVar('Predictor') 

but I wouldn't want to use this if there was already a conventional type for an sklearn model.

Checking the type of LinearRegression() yields:

 sklearn.linear_model.base.LinearRegression

and this is clearly of use, but only if I am interested in the LinearRegression model.

FChm asked Feb 25 '19 14:02



2 Answers

From Python 3.8 on (or earlier, using typing-extensions), you can use typing.Protocol. Protocols enable structural subtyping: you describe the methods a type is expected to provide, and any class with matching methods satisfies the hint without having to inherit from anything:

from typing import Protocol
# from typing_extensions import Protocol  # for Python <3.8

class ScikitModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...
    def score(self, X, y, sample_weight=None): ...
    def set_params(self, **params): ...

which you can then use as a type hint:

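from typing import Any

def do_stuff(model: ScikitModel) -> Any:
    # train_data, train_labels, test_data and test_labels are assumed to be defined elsewhere
    model.fit(train_data, train_labels)  # this type checks
    score = model.score(test_data, test_labels)  # this type checks
    ...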
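
To see structural subtyping at work without installing scikit-learn, here is a minimal sketch. MeanRegressor and evaluate are hypothetical names made up for illustration, and the @runtime_checkable decorator is an addition that is only needed if you also want isinstance() checks. Note that the runtime check only confirms the methods exist; it does not validate their signatures, which is the job of a static checker such as mypy.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable  # optional: enables isinstance() checks against the protocol
class ScikitModel(Protocol):
    def fit(self, X: Any, y: Any, sample_weight: Any = None) -> Any: ...
    def predict(self, X: Any) -> Any: ...
    def score(self, X: Any, y: Any, sample_weight: Any = None) -> float: ...
    def set_params(self, **params: Any) -> Any: ...


class MeanRegressor:
    """Hypothetical stand-in model: always predicts the mean of the training targets.

    It does not inherit from ScikitModel (or from anything in sklearn).
    """

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, X, y, sample_weight=None):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y, sample_weight=None):
        # Negative mean squared error, so that higher is better.
        preds = self.predict(X)
        return -sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


def evaluate(model: ScikitModel, X, y) -> float:
    """Fit the model and score it on the same data (illustration only)."""
    return model.fit(X, y).score(X, y)


result = evaluate(MeanRegressor(), [[0], [1], [2]], [1.0, 2.0, 3.0])
is_model = isinstance(MeanRegressor(), ScikitModel)
is_not_model = isinstance(object(), ScikitModel)
```

Because MeanRegressor has all four methods, a static type checker accepts it wherever ScikitModel is expected, and the same should hold for real sklearn estimators that define fit, predict, score and set_params.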
Peter answered Oct 01 '22 21:10


I think the most generic class that all models inherit from would be sklearn.base.BaseEstimator.

If you want to be more specific, maybe use sklearn.base.ClassifierMixin or sklearn.base.RegressorMixin.

So I would do:

import numpy as np
from sklearn.base import RegressorMixin


def model_tester(model: RegressorMixin, parameter: int) -> np.ndarray:
    """An example function with type hints."""

    # do stuff to model

    return values

I am no expert in type checking, so correct me if this is not right.

FlorianGD answered Oct 01 '22 21:10