I'm writing some code which evaluates different sklearn models against some data. I am using type hints, both for my own education and to help other people who will eventually have to read my code.
My question is: how do I specify the type of a sklearn predictor (such as LinearRegression())?
For example:
def model_tester(model: Predictor,
                 parameter: int
                 ) -> np.ndarray:
    """An example function with type hints."""
    # do stuff to model
    return values
I see the typing library can make new types, or I can use TypeVar to do:
Predictor = TypeVar('Predictor')
but I wouldn't want to use this if there was already a conventional type for an sklearn model.
Checking the type of LinearRegression() yields:
sklearn.linear_model.base.LinearRegression
and this is clearly of use, but only if I am interested in the LinearRegression model.
From Python 3.8 on (or earlier using typing-extensions), you can use typing.Protocol. With protocols, you can use a concept called structural subtyping to define exactly the structure the type is expected to have:
from typing import Protocol
# from typing_extensions import Protocol  # for Python <3.8

class ScikitModel(Protocol):
    def fit(self, X, y, sample_weight=None): ...
    def predict(self, X): ...
    def score(self, X, y, sample_weight=None): ...
    def set_params(self, **params): ...
which you can then use as a type hint:
from typing import Any

def do_stuff(model: ScikitModel) -> Any:
    model.fit(train_data, train_labels)  # this type checks
    score = model.score(test_data, test_labels)  # this type checks
    ...
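To make the structural-subtyping idea concrete, here is a minimal sketch that runs without scikit-learn installed. MeanRegressor and evaluate are hypothetical names invented for this illustration, not part of any library. The stand-in class never inherits from ScikitModel, yet it is accepted because it has the required methods. The runtime_checkable decorator is an optional extra that allows isinstance checks; those only verify that the methods exist, not their signatures.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ScikitModel(Protocol):
    def fit(self, X: Any, y: Any, sample_weight: Any = None) -> Any: ...
    def predict(self, X: Any) -> Any: ...
    def score(self, X: Any, y: Any, sample_weight: Any = None) -> float: ...
    def set_params(self, **params: Any) -> Any: ...


class MeanRegressor:
    """Hypothetical stand-in estimator; note it does not inherit from ScikitModel."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, X, y, sample_weight=None):
        # Always predict the mean of the training targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y, sample_weight=None):
        # Negative mean absolute error, so that higher is better.
        predictions = self.predict(X)
        return -sum(abs(p - t) for p, t in zip(predictions, y)) / len(y)

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


def evaluate(model: ScikitModel, X, y) -> float:
    """Fit and score any object that structurally matches ScikitModel."""
    model.fit(X, y)
    return model.score(X, y)


result = evaluate(MeanRegressor(), [[0], [1], [2]], [1.0, 2.0, 3.0])
print(result)
print(isinstance(MeanRegressor(), ScikitModel))
```

A real estimator such as LinearRegression() should pass the same way, since it exposes fit, predict, score and set_params.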
I think the most generic class that all models inherit from would be sklearn.base.BaseEstimator.
If you want to be more specific, maybe use sklearn.base.ClassifierMixin or sklearn.base.RegressorMixin.
So I would do:
import numpy as np
from sklearn.base import RegressorMixin

def model_tester(model: RegressorMixin, parameter: int) -> np.ndarray:
    """An example function with type hints."""
    # do stuff to model
    return values
I am no expert in type checking, so correct me if this is not right.
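For a version of the function above that actually runs, here is a rough sketch assuming scikit-learn and NumPy are installed. The synthetic data and the n_samples parameter are made up for the illustration. One caveat, as far as I know: RegressorMixin itself only defines score, so a strict type checker may not know that model has fit or predict. That is the gap the Protocol approach is meant to close.

```python
import numpy as np
from sklearn.base import RegressorMixin
from sklearn.linear_model import LinearRegression


def model_tester(model: RegressorMixin, n_samples: int) -> np.ndarray:
    """Fit the model on synthetic y = 2x + 1 data and return its predictions."""
    # Illustrative data only: a single feature with an exact linear target.
    X = np.arange(n_samples, dtype=float).reshape(-1, 1)
    y = 2.0 * X.ravel() + 1.0
    model.fit(X, y)
    return model.predict(X)


values = model_tester(LinearRegression(), 5)
print(values.shape)
print(isinstance(LinearRegression(), RegressorMixin))
```

The hint here is nominal: it accepts only classes that inherit from RegressorMixin, so a third-party model with the right methods but a different base class would be rejected by the checker.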