Are predictions on scikit-learn models thread-safe?

Given some classifier (SVC/Forest/NN/whatever), is it safe to call .predict on the same instance concurrently from different threads?

At first glance, my guess is that predictions do not mutate any internal state, but I did not find anything in the docs about it.

Here is a minimal example showing what I mean:

#!/usr/bin/env python3
import threading

from sklearn import datasets
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = datasets.load_iris(return_X_y=True)

# Some model. Might be any type, e.g. one of the following
# (each assignment overwrites the previous one):
clf = svm.SVC()
clf = RandomForestClassifier()
clf = MLPClassifier(solver='lbfgs')

clf.fit(X, y)


def use_model_for_predictions():
    for _ in range(10000):
        clf.predict(X[0:1])


# Is this safe?
thread_1 = threading.Thread(target=use_model_for_predictions)
thread_2 = threading.Thread(target=use_model_for_predictions)
thread_1.start()
thread_2.start()
Tobias Hermann asked Sep 29 '20 07:09


1 Answer

Check out this Q&A: the predict and predict_proba methods should be thread-safe, since they only call NumPy and never modify the model itself. So the answer to your question is yes.

You can find more information in the replies here as well.

For example, in naive Bayes the code is as follows:

def predict(self, X):
    """
    Perform classification on an array of test vectors X.
    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
    Returns
    -------
    C : ndarray of shape (n_samples,)
        Predicted target values for X
    """
    check_is_fitted(self)
    X = self._check_X(X)
    jll = self._joint_log_likelihood(X)
    return self.classes_[np.argmax(jll, axis=1)]

You can see that the first two lines only check that the model is fitted and validate the input. The abstract method _joint_log_likelihood is the one that interests us; it is described as:

@abstractmethod
def _joint_log_likelihood(self, X):
    """Compute the unnormalized posterior log probability of X
    I.e. ``log P(c) + log P(x|c)`` for all rows x of X, as an array-like of
    shape (n_classes, n_samples).
    Input is passed to _joint_log_likelihood as-is by predict,
    predict_proba and predict_log_proba.
    """

And finally, for multinomial NB, for example, the function looks like this (source):

def _joint_log_likelihood(self, X):
    """
    Compute the unnormalized posterior log probability of X, which is
    the features' joint log probability (feature log probability times
    the number of times that word appeared in that document) times the
    class prior (since we're working in log space, it becomes an addition)
    """
    joint_prob = X * self.feature_log_prob_.T + self.class_log_prior_
    return joint_prob
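
As a rough sanity check of the read-only behaviour, the sketch below recomputes that joint log-likelihood by hand for a fitted MultinomialNB and confirms that predict leaves the fitted arrays untouched. The toy data is made up for illustration, and `@` is used for the matrix product because the input here is a dense NumPy array.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: 4 "documents", 3 word counts each, 2 classes.
X = np.array([[2.0, 1.0, 0.0],
              [3.0, 0.0, 1.0],
              [0.0, 2.0, 3.0],
              [1.0, 1.0, 4.0]])
y = np.array([0, 0, 1, 1])

clf = MultinomialNB().fit(X, y)

# Snapshot the fitted parameters that prediction reads from.
feature_log_prob_before = clf.feature_log_prob_.copy()
class_log_prior_before = clf.class_log_prior_.copy()

# Recompute the joint log-likelihood by hand (matrix product plus prior).
jll = X @ clf.feature_log_prob_.T + clf.class_log_prior_
manual_pred = clf.classes_[np.argmax(jll, axis=1)]

sklearn_pred = clf.predict(X)

# The manual result should agree with predict, and the parameters
# should be identical before and after the call.
same_predictions = np.array_equal(manual_pred, sklearn_pred)
params_unchanged = (
    np.array_equal(feature_log_prob_before, clf.feature_log_prob_)
    and np.array_equal(class_log_prior_before, clf.class_log_prior_)
)
print(same_predictions, params_unchanged)
```

This only covers one estimator and one call; it illustrates the argument rather than proving it for the whole library.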

You can see that predict only reads the fitted attributes and writes nothing back to the model, so there is nothing thread-unsafe in it. Of course, you can go through the source code and check the same for any of the other classifiers :)
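
If reading the source is not enough, a quick empirical check is possible too. The sketch below is a minimal example, assuming the iris data from the question and an arbitrary choice of three classifiers. It compares predictions made concurrently from several threads with a single-threaded baseline. The helper name `predictions_match` and the thread and call counts are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn import datasets, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = datasets.load_iris(return_X_y=True)

# Arbitrary selection of models; swap in whichever estimator you use.
models = [
    svm.SVC(),
    RandomForestClassifier(n_estimators=10, random_state=0),
    GaussianNB(),
]


def predictions_match(clf, n_threads=4, n_calls=50):
    """Return True if all concurrent predict calls equal the baseline."""
    baseline = clf.predict(X)  # single-threaded reference result

    def worker(_):
        # Each thread calls predict repeatedly on the shared instance.
        return all(
            np.array_equal(clf.predict(X), baseline) for _ in range(n_calls)
        )

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return all(pool.map(worker, range(n_threads)))


results = {}
for clf in models:
    clf.fit(X, y)
    results[type(clf).__name__] = predictions_match(clf)

print(results)
```

A passing run is only evidence, not a guarantee: race conditions can hide for a long time, so this complements reading the estimator's predict code rather than replacing it.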

Ruli answered Sep 21 '22 13:09