Given some classifier (SVC/Forest/NN/whatever), is it safe to call .predict on the same instance concurrently from different threads?
From a distance, my guess is that prediction does not mutate any internal state, but I did not find anything in the docs about it.
Here is a minimal example showing what I mean:
#!/usr/bin/env python3
import threading

from sklearn import datasets
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = datasets.load_iris(return_X_y=True)

# Some model. Might be any type, e.g.:
clf = svm.SVC()
clf = RandomForestClassifier()
clf = MLPClassifier(solver='lbfgs')
clf.fit(X, y)


def use_model_for_predictions():
    # Repeatedly predict on the shared, already fitted model.
    for _ in range(10000):
        clf.predict(X[0:1])


# Is this safe?
thread_1 = threading.Thread(target=use_model_for_predictions)
thread_2 = threading.Thread(target=use_model_for_predictions)
thread_1.start()
thread_2.start()
Scikit-learn relies heavily on NumPy and SciPy, which internally call multi-threaded linear algebra routines implemented in libraries such as MKL, OpenBLAS or BLIS.
Check out this Q&A: the predict and predict_proba methods should be thread safe, since they only call NumPy and do not modify the model itself, so the answer to your question is yes.
You can find some more information in the replies here.
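One way to gain some confidence empirically is to compare predictions made from several threads against a single-threaded reference. The sketch below is only an illustration: the choice of MultinomialNB, the thread count, and the helper name predict_many are assumptions made for this example, and a passing run does not prove thread safety, it only fails to find a problem.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import MultinomialNB

X, y = datasets.load_iris(return_X_y=True)
clf = MultinomialNB().fit(X, y)

# Single-threaded reference predictions to compare against.
expected = clf.predict(X)


def predict_many(n_calls=200):
    """Hypothetical helper: call predict repeatedly on the shared model
    and report whether every result matched the reference."""
    return all(np.array_equal(clf.predict(X), expected) for _ in range(n_calls))


# Run the same check from several threads sharing one fitted instance.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(predict_many) for _ in range(8)]
    all_consistent = all(f.result() for f in futures)

print("all threaded predictions matched:", all_consistent)
```

The same pattern can be tried with any other fitted classifier by swapping the model on the line that creates clf.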
For example, in naive Bayes the code is as follows:
def predict(self, X):
    """
    Perform classification on an array of test vectors X.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)

    Returns
    -------
    C : ndarray of shape (n_samples,)
        Predicted target values for X
    """
    check_is_fitted(self)
    X = self._check_X(X)
    jll = self._joint_log_likelihood(X)
    return self.classes_[np.argmax(jll, axis=1)]
You can see that the first two lines are only checks: one that the model has been fitted and one that validates the input. The abstract method _joint_log_likelihood is the one that interests us, described as:
@abstractmethod
def _joint_log_likelihood(self, X):
    """Compute the unnormalized posterior log probability of X

    I.e. ``log P(c) + log P(x|c)`` for all rows x of X, as an array-like of
    shape (n_classes, n_samples).

    Input is passed to _joint_log_likelihood as-is by predict,
    predict_proba and predict_log_proba.
    """
And finally, for multinomial NB, for example, the function looks like this (source):
def _joint_log_likelihood(self, X):
    """
    Compute the unnormalized posterior log probability of X, which is
    the features' joint log probability (feature log probability times
    the number of times that word appeared in that document) times the
    class prior (since we're working in log space, it becomes an addition)
    """
    joint_prob = X * self.feature_log_prob_.T + self.class_log_prior_
    return joint_prob
You can see that there is nothing thread-unsafe in predict: it only reads the fitted attributes and never writes to them. Of course, you can go through the code and check this for any of those classifiers :)
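The read-only pattern described above can be sketched with a small stand-in class. ReadOnlyNB is hypothetical and not part of scikit-learn; it only mimics the shape of the multinomial NB computation (here written as a matrix product) so that the example needs nothing beyond NumPy. The random parameters are made up for illustration.

```python
import threading

import numpy as np


class ReadOnlyNB:
    """Hypothetical stand-in for a fitted multinomial naive Bayes model."""

    def __init__(self, feature_log_prob, class_log_prior, classes):
        self.feature_log_prob_ = feature_log_prob  # shape (n_classes, n_features)
        self.class_log_prior_ = class_log_prior    # shape (n_classes,)
        self.classes_ = classes                    # shape (n_classes,)

    def predict(self, X):
        # Only reads fitted attributes; every intermediate is a local variable,
        # so concurrent calls do not share any mutable state.
        jll = X @ self.feature_log_prob_.T + self.class_log_prior_
        return self.classes_[np.argmax(jll, axis=1)]


rng = np.random.default_rng(0)
model = ReadOnlyNB(
    feature_log_prob=np.log(rng.dirichlet(np.ones(5), size=3)),
    class_log_prior=np.log(np.full(3, 1 / 3)),
    classes=np.array(["a", "b", "c"]),
)
X = rng.integers(0, 10, size=(100, 5))
expected = model.predict(X)  # single-threaded reference

mismatches = []  # list.append is safe to call from several threads in CPython


def worker():
    for _ in range(500):
        if not np.array_equal(model.predict(X), expected):
            mismatches.append(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("mismatches:", len(mismatches))
```

If predict instead cached intermediate results on self, two threads could overwrite each other's values, which is the kind of thing to look for when reading the source of a particular classifier.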