I recently ran into the following requirement: I have a scikit-learn SVC classifier instance that has already been trained with .fit(), and I need to call .predict() on a large number of instances. Is there a way to parallelise only this .predict() call using scikit-learn's built-in tools?
from sklearn import svm
data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]
clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)
# this can be very large (~ a million records)
to_be_predicted = [[1,3,4]]
clf.predict(to_be_predicted)
If anybody knows a solution, I would be grateful if you could share it.
Here is a working example built on the code above. It uses joblib to spread the predict calls across several workers:
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm
data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]
clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)
to_be_predicted = np.array([[1,3,4], [1,3,4], [1,3,5]])
clf.predict(to_be_predicted)
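n_cores = 3
# split the rows into one chunk per worker, so all rows are predicted
# even when there are more rows than workers
chunks = np.array_split(to_be_predicted, n_cores)
parallel = Parallel(n_jobs=n_cores)
results = parallel(delayed(clf.predict)(chunk) for chunk in chunks)
# join the per-chunk predictions back into a single 1-D array
np.concatenate(results)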
array([1, 1, 0])
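For a large input, the same idea can be wrapped in a small function. The sketch below is only one possible arrangement: `parallel_predict` is a hypothetical helper name (not part of scikit-learn or joblib), and the 1000 random rows are made-up data used purely to compare the parallel result with a plain serial call.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm


def parallel_predict(model, X, n_jobs=2):
    """Hypothetical helper: predict X in row chunks across n_jobs workers."""
    X = np.asarray(X)
    # for non-empty X, never create more chunks than rows,
    # so no worker receives an empty batch
    n_chunks = max(1, min(n_jobs, len(X)))
    chunks = np.array_split(X, n_chunks)
    results = Parallel(n_jobs=n_jobs)(
        delayed(model.predict)(chunk) for chunk in chunks
    )
    # array_split preserves row order, so concatenating restores it
    return np.concatenate(results)


clf = svm.SVC(kernel='rbf', C=10, gamma=0.3)
clf.fit([[0, 2, 3], [1, 2, 3], [4, 2, 3]], [0, 1, 0])

# synthetic rows, for illustration only
rng = np.random.default_rng(0)
X_big = rng.integers(0, 5, size=(1000, 3)).astype(float)

serial = clf.predict(X_big)
parallel_result = parallel_predict(clf, X_big, n_jobs=2)
```

Because each row is predicted independently, the chunked result should match the serial one element for element. Whether it is actually faster depends on the data size, since starting workers and copying the model to them has a cost of its own.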