Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scikit learn SVC predict probability doesn't work as expected

Tags:

I built sentiment analyzer using SVM classifier. I trained model with probability=True and it can give me probability. But when I pickled my model and load it again later, the probability doesn't work anymore.

The model:

from sklearn.svm import SVC, LinearSVC
pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),])

# pipeline parameters to automatically explore and tune
param_svm = [
  {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
  {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]

grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1, 
    scoring='accuracy',
    cv=StratifiedKFold(label_train, n_folds=5),)

svm_detector_reloaded = cPickle.load(open('svm_sentiment_analyzer.pkl', 'rb'))
print(svm_detector_reloaded.predict([""""Today is awesome day"""])[0])

Gives me:

AttributeError: predict_proba is not available when probability=False

like image 927
KevinOelen Avatar asked Mar 27 '17 01:03

KevinOelen


People also ask

Can SVM predict probability?

SVMs don't output probabilities natively, but probability calibration methods can be used to convert the output to class probabilities. Various methods exist, including Platt scaling (particularly suitable for SVMs) and isotonic regression.

What is the difference between predict () and predict_proba () in Scikit learn?

The predict method is used to predict the actual class while predict_proba method can be used to infer the class probabilities (i.e. the probability that a particular data point falls into the underlying classes).

Does XGBoost predict probability?

All the most popular machine learning libraries in Python have a method called «predict_proba»: Scikit-learn (e.g. LogisticRegression, SVC, RandomForest, …), XGBoost, LightGBM, CatBoost, Keras… But, despite its name, «predict_proba» does not quite predict probabilities.

How does Sklearn predict_proba work?

The predict_proba() returns the number of votes for each class, divided by the number of trees in the forest. Your precision is exactly 1/n_estimators. If you want to see variation at the 5th digit, you will need 10**5 = 100,000 estimators, which is excessive. You normally don't want more than 100 estimators.


2 Answers

Use: SVM(probability=True)

or

grid_svm = GridSearchCV(
    probability=True
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1, 
    scoring='accuracy',
    cv=StratifiedKFold(label_train, n_folds=5),)
like image 158
Neofytos Neocleous Avatar answered Oct 08 '22 07:10

Neofytos Neocleous


Adding (probability=True) while initializing the classifier as someone above suggested, resolved my error:

clf = SVC(kernel='rbf', C=1e9, gamma=1e-07, probability=True).fit(xtrain,ytrain)
like image 32
Dinesh Marimuthu Avatar answered Oct 08 '22 07:10

Dinesh Marimuthu