I built a sentiment analyzer using an SVM classifier. I trained the model with probability=True, and it can give me class probabilities. But when I pickle the model and load it again later, the probability output doesn't work anymore.
The model:
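from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# bag of words -> tf-idf weighting -> SVM with probability estimates enabled
pipeline_svm = Pipeline([
    ('bow', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('classifier', SVC(probability=True)),
])
# pipeline parameters to automatically explore and tune
param_svm = [
    {'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
    {'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,  # refit the best parameter combination on the full training set
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),  # the labels are passed later, to fit()
)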
# later: load the pickled model and ask for class probabilities
import pickle
with open('svm_sentiment_analyzer.pkl', 'rb') as f:
    svm_detector_reloaded = pickle.load(f)
print(svm_detector_reloaded.predict_proba(["Today is awesome day"])[0])
Gives me:
AttributeError: predict_proba is not available when probability=False
SVMs don't output probabilities natively, but probability calibration methods can be used to convert the output to class probabilities. Various methods exist, including Platt scaling (particularly suitable for SVMs) and isotonic regression.
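To make the idea of Platt scaling concrete, here is a minimal pure-Python sketch. It fits a sigmoid 1 / (1 + exp(a*score + b)) to raw decision values by plain gradient descent on the log loss. The function names, the toy scores, and the learning rate are all made up for illustration; Platt's original procedure also smooths the targets and uses a more careful optimizer, so treat this as a simplification rather than what any library actually runs.

```python
import math

def platt_probability(score, a, b):
    """Map a raw decision value to P(y=1) with the sigmoid 1 / (1 + exp(a*score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit a and b by gradient descent on the log loss (labels are 0 or 1)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, a, b)
            # Gradient of the log loss with respect to a and b for this sigmoid form.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical decision values (signed distances to the hyperplane) and true labels.
scores = [-2.0, -1.5, -0.5, 0.3, -0.2, 0.8, 1.5, 2.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_platt(scores, labels)
for s in (-2.0, 0.0, 2.0):
    print(s, round(platt_probability(s, a, b), 3))
```

Larger decision values map to probabilities closer to 1, and values near the decision boundary land near 0.5, which is the behaviour a calibrated SVM output is meant to have.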
The predict method predicts the actual class, while the predict_proba method returns the class probabilities (i.e. the probability that a particular data point falls into each of the underlying classes).
All the most popular machine learning libraries in Python have a method called «predict_proba»: Scikit-learn (e.g. LogisticRegression, SVC, RandomForest, …), XGBoost, LightGBM, CatBoost, Keras… But, despite its name, «predict_proba» does not quite predict probabilities.
For a random forest, predict_proba() returns the number of votes for each class divided by the number of trees in the forest, so its precision is exactly 1/n_estimators. To see variation at the 5th digit you would need 10**5 = 100,000 estimators, which is excessive; you normally don't want more than 100 estimators.
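The vote-counting arithmetic can be sketched in a few lines of plain Python. The helper names below are hypothetical, and the model is hard voting exactly as described above. Note that scikit-learn's forests average each tree's per-leaf class fractions, which coincides with vote counting only when every leaf is pure, so this illustrates the granularity argument and is not the library's implementation.

```python
from fractions import Fraction

def vote_probabilities(votes, classes):
    """Fraction of trees voting for each class (hard-voting model)."""
    n = len(votes)
    return {c: Fraction(votes.count(c), n) for c in classes}

def possible_values(n_estimators):
    """Every probability a forest of n_estimators hard-voting trees can output."""
    return [Fraction(k, n_estimators) for k in range(n_estimators + 1)]

# Four hypothetical trees voting on one sample.
print(vote_probabilities(["pos", "pos", "neg", "pos"], ["neg", "pos"]))

# With 100 trees, the smallest possible step between two outputs is 1/100.
values = possible_values(100)
print(len(values), values[1] - values[0])
```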
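Use: SVC(probability=True)
or, because GridSearchCV itself does not accept a probability argument, set it on the pipeline's classifier step through the parameter grid:
param_svm = [
    {'classifier__probability': [True], 'classifier__C': [1, 10, 100, 1000], 'classifier__kernel': ['linear']},
    {'classifier__probability': [True], 'classifier__C': [1, 10, 100, 1000], 'classifier__gamma': [0.001, 0.0001], 'classifier__kernel': ['rbf']},
]
grid_svm = GridSearchCV(
    pipeline_svm,
    param_grid=param_svm,
    refit=True,
    n_jobs=-1,
    scoring='accuracy',
    cv=StratifiedKFold(n_splits=5),
)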
Adding probability=True when initializing the classifier, as suggested above, resolved my error:
clf = SVC(kernel='rbf', C=1e9, gamma=1e-07, probability=True).fit(xtrain, ytrain)
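For anyone who wants to verify the whole round trip, here is a minimal sketch, assuming a recent scikit-learn and a tiny made-up dataset (the texts, labels, and variable names are illustrative, not taken from the question). It fits a pipeline with probability=True, pickles and restores it in memory, inspects the flag on the restored model with get_params(), and compares predict_proba before and after.

```python
import pickle

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data; a real sentiment model needs far more examples.
texts = [
    "what an awesome day", "I love this movie", "great food and service",
    "this is wonderful", "really happy with it", "best purchase ever",
    "what a terrible day", "I hate this movie", "awful food and service",
    "this is horrible", "really unhappy with it", "worst purchase ever",
]
labels = ["pos"] * 6 + ["neg"] * 6

model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", SVC(kernel="linear", probability=True, random_state=0)),
])
model.fit(texts, labels)

# Serialize and restore in memory; a file opened in binary mode works the same way.
reloaded = pickle.loads(pickle.dumps(model))

# Check the flag the restored classifier was actually constructed with.
print(reloaded.get_params()["classifier__probability"])

before = model.predict_proba(["Today is awesome day"])
after = reloaded.predict_proba(["Today is awesome day"])
print(dict(zip(reloaded.classes_, after[0])))
print(np.allclose(before, after))
```

Pickling preserves the probability setting, so if the same get_params() check on your loaded model reports False, one plausible explanation is that the pickle file was written from an earlier model that was fit without the flag. In that case the model has to be refit with probability=True and saved again, because the calibration is learned during fit.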