
How can I get the probability of the class predicted by predict() in a Support Vector Machine?

How can I find out the probability that a sample belongs to the class predicted by the predict() function of a Scikit-Learn Support Vector Machine?

>>> print(clf.predict([fv]))
[5]

Is there a function for this?

postgres asked Feb 22 '13 02:02



4 Answers

Definitely read this section of the docs, as there are some subtleties involved. See also "Scikit-learn predict_proba gives wrong answers".

Basically, if you have a multi-class problem with plenty of data, predict_proba as suggested earlier works well. Otherwise, you may have to make do with the ordering given by decision_function, which does not yield probability scores.

Here's a nice pattern for using predict_proba to get a dictionary or list of classes and their probabilities:
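As a rough illustration of the decision_function fallback, here is a minimal sketch that ranks classes by their scores. The iris dataset, the linear kernel, and the variable names are assumptions made for the example; they are not from the question.

```python
import numpy as np
from sklearn import datasets, svm

# Iris is used purely as stand-in data (an assumption, not from the thread).
X, y = datasets.load_iris(return_X_y=True)

# probability=False (the default) skips the extra cross-validation.
clf = svm.SVC(kernel="linear", decision_function_shape="ovr")
clf.fit(X, y)

sample = X[:1]                               # one sample, shape (1, n_features)
scores = clf.decision_function(sample)[0]    # one score per class, not probabilities

# Sort class labels from highest to lowest score.
order = np.argsort(scores)[::-1]
ranked_classes = clf.classes_[order]
print(ranked_classes, scores[order])
```

The scores only give an ordering; they do not sum to 1 and should not be read as percentages.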

from sklearn import svm

# X, Y are your training data and labels; test_data holds the samples to score
model = svm.SVC(probability=True)
model.fit(X, Y)
results = model.predict_proba(test_data)[0]  # probabilities for the first test sample

# gets a dictionary of {'class_name': probability}
prob_per_class_dictionary = dict(zip(model.classes_, results))

# gets a list of ['most_probable_class', 'second_most_probable_class', ..., 'least_class']
ranked = sorted(zip(model.classes_, results), key=lambda x: x[1], reverse=True)
results_ordered_by_probability = [class_name for class_name, _ in ranked]

Alex answered Oct 23 '22 20:10


Use clf.predict_proba([fv]) to obtain an array of predicted probabilities, one per class. However, this method is not available for all classifiers.

Regarding your comment, consider the following:

>>> prob = [0.01357713, 0.00662571, 0.00782155, 0.3841413, 0.07487401, 0.09861277, 0.00644468, 0.40790285]
>>> sum(prob)
1.0

The probabilities sum to 1.0, so multiply each one by 100 to get a percentage.
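A small sketch of that conversion, using the example values above. The use of NumPy and the variable names are choices made here for illustration. Each position in the array corresponds to the entry at the same position in clf.classes_.

```python
import numpy as np

# The example probabilities from the answer above.
prob = np.array([0.01357713, 0.00662571, 0.00782155, 0.3841413,
                 0.07487401, 0.09861277, 0.00644468, 0.40790285])

percentages = prob * 100
best = int(np.argmax(prob))   # position of the most probable class

for i, pct in enumerate(percentages):
    print(f"class index {i}: {pct:.2f}%")
print(f"most probable: index {best} ({percentages[best]:.2f}%)")
```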

Bastiaan van den Berg answered Oct 23 '22 19:10


For clarity, here is the relevant passage from the scikit-learn documentation on SVMs:

Needless to say, the cross-validation involved in Platt scaling is an expensive operation for large datasets. In addition, the probability estimates may be inconsistent with the scores, in the sense that the “argmax” of the scores may not be the argmax of the probabilities. (E.g., in binary classification, a sample may be labeled by predict as belonging to a class that has probability <½ according to predict_proba.) Platt’s method is also known to have theoretical issues. If confidence scores are required, but these do not have to be probabilities, then it is advisable to set probability=False and use decision_function instead of predict_proba.
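One way to see whether this inconsistency affects your own data is to compare predict with the argmax of predict_proba. The following is only a sketch: the synthetic dataset, the noise level, and the variable names are assumptions, and the number of disagreements may well be zero.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Synthetic, deliberately noisy binary data (an assumption for illustration).
X, y = datasets.make_classification(n_samples=300, n_features=10,
                                    flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = svm.SVC(probability=True, random_state=0).fit(X_train, y_train)

labels = clf.predict(X_test)                        # based on the SVM scores
proba = clf.predict_proba(X_test)                   # based on Platt scaling
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# Count samples where the two disagree; this can be zero on a given dataset.
n_disagree = int(np.sum(labels != labels_from_proba))
print(f"{n_disagree} of {len(X_test)} predictions disagree with argmax(predict_proba)")
```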

For other classifiers such as Random Forest, AdaBoost, and Gradient Boosting, it should be okay to use the predict_proba function in scikit-learn.

beahacker answered Oct 23 '22 19:10


When creating the SVC instance, enable probability estimates by setting probability=True:

http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Then call fit as usual and then predict_proba([fv]).
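Put together, a minimal end-to-end sketch might look like the following. The iris dataset and the choice of fv are stand-ins for the asker's own data, not something taken from the question.

```python
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)   # stand-in training data

clf = svm.SVC(probability=True, random_state=0)   # enables predict_proba
clf.fit(X, y)

fv = X[100]                                  # stand-in for the question's feature vector
predicted = clf.predict([fv])[0]
proba = clf.predict_proba([fv])[0]           # one probability per entry of clf.classes_

# Probability that the model assigns to the class returned by predict().
class_index = list(clf.classes_).index(predicted)
print(predicted, proba[class_index])
```

Looking up the index through clf.classes_ avoids assuming that class labels are consecutive integers starting at 0.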

ogrisel answered Oct 23 '22 20:10