
How to find the corresponding class in clf.predict_proba()

I have a number of classes and corresponding feature vectors, and when I run predict_proba() I get this:

    classes = ['one','two','three','one','three']
    feature = [[0,1,1,0],[0,1,0,1],[1,1,0,0],[0,0,0,0],[0,1,1,1]]

    from sklearn.naive_bayes import BernoulliNB

    clf = BernoulliNB()
    clf.fit(feature,classes)
    # recent scikit-learn versions expect a 2D input: one row per sample
    clf.predict_proba([[0,1,1,0]])
    >> array([[ 0.48247836,  0.40709111,  0.11043053]])

I would like to know which probability corresponds to which class. This page says that they are ordered by arithmetical order, but I'm not 100% sure what that means: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict_proba

Does it mean that I have to go through my training examples and assign the corresponding index to the first encounter of each class, or is there a command like

clf.getClasses() = ['one','two','three']?

user1506145 asked May 31 '13 13:05





1 Answer

Just use the .classes_ attribute of the classifier to recover the mapping. In your example that gives:

    >>> clf.classes_
    array(['one', 'three', 'two'],
          dtype='|S5')
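To pair each probability with its label directly, something like the following should work. This is only a sketch: it assumes a recent scikit-learn version (where predict_proba needs a 2D input), and the name proba_by_class is made up for illustration. Column i of the predict_proba output lines up with clf.classes_[i], and classes_ holds the unique labels in sorted order, which is why 'three' comes before 'two'.

```python
from sklearn.naive_bayes import BernoulliNB

classes = ['one', 'two', 'three', 'one', 'three']
feature = [[0, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]

clf = BernoulliNB()
clf.fit(feature, classes)

# predict_proba expects a 2D input: one row per sample.
proba = clf.predict_proba([[0, 1, 1, 0]])

# Column i of proba corresponds to clf.classes_[i], so zipping the two
# gives a label -> probability mapping for the first (and only) sample.
# proba_by_class is an illustrative name, not part of scikit-learn.
proba_by_class = dict(zip(clf.classes_, proba[0]))

for label, p in proba_by_class.items():
    print(label, round(float(p), 4))
```

The same zip works for any classifier that exposes both classes_ and predict_proba; for several samples, apply it to each row of the output.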

And thanks for putting a minimal reproduction script in your question; it makes answering really easy, since it can just be copied and pasted into an IPython shell :)

ogrisel answered Sep 22 '22 06:09
