I have a number of classes and corresponding feature vectors, and when I run predict_proba() I get this:
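```python
from sklearn.naive_bayes import BernoulliNB

classes = ['one', 'two', 'three', 'one', 'three']
feature = [[0, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]

clf = BernoulliNB()
clf.fit(feature, classes)
# predict_proba expects a 2D input: one row per sample
clf.predict_proba([[0, 1, 1, 0]])
# >> array([[ 0.48247836, 0.40709111, 0.11043053]])
# (exact values may differ between scikit-learn versions)
```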
I would like to know which probability corresponds to which class. This page says that they are ordered by arithmetical order, and I'm not 100% sure what that means: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict_proba
Does it mean that I have to go through my training examples and assign the corresponding index to the first occurrence of each class, or is there a command like
clf.getClasses() = ['one','two','three']?
The predict_proba() method: in the context of classification tasks, some sklearn estimators also implement the predict_proba method, which returns the class probabilities for each data point.
model.predict_proba(): For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is what model.predict() returns.
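A minimal sketch, reusing the toy data from the question, that checks the two properties described above on a current scikit-learn: predict_proba returns one row per sample and one column per class, each row sums to 1, and predict() returns the class whose column has the highest probability (ties aside). Predicting on the training rows is just an illustration choice.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

classes = ['one', 'two', 'three', 'one', 'three']
feature = [[0, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]

clf = BernoulliNB().fit(feature, classes)

# predict_proba expects a 2D array: one row per sample
proba = clf.predict_proba(feature)
print(proba.shape)  # (5, 3): 5 samples, 3 classes

# Each row is a probability distribution over clf.classes_
print(np.allclose(proba.sum(axis=1), 1.0))

# predict() picks the column with the highest probability
print(clf.classes_[proba.argmax(axis=1)].tolist())
print(clf.predict(feature).tolist())
```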
For a random forest whose trees are fully grown (so every leaf is pure), predict_proba() effectively returns the number of votes for each class divided by the number of trees in the forest. Your precision is therefore exactly 1/n_estimators. To see variation at the 5th decimal digit, you would need 10**5 = 100,000 estimators, which is excessive; you normally don't want more than about 100 estimators.
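A small sketch of that granularity claim, assuming a RandomForestClassifier with default tree settings (no max_depth, min_samples_leaf=1), so each tree ends in pure leaves and casts a 0/1 vote. The random data, n_estimators=10 and random_state are arbitrary choices for illustration; continuous features avoid duplicate rows with conflicting labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)                               # continuous features
y = (X[:, 0] + rng.rand(200) > 1.0).astype(int)    # noisy binary labels

n_estimators = 10
rf = RandomForestClassifier(n_estimators=n_estimators, random_state=0).fit(X, y)

proba = rf.predict_proba(X[:20])
# With pure leaves, every probability is (number of votes) / n_estimators,
# so multiplying by n_estimators should give whole numbers
votes = proba * n_estimators
print(np.allclose(votes, np.round(votes)))
```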
predict_proba(X): Predict class probabilities for X. The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
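The averaging described in that docstring can be checked by hand: average predict_proba over the fitted trees in rf.estimators_ and compare with the forest's own output. This is a sketch on the bundled iris dataset; min_samples_leaf=5 is an arbitrary choice that keeps some leaves impure, so single-tree probabilities are real fractions rather than 0/1 votes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=25, min_samples_leaf=5,
                            random_state=0).fit(X, y)

# Mean of the per-tree class probabilities (leaf class fractions)
manual = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)

print(np.allclose(manual, rf.predict_proba(X)))
```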
Just use the .classes_ attribute of the classifier to recover the mapping. In your example that gives:
```python
>>> clf.classes_
array(['one', 'three', 'two'], dtype='|S5')
```
(On Python 3 the dtype is displayed as '<U5'.)
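To pair each probability with its label, the columns of predict_proba can be zipped with clf.classes_. A minimal sketch using the question's data; it also shows that classes_ is the sorted set of labels (alphabetical for strings), not the order of first appearance in the training data. The probability values themselves depend on the scikit-learn version, so they are not given as expected output.

```python
from sklearn.naive_bayes import BernoulliNB

classes = ['one', 'two', 'three', 'one', 'three']
feature = [[0, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0], [0, 1, 1, 1]]

clf = BernoulliNB().fit(feature, classes)

# classes_ holds the labels in sorted order, matching the predict_proba columns
print(clf.classes_.tolist())                       # ['one', 'three', 'two']
print(clf.classes_.tolist() == sorted(set(classes)))  # True

# Pair each column of the single-row result with its class label
proba = clf.predict_proba([[0, 1, 1, 0]])[0]
class_proba = dict(zip(clf.classes_.tolist(), proba.tolist()))
print(class_proba)
```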
And thanks for putting a minimal reproduction script in your question; it makes answering really easy, just by copying and pasting into an IPython shell :)