Sklearn - How to predict probability for all target labels

I have a data set with a target variable that can have 7 different labels. Each sample in my training set has only one label for the target variable.

For each sample, I want to calculate the probability for each of the target labels. So my prediction would consist of 7 probabilities for each row.

On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want.

I tried the following code, but this only gives me one classification per sample.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

Does anyone have some advice on this? Thanks!

asked Jul 15 '16 19:07

3 Answers

You can do that by dropping the OneVsRestClassifier and calling the predict_proba method of the DecisionTreeClassifier directly:

from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

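This returns an array of shape (n_samples, 7): one row per sample and one probability for each of your 7 possible classes, with columns in the order given by clf.classes_.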
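As a minimal sketch of what that output looks like, the snippet below fits a tree on a synthetic 7-class data set and inspects the result. The data from make_classification, the split, and the max_depth=5 setting are assumptions made for illustration, not part of the original question. The depth limit is there because a fully grown tree usually has pure leaves, so its probabilities are mostly exactly 0 or 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: 7 classes, one label per sample (assumption).
X, y = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# max_depth is an illustrative choice so that leaves stay impure
# and the probabilities are not all 0 or 1.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # shape: (n_test_samples, 7)
print(proba.shape)
print(clf.classes_)         # column order of proba
print(proba[0])             # the 7 class probabilities for the first test sample
```

Each row sums to 1, and the column with the highest value corresponds to the label that clf.predict would return for that sample.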

Hope that helps!

answered Oct 18 '22 20:10

You can try scikit-multilearn, an extension of sklearn that handles multi-label classification. If your labels are not strongly correlated, you can train one classifier per label and get all predictions. After pip install scikit-multilearn, try:

from sklearn.tree import DecisionTreeClassifier
from skmultilearn.problem_transform import BinaryRelevance
classifier = BinaryRelevance(classifier=DecisionTreeClassifier())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

predictions will be a sparse matrix of shape (n_samples, n_labels), with n_labels = 7 in your case. Each column holds the prediction for one label across all samples.

If your labels are correlated, you might need more sophisticated methods for multi-label classification.

Disclaimer: I'm the author of scikit-multilearn, feel free to ask more questions.

answered Oct 18 '22 20:10

niedakh

If you insist on using the OneVsRestClassifier, you can also call predict_proba(X_test), since OneVsRestClassifier supports it as well.

For example:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
clf = OneVsRestClassifier(DecisionTreeClassifier())
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

The columns of the result follow the order of the labels stored in:

clf.classes_
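To make the column order concrete, here is a small sketch that pairs each column of the predict_proba output with its label from clf.classes_. The synthetic data, the string label names, and max_depth=5 are hypothetical choices made only so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical label names, deliberately not in alphabetical order.
names = np.array(["g", "c", "a", "f", "b", "e", "d"])

# Synthetic 7-class data standing in for the real data set (assumption).
X, y_int = make_classification(
    n_samples=700, n_features=20, n_informative=10,
    n_classes=7, random_state=0,
)
y = names[y_int]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = OneVsRestClassifier(DecisionTreeClassifier(max_depth=5, random_state=0))
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# classes_ holds the sorted unique labels; column j of proba belongs to classes_[j].
print(clf.classes_)

# Pair each label with its probability for the first test sample.
first_row = dict(zip(clf.classes_, proba[0]))
for label, p in first_row.items():
    print(label, p)
```

Note that the labels come back sorted, not in the order they first appear in y_train, so it is safer to read the order from clf.classes_ than to assume it.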

answered Oct 18 '22 18:10

SA1T

Tags:

python

scikit-learn

multilabel-classification

Bert Carremans

Abhinav Arora
