Setting a threshold in classifier output in Python

Question

Suppose I have trained an SVM classifier in Python with the flag probability=True:

from sklearn.svm import SVC

classifier = SVC(C=1000000, gamma=10, probability=True)
classifier.fit(my_data, the_labels)

When I classify new data, I want to keep only the points whose predicted probability is higher than a threshold, say 0.90. How can I do that? So far I am doing something like this, but I am stuck:

labels_predicted = classifier.predict(new_data)
probabilities = classifier.predict_proba(new_data)

The first call returns the predicted labels and the second returns the probability of each label. So, for every data point, I have its most likely label and its probabilities for all the labels. But the most likely label may have a probability of only 0.4, which I don't want. How can I keep only the labels whose probability exceeds a certain threshold?

Sudeep Juvekar · Accepted Answer

As far as I know, SVC itself does not allow thresholding of probabilities in the manner you want. You can do a second pass of indexing and get the accepted labels after you build labels_predicted and probabilities.

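thresh = 0.9
# Boolean mask: True where the highest class probability exceeds the threshold
accepted_probabilities_idx = probabilities.max(axis=1) > thresh
accepted_labels_predicted = labels_predicted[accepted_probabilities_idx]
# Keep only the accepted rows (assumes new_data is a NumPy array or a pandas DataFrame)
accepted_new_data = new_data[accepted_probabilities_idx]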

I am not sure what you want to do with the data whose maximum-likelihood probability is low. This solution discards it completely.
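
If you would rather flag the low-probability points than drop them, here is a minimal sketch of one way to do it. The helper name threshold_predictions and the reject_label value of -1 are hypothetical choices for illustration, not part of scikit-learn. It takes the label from the most probable column of predict_proba rather than from predict. The scikit-learn documentation notes that for SVC, predict_proba may be inconsistent with predict, so deriving both the label and the confidence from the same array keeps them in agreement. The toy probability array stands in for classifier.predict_proba(new_data), and classes stands in for classifier.classes_.

```python
import numpy as np

def threshold_predictions(probabilities, classes, thresh=0.9, reject_label=-1):
    """Label each row by its most probable class; mark low-confidence rows as rejected.

    probabilities: shape (n_samples, n_classes), e.g. from predict_proba
    classes: shape (n_classes,), e.g. classifier.classes_
    reject_label: hypothetical placeholder for rows below the threshold
    """
    probabilities = np.asarray(probabilities)
    classes = np.asarray(classes)
    best = probabilities.argmax(axis=1)      # column index of the top class per row
    confidence = probabilities.max(axis=1)   # probability of that top class
    accepted = confidence > thresh           # boolean mask of confident rows
    labels = np.where(accepted, classes[best], reject_label)
    return labels, accepted

# Toy values standing in for classifier.predict_proba(new_data) and classifier.classes_
probabilities = np.array([[0.95, 0.05], [0.40, 0.60], [0.08, 0.92]])
classes = np.array([0, 1])
labels, accepted = threshold_predictions(probabilities, classes)
```

The returned mask can still be used to filter, for example new_data[accepted], so nothing is lost compared with discarding. If your class labels are strings, pick a string for reject_label so the output array keeps a single type.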

Tags:

python

classification

azal
