Assume I have trained an SVM classifier in Python (scikit-learn) with the flag probability=True:
from sklearn.svm import SVC
classifier = SVC(C=1000000, gamma=10, probability=True)
classifier.fit(my_data, the_labels)
When I classify new data, I want to keep only the data points whose predicted probability is higher than a threshold, say 0.90. How can I do that? So far I am doing something like this, but I am stuck:
labels_predicted = classifier.predict(new_data)
probabilities = classifier.predict_proba(new_data)
The first command returns the predicted label for each data point, and the second returns one probability per class for each data point. So, for every data point, I have its most likely label and its probabilities of belonging to each of the labels. But the probability of the most likely label may be as low as 0.4, which I don't want. How can I keep only the labels whose probability is above a certain threshold?
As far as I know, SVC itself does not allow thresholding of probabilities in the manner you want. You can do a second pass of indexing and get the accepted labels after you build labels_predicted and probabilities.
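import pandas

thresh = 0.9
# Boolean mask: True where the highest class probability exceeds the threshold
accepted_probabilities_idx = probabilities.max(axis=1) > thresh
accepted_labels_predicted = labels_predicted[accepted_probabilities_idx]
# Use the mask to select rows; passing it as index= would only relabel the rows
accepted_new_data = pandas.DataFrame(new_data)[accepted_probabilities_idx]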
I am not sure what you want to do with the data points whose maximum-likelihood probability is low. This solution discards them completely.
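For reference, here is a minimal self-contained sketch of the same idea on toy data. The data set (make_blobs), the SVC parameters, and the -1 "rejected" sentinel are assumptions chosen for illustration, not part of the original question. It also takes the label from the probability columns via classifier.classes_ rather than from predict(), since for SVC with probability=True the scikit-learn documentation notes that predict_proba may be inconsistent with predict.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy data standing in for my_data / the_labels / new_data (assumed for illustration)
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=0)
my_data, the_labels = X[:200], y[:200]
new_data = X[200:]

classifier = SVC(C=1.0, gamma="scale", probability=True, random_state=0)
classifier.fit(my_data, the_labels)

thresh = 0.9
probabilities = classifier.predict_proba(new_data)  # shape (n_samples, n_classes)
confident = probabilities.max(axis=1) > thresh      # boolean mask, one entry per row

# Columns of predict_proba follow the order of classifier.classes_
labels_from_proba = classifier.classes_[probabilities.argmax(axis=1)]

# Option 1: discard low-confidence rows
accepted_new_data = new_data[confident]
accepted_labels = labels_from_proba[confident]

# Option 2: keep every row, but mark low-confidence ones with a sentinel (-1 is hypothetical)
labels_or_rejected = np.where(confident, labels_from_proba, -1)

print(f"{confident.sum()} of {len(new_data)} points pass the {thresh} threshold")
```

Option 2 may be preferable if the low-confidence points should be reviewed or handled separately rather than dropped.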