For a classification task, I am using a voting classifier to ensemble logistic regression and an SVM, with the voting parameter set to soft. The result is clearly better than either individual model, but I am not sure I understand how it works. How can the ensemble find a majority vote between only two models?
Assume you have two classes, class-A and class-B. LogisticRegression (which has a built-in predict_proba() method) and SVC (with probability=True set) can both estimate class probabilities: for a given input, each predicts class-A with probability a and class-B with probability b. If a > b, the predicted class is A; otherwise it is B. In a voting classifier, setting the voting parameter to soft makes each model (the SVM and the logistic regression) compute its class probabilities (also called confidence scores) individually and pass them to the voting classifier. The voting classifier then averages them per class and outputs the class with the highest average probability. So with soft voting there is no majority count: when the two models disagree, the more confident one decides the result.
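The averaging step can be worked through by hand. The sketch below uses made-up probabilities for a single input (the numbers and the soft_vote helper are hypothetical, chosen only to illustrate the arithmetic) and assumes both models are weighted equally:

```python
# Hypothetical per-class probabilities [P(A), P(B)] for ONE input sample.
logreg_proba = [0.45, 0.55]  # logistic regression leans slightly towards B
svc_proba = [0.80, 0.20]     # the SVM is confident in A


def soft_vote(probas, labels):
    """Average the per-class probabilities and pick the highest average."""
    n_models = len(probas)
    # zip(*probas) groups the models' probabilities class by class.
    averaged = [sum(column) / n_models for column in zip(*probas)]
    best_index = max(range(len(averaged)), key=lambda i: averaged[i])
    return labels[best_index], averaged


winner, averaged = soft_vote([logreg_proba, svc_proba], ["A", "B"])
print(winner, averaged)
```

With hard voting these two models would be tied at one vote each. With soft voting the averages are roughly 0.625 for A and 0.375 for B, so A wins because the SVM is more confident than the logistic regression.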
If you set voting='soft', make sure every classifier you provide can compute these probabilities, i.e. that it exposes predict_proba().
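A minimal sketch of such a setup, assuming scikit-learn is installed. The synthetic dataset, the estimator names "logreg" and "svc", and the hyperparameters are placeholders rather than anything from the question. It also checks that the ensemble's probabilities match a manual average of the fitted models' predict_proba() outputs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        # probability=True is what gives SVC a predict_proba() method.
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probabilities reported by the ensemble itself.
ensemble_proba = ensemble.predict_proba(X_test)

# The same quantity computed by hand from the fitted sub-models.
manual_proba = np.mean(
    [est.predict_proba(X_test) for est in ensemble.estimators_], axis=0
)

matches_manual_average = np.allclose(ensemble_proba, manual_proba)
print(matches_manual_average)
```

No weights are passed here, so the average is a plain mean; passing the weights parameter would turn it into a weighted average.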
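To compare the accuracy of each classifier (call predict_proba(X_test) instead of predict(X_test) to see the per-class probabilities), you can do:
from sklearn.metrics import accuracy_score
# classifier_name is a trained SVM, LogisticRegression, or VotingClassifier
y_pred = classifier_name.predict(X_test)
# y_true holds the true labels for X_test
print(classifier_name.__class__.__name__, accuracy_score(y_true, y_pred))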
NOTE: a + b may not print as exactly 1 because of floating-point rounding, but the two probabilities do sum to 1. This holds for predict_proba(); I can't say the same about other confidence scores such as decision functions.
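A quick way to check this yourself, sketched on synthetic data (the dataset and model settings are assumptions for illustration). The comparison uses a tolerance instead of ==, since exact equality can fail on rounding. For contrast, it also prints the range of decision_function() scores, which are raw scores and not probabilities:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Each row of predict_proba() is [P(class 0), P(class 1)] for one sample.
proba = model.predict_proba(X)
row_sums = proba.sum(axis=1)

# Compare with a tolerance to absorb floating-point rounding.
sums_to_one = np.allclose(row_sums, 1.0)

# decision_function() returns one signed score per sample in the binary case.
scores = model.decision_function(X)
print(sums_to_one, scores.min(), scores.max())
```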