
How does voting between two classifiers work in sklearn?

For a classification task, I am using a voting classifier to ensemble logistic regression and SVM, with the voting parameter set to soft. The result is clearly better than each individual model, but I am not sure I understand how it works. How can the ensemble find a majority vote between only two models?

Clement Attlee asked Jan 03 '23 21:01


1 Answer

Assume you have two classes, class-A and class-B.

Logistic regression (which has a built-in predict_proba() method) and SVC (with probability=True set) can both estimate class probabilities: each predicts that the input is class-A with probability a and class-B with probability b. If a > b, the model predicts class A, otherwise class B.

In a voting classifier, setting the voting parameter to soft makes each model (SVM and logistic regression) compute its class probabilities (also called confidence scores) individually and pass them to the voting classifier. The voting classifier then averages them per class and outputs the class with the highest averaged probability. No majority count is involved, which is why two models are enough.
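The averaging step can be sketched without sklearn. This is only an illustration: the probabilities below are made up, and soft_vote is a hypothetical helper, not a library function. With one model leaning towards A and the other towards B, a hard vote would be tied at one vote each, while the averaged probabilities still give a clear winner.

```python
# Hypothetical probabilities for one sample; order is [class-A, class-B].
proba_logreg = [0.60, 0.40]   # logistic regression leans towards A
proba_svm = [0.20, 0.80]      # SVM is fairly confident about B


def soft_vote(probas, labels):
    """Average per-class probabilities across models and pick the largest."""
    n_models = len(probas)
    # zip(*probas) groups the probabilities of each class across all models.
    averaged = [sum(column) / n_models for column in zip(*probas)]
    winner = labels[averaged.index(max(averaged))]
    return averaged, winner


averaged, winner = soft_vote([proba_logreg, proba_svm], ["A", "B"])
print(averaged, winner)
```

Here the averages are roughly 0.4 for A and 0.6 for B, so the ensemble picks B because the SVM is more confident than the logistic regression.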

If you set voting='soft', make sure every classifier you provide can compute these probabilities, i.e. it exposes predict_proba().
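A minimal sketch of such a setup might look like the following. The synthetic dataset, the estimator names "logreg" and "svm", and the hyperparameters are assumptions chosen for illustration, not the asker's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

voting_clf = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        # probability=True is needed so that SVC provides predict_proba().
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
voting_clf.fit(X_train, y_train)

proba = voting_clf.predict_proba(X_test)  # averaged class probabilities
y_pred = voting_clf.predict(X_test)       # class with the highest average
print(proba[:3], y_pred[:3])
```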

To check the accuracy of each classifier you can do:

from sklearn.metrics import accuracy_score

# classifier_name is a trained SVM / LogisticRegression / VotingClassifier;
# y_true holds the true labels for X_test.
y_pred = classifier_name.predict(X_test)
print(classifier_name.__class__.__name__, accuracy_score(y_true, y_pred))

NOTE: a + b may not come out as exactly 1 because of floating-point round-off, but mathematically it is 1. This holds for predict_proba(); I can't say the same for other confidence scores such as decision functions.
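To look at the individual probabilities themselves, something like the sketch below could be used. The data and estimator names are again assumptions for illustration. It prints a and b for each fitted model, checks that each row sums to 1 within a tolerance, and confirms that the ensemble output is the mean of the per-model probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic two-class data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

voting_clf = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
).fit(X, y)

# Per-model probabilities for the first three samples: columns are [a, b].
per_model = {
    name: est.predict_proba(X[:3])
    for name, est in voting_clf.named_estimators_.items()
}
for name, proba in per_model.items():
    print(name, proba.round(3), "row sums:", proba.sum(axis=1))

# With no weights set, the ensemble output is the plain mean of the models.
manual_mean = np.mean(list(per_model.values()), axis=0)
ensemble = voting_clf.predict_proba(X[:3])

# Compare with a tolerance instead of ==, because of floating-point round-off.
sums_ok = all(np.allclose(p.sum(axis=1), 1.0) for p in per_model.values())
mean_ok = np.allclose(manual_mean, ensemble)
print(sums_ok, mean_ok)
```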

Pratik Kumar answered Jan 05 '23 15:01
