Handling class imbalance in a Passive Aggressive online learner in scikit-learn

I am working on a large-scale multi-class image classification problem. I am using an online learning strategy with the Passive Aggressive algorithm implementation in scikit-learn (owing to its faster convergence compared to the SGD implementation). I am following a One-vs-All (OvA) approach, building N (number of classes) OvA classifiers.

To handle the large amount of training data, I break my dataset into stratified mini-batches and run them through the online learners (each OvA classifier) over several iterations, until performance on the validation batch plateaus. (Model initialization and hyperparameter selection are done on the first batch.) My measure is primarily MAP, or mean average precision (the average of the sklearn.metrics.average_precision_score scores of the individual OvA models).
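Below is a minimal sketch of the loop described above, assuming the number of classes is known up front; names like X_batch, y_batch, X_val, and y_val are placeholders for the mini-batch and validation data, not part of any real dataset.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import average_precision_score

n_classes = 10  # assumption: the number of classes is known up front

# One binary PA learner per class (One-vs-All).
ova_models = [PassiveAggressiveClassifier(C=1.0) for _ in range(n_classes)]

def partial_fit_batch(X_batch, y_batch):
    """Update each OvA model on one stratified mini-batch."""
    for k, clf in enumerate(ova_models):
        y_binary = (y_batch == k).astype(int)
        clf.partial_fit(X_batch, y_binary, classes=[0, 1])

def mean_average_precision(X_val, y_val):
    """MAP = mean of the per-class average precision on the validation batch."""
    scores = []
    for k, clf in enumerate(ova_models):
        y_binary = (y_val == k).astype(int)
        scores.append(average_precision_score(y_binary, clf.decision_function(X_val)))
    return float(np.mean(scores))
```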

With this framework, every time new labels become available, I can create a new batch and run it through the partial_fit operation, further improving the model's performance.

My concern is whether this approach can handle the class imbalance that occurs within the mini-batches, or as more mini-batches are added in the future. I suspect that because of the class imbalance, the model will become biased towards the majority classes, leading to low recall on the minority classes.

One quick fix is to use class_weight='auto' during learning, but this is only supported by the SGD implementation and not by the Passive Aggressive implementation. Is there any reason for this, given that both use the same underlying SGD machinery?

The other option I could think of is to construct balanced mini-batches, thus ensuring the models are not biased towards the majority classes.
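A minimal sketch of one way to build such a batch, assuming in-memory arrays X and y and oversampling minority classes with replacement; this is one illustrative strategy, not the only one.

```python
import numpy as np

def balanced_minibatch(X, y, per_class, rng=None):
    """Draw a mini-batch containing `per_class` examples of every class."""
    rng = rng or np.random.default_rng()
    idx = []
    for k in np.unique(y):
        members = np.flatnonzero(y == k)
        # Sample with replacement when a class has too few examples.
        replace = len(members) < per_class
        idx.extend(rng.choice(members, size=per_class, replace=replace))
    idx = rng.permutation(np.array(idx))
    return X[idx], y[idx]
```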

It would be great to have views on this architecture and its possible drawbacks:

- Is MAP the right measure?
- How should online learning be handled in an unbalanced-class scenario?
- Is there any other linear algorithm, instead of Passive Aggressive and SGD, that might better suit the problem?

Thanks

1 Answer

The Passive Aggressive classifier doesn't "converge" like most algorithms you are used to. Indeed, if you read the paper, the point of PA is to make the update that completely corrects the loss while causing the minimum change in the norm of the weight vector. Note that the regularization parameter C in PA keeps it from fully correcting the loss on a per-example basis.

In this way, PA is meant specifically for online rather than batch training, and thus running PA on mini-batches until it stabilizes is likely not helping (and may be hurting) your generalization accuracy.
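For reference, a minimal sketch of the PA-I update from the original paper (Crammer et al., 2006), showing how each example is corrected in a single step, with the aggressiveness parameter C capping the correction; the variable names here are illustrative.

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """One PA-I step for a binary label y in {-1, +1}."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))  # hinge loss on this single example
    tau = min(C, loss / np.dot(x, x))        # step size, capped by C (PA-I)
    return w + tau * y * x                   # smallest-norm change that fixes the loss
```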

Is MAP the right measure?

Depends entirely upon your data and needs.

Is there any other linear algorithm, instead of Passive Aggressive and SGD, that might better suit the problem?

Depends entirely upon your data and needs.

One quick fix is to use class_weight='auto' during learning, but this is only supported by the SGD implementation and not by the Passive Aggressive implementation. Is there any reason for this, given that both use the same underlying SGD machinery?

Yes, see my description of PA above: PA's method of learning doesn't allow for that addition. While you could implement it by altering the regularization on a per-class basis, I don't think it would make much sense. See the original paper if you need to know more.
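As an aside (an editor's sketch, not part of the original answer): if per-class weighting matters more than PA's update rule, one workaround is to switch to SGDClassifier and pass per-example weights to partial_fit, since the 'auto'/'balanced' class_weight modes are not supported in the partial_fit setting. X_batch and y_batch are placeholders.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.utils.class_weight import compute_sample_weight

clf = SGDClassifier(loss="hinge")  # hinge loss keeps it close to PA's objective
classes = np.array([0, 1])

def weighted_partial_fit(X_batch, y_batch):
    # Weight each example inversely to its class frequency in this batch.
    sw = compute_sample_weight("balanced", y_batch)
    clf.partial_fit(X_batch, y_batch, classes=classes, sample_weight=sw)
```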

You can search for "class imbalance" to find more methods of trying to deal with that problem, but it all depends on your data.

If you are willing to use Java, JSAT has a direct multi-class implementation of PA, called SPA. It may or may not be more accurate for your problem.
