Is it possible to define class weights for a K-nearest neighbour classifier in SKLearn? I have looked at the API but cannot work it out. I have a knn problem with very imbalanced class sizes (10000 samples of some classes, 1 of others).
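The stock knn in sklearn (KNeighborsClassifier) does not seem to offer that option: it has no class_weight parameter, and its weights parameter only controls how neighbours are weighted by distance. You can alter the source code, though, by adding coefficients (weights) to the distance equation so that the distance is amplified for records belonging to the majority class (e.g., with a coefficient of 1.5). Those records then look further away and are less likely to end up among the k nearest neighbours.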
https://github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/neighbors/classification.py#L23
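The distance-scaling idea can also be tried without patching sklearn. The following is a minimal sketch in plain NumPy, not sklearn's implementation: the function name scaled_distance_knn_predict and the class_coef dictionary are hypothetical names chosen for illustration, and it assumes Euclidean distance with a brute-force search and a simple majority vote.

```python
import numpy as np

def scaled_distance_knn_predict(X_train, y_train, X_query, class_coef, k=1):
    """k-NN prediction after scaling distances by a per-class coefficient.

    class_coef maps a class label to a multiplier (hypothetical parameter);
    classes missing from the dict keep a multiplier of 1.0.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)

    # Euclidean distance from every query point to every training point.
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # A coefficient > 1 makes records of that class look further away.
    coef = np.array([class_coef.get(label, 1.0) for label in y_train.tolist()])
    dist = dist * coef[None, :]

    # Indices of the k smallest scaled distances for each query point.
    nearest = np.argsort(dist, axis=1)[:, :k]

    # Unweighted majority vote among the selected neighbours.
    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy data: five majority-class (0) points and a single minority-class (1) point.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [-0.1, 0.0], [0.3, 0.3]]
y = [0, 0, 0, 0, 0, 1]
query = [[0.19, 0.19]]

plain = scaled_distance_knn_predict(X, y, query, {}, k=1)
scaled = scaled_distance_knn_predict(X, y, query, {0: 1.5}, k=1)
```

On this toy data the query is slightly closer to a majority point, so the unscaled call picks class 0, while the 1.5 coefficient on class 0 tips it to class 1. Note that with only one sample in a class, a plain majority vote with k > 1 can never select it, so a weighted vote or a small k would be needed in that case.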
Alternatively, the imbalanced-learn package, which is part of the scikit-learn-contrib projects, can be used for data sets with high between-class imbalance:
http://contrib.scikit-learn.org/imbalanced-learn/stable/introduction.html
In the case of binary classification, you may alternatively treat the problem as an unsupervised outlier detection problem and use a method such as the one-class SVM in sklearn to perform the classification.
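As a rough sketch of the outlier-detection route, the example below fits sklearn's OneClassSVM on majority-class samples only and flags anything it rejects as the rare class. The synthetic data, the nu=0.05 setting, and the variable names are assumptions made for illustration; real data would need its own tuning.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the majority class: a 2-D Gaussian cluster (assumption).
rng = np.random.default_rng(0)
X_majority = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Fit on majority samples only; nu bounds the fraction of training
# points treated as outliers (0.05 is an illustrative choice).
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
detector.fit(X_majority)

# predict() returns +1 for inliers and -1 for outliers.
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = detector.predict(X_new)

# Treat outliers as candidates for the rare class.
is_minority = labels == -1
```

Here the point at the centre of the cluster is accepted as an inlier and the distant point is flagged. This only makes sense when the rare class really does lie away from the bulk of the majority class.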