I am applying SMOTE (from the DMwR package) because I have a class imbalance problem. However, my outcome has three classes instead of two.
The function oversamples the minority class correctly, but I do not follow its behavior for the majority and middle classes (all three classes have different sample sizes).
Let's say:
library(DMwR)
set.seed(1234)
train = data.frame(group=as.factor(rep(c(1,2,3),c(35,110,220))),
score=rnorm(365,100))
train_resample <- SMOTE(group ~ ., train, perc.over = 400, perc.under=200)
table(train_resample$group)
# 1 2 3
# 175 104 176
The minority class makes sense: 35 + (35 * 4) = 175. The size of the remaining sample is also clear: 140 * 200 / 100 = 280, where 140 is the number of synthetic minority cases. However, I am not sure how these 280 cases are distributed across the other two classes. The result preserves the order of the class sizes, but the split itself might be random.
Any ideas?
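One way to explore the split is to mimic only the undersampling step in base R. This is a minimal sketch, not DMwR's code: it assumes that SMOTE() treats every non-minority row as one pooled "majority" group and draws perc.under/100 times the number of synthetic cases from that pool at random, with replacement. The names n_min, n_new, n_keep, pool and picked are made up for illustration.

```r
# Sketch of the undersampling step only; no synthetic cases are generated here.
set.seed(1234)
group <- factor(rep(c(1, 2, 3), c(35, 110, 220)))

n_min  <- sum(group == "1")              # 35 minority rows
n_new  <- n_min * 400 / 100              # perc.over = 400 -> 140 synthetic rows
n_keep <- as.integer(n_new * 200 / 100)  # perc.under = 200 -> 280 rows to draw

# Assumption: the draw is made with replacement from classes 2 and 3 pooled together.
pool   <- which(group != "1")
picked <- sample(pool, n_keep, replace = TRUE)
counts <- table(droplevels(group[picked]))

# Expected split if every pooled row is equally likely to be drawn.
expected <- n_keep * c(`2` = 110, `3` = 220) / length(pool)

print(counts)
print(round(expected, 1))
```

Under that assumption the 280 cases would split in proportion to the original class sizes, about 93 and 187 on average, with random variation around those values. The observed 104 and 176 would be consistent with such a draw.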
You can try the SmoteClassif() function in the UBL package. It lets you specify, for each class separately, the percentage by which that class is undersampled or oversampled.
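A minimal sketch of what that could look like, assuming UBL is installed and that C.perc accepts a named list with one factor per class (values above 1 oversample, values below 1 undersample, 1 leaves the class unchanged). The factors 5, 1 and 0.5 and the extra predictor score2 are made up for illustration and are not part of the original data.

```r
if (requireNamespace("UBL", quietly = TRUE)) {
  set.seed(1234)
  # Two numeric predictors; score2 is a hypothetical column added for illustration.
  train <- data.frame(
    group  = factor(rep(c(1, 2, 3), c(35, 110, 220))),
    score  = rnorm(365, 100),
    score2 = rnorm(365, 50)
  )
  # One factor per class: oversample class 1, keep class 2, undersample class 3.
  train_ubl <- UBL::SmoteClassif(group ~ ., train,
                                 C.perc = list("1" = 5, "2" = 1, "3" = 0.5))
  print(table(train_ubl$group))
}
```

Unlike the two global percentages in DMwR's SMOTE(), this gives direct control over the final size of each class; the UBL documentation describes the exact rounding and the "balance" and "extreme" shortcuts for C.perc.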