
SMOTE - multiclass

Tags: r

I am applying SMOTE (DMwR package) given that I have a class imbalance problem. However, I have three class outcomes instead of two.

The function correctly oversamples the minority class, but I do not understand how it treats the majority and middle classes (all three classes have different sample sizes).

Let's say:

library(DMwR)

set.seed(1234)

train = data.frame(group=as.factor(rep(c(1,2,3),c(35,110,220))),
            score=rnorm(365,100))

train_resample <- SMOTE(group ~ ., train, perc.over = 400, perc.under=200)

table(train_resample$group)

#   1   2   3
# 175 104 176

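The minority class count makes sense: 35 + (35 * 4) = 175. The combined size of the other two classes is also clear: 140 * 200 / 100 = 280, where 140 is the number of synthetic minority cases. However, I am not sure how these 280 cases are distributed over the two remaining classes. The result preserves the ordering of the class sizes, but the split might be random.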

Any ideas?

timfaber asked Mar 09 '17 08:03



1 Answer

You can try the SmoteClassif() function in the UBL package. Its C.perc argument lets you specify, for each class separately, the percentage by which to undersample or oversample it.
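As for how the 280 cases in the question end up split between classes 2 and 3: the sketch below is a base R illustration, not DMwR's actual code. It assumes (from my reading of the DMwR source, so please verify) that SMOTE() treats only the rarest class as the minority, pools every other row together, and draws the 280 cases from that pool at random with replacement. The names `pool`, `drawn`, `counts` and `expected` are made up for this example.

```r
# Sketch only: mimics the undersampling step under the assumption stated above.
set.seed(1234)

group <- factor(rep(c(1, 2, 3), c(35, 110, 220)))

n_synthetic <- 35 * 400 / 100           # 140 synthetic minority cases (perc.over = 400)
n_keep      <- n_synthetic * 200 / 100  # 280 non-minority cases (perc.under = 200)

# Assumption: all rows outside the minority class form a single pool,
# and n_keep rows are drawn from it with replacement.
pool  <- which(group != "1")
drawn <- sample(pool, n_keep, replace = TRUE)

# Observed split between classes 2 and 3 for this particular draw
counts <- table(droplevels(group[drawn]))
counts

# Expected split if the draw is proportional to the original class sizes
expected <- n_keep * c("2" = 110, "3" = 220) / 330
round(expected, 1)  # 93.3 and 186.7
```

If that assumption holds, the split is random but proportional to the original class sizes on average, so 104 and 176 would be ordinary sampling variation around roughly 93 and 187. It would also explain why the ordering of the class sizes is usually retained.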

Suma S N answered Oct 06 '22 00:10
