Dealing with class imbalance in multi-label classification

I've seen a few questions on class imbalance in a multiclass setting. However, I have a multi-label problem, so how would you deal with it in this case?

I have a set of around 300k text examples. As mentioned in the title, each example has at least one label, and there are only 100 possible unique labels. I've reduced this problem down to binary classification for Vowpal Wabbit by taking advantage of namespaces, e.g.

From:

healthy fruit | bananas oranges jack fruit
evil monkey | bipedal organism family guy
...  

To:

1 |healthy bananas oranges jack fruit
1 |fruit bananas oranges jack fruit
0 |evil bananas oranges jack fruit
0 |monkey bananas oranges jack fruit
0 |healthy bipedal organism family guy
0 |fruit bipedal organism family guy
1 |evil bipedal organism family guy
1 |monkey bipedal organism family guy
...  
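
For reference, here's a rough sketch of that expansion in R (just to illustrate the reduction; to_binary_lines, multi_label_examples and all_labels are made-up names, not my actual pipeline):

# Sketch only: emit one binary VW line per (example, label) pair,
# with the label name used as the namespace.
all_labels <- c("healthy", "fruit", "evil", "monkey")
multi_label_examples <- list(
  list(labels = c("healthy", "fruit"), features = "bananas oranges jack fruit"),
  list(labels = c("evil", "monkey"),   features = "bipedal organism family guy")
)

to_binary_lines <- function(examples, labels) {
  unlist(lapply(examples, function(ex) {
    sapply(labels, function(lab) {
      y <- as.integer(lab %in% ex$labels)        # 1 if the label applies, else 0
      sprintf("%d |%s %s", y, lab, ex$features)
    })
  }))
}

cat(to_binary_lines(multi_label_examples, all_labels), sep = "\n")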

I'm using the default options provided by VW (which I believe are online SGD with the squared loss function). I chose squared loss because it closely resembles the Hamming loss.

After training, when testing on the same training set, I noticed that every example was predicted with the '0' label... which is one way of minimizing loss, I guess. At this point, I'm not sure what to do. I was thinking of using cost-sensitive one-against-all classification to try to balance the classes, but reducing multi-label to multiclass is infeasible since there exist 2^100 possible label combinations. I'm wondering if anyone has any suggestions.

Edit: I finally had the chance to test out class-imbalance handling, specifically for VW. VW handles imbalance very badly, at least for high-dimensional, sparse text features. I tried ratios from 1:1 to 1:25, with performance degrading abruptly at the 1:2 ratio.

asked Dec 09 '13 by richizy

1 Answer

Any linear model will handle class imbalance "very badly" if you force it to use squared loss for a binary classification problem. Think about the loss function: if 99% of observations are zero, predicting 0 in all cases gives a squared error of 0.01. Vowpal Wabbit can't do magic: if you ask it to minimize squared error loss, it will indeed minimize squared error loss, as will any other regression program.

Here's a demonstration of the same "problem" with a linear regression model in R:

set.seed(42)
rows <- 10000
cols <- 100
x <- matrix(sample(0:1, rows*cols, replace=TRUE), nrow=rows)  # binary features
y <- x %*% runif(cols) + runif(rows)                          # continuous response
y <- ifelse(y < quantile(y, 0.99), 0, 1)                      # keep only the top 1% as positives
lin_mod <- glm(y~., data.frame(y, x), family='gaussian')         # Linear model (squared loss)
log_mod <- glm(factor(y)~., data.frame(y, x), family='binomial') # Logistic model

Comparing predictions from a linear vs logistic model shows that the linear model always predicts 0 and the logistic model predicts the correct mix of 0's and 1's:

> table(ifelse(predict(lin_mod, type='response')>0.50, 1, 0))

    0 
10000 
> table(ifelse(predict(log_mod, type='response')>0.50, 1, 0))

   0    1 
9900  100 

Use --loss_function="logistic" or --loss_function="hinge" for binary classification problems in Vowpal Wabbit (note that with these loss functions VW expects the labels to be -1 and 1 rather than 0 and 1). You can evaluate your predictions after the fact using Hamming loss, but it may be informative to compare your results to the Hamming loss of always predicting 0.
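
For example, continuing the simulated data above, a quick sketch of that comparison (baseline_pred and log_pred are just illustrative names):

# Per-example error rate of the always-0 baseline vs. the logistic model's
# 0.5-thresholded predictions (a simple stand-in for Hamming loss here).
baseline_pred <- rep(0, rows)
log_pred <- ifelse(predict(log_mod, type='response') > 0.50, 1, 0)
mean(baseline_pred != y)  # always predicting 0: about 0.01 on this data
mean(log_pred != y)       # logistic model's error rate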

answered Nov 07 '22 by Zach