
Multiclass classification or regression?

I am trying to train a CNN model to classify images based on their aesthetic score. There are 200,000 images, and each image is rated by more than 100 subjects. The mean score is calculated for each image, and the scores are normalized.


The distribution of the scores is approximately Gaussian, so I have decided to build a 10-class classification model, assigning an appropriate weight to each class because the data is imbalanced.

My question:

For this problem, the scores are continuous and ordered, i.e., 0 < 0.2 < 0.3 < 0.4 < 0.5 < ... < 1. Does that mean this is a regression problem? If so, how do I balance the data for a regression problem, given that most of the data points lie between 0.4 and 0.6?

Thanks!

AKSHAYAA VAIDYANATHAN asked Nov 07 '22 08:11


1 Answer

Since your labels are continuous, you could divide them into 10 quantile bins (deciles) using something like pandas.qcut() and assign a class label to each bin. This turns the regression problem into a classification problem.
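As a rough sketch of the binning step, assuming the normalized mean scores are available as a pandas Series (the synthetic `scores` below is a hypothetical stand-in, not your data): quantile bins are equal-frequency by construction, so the 10 classes should come out with about the same number of images each.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the normalized mean scores:
# roughly Gaussian around 0.5, clipped to [0, 1].
scores = pd.Series(
    np.clip(rng.normal(0.5, 0.1, size=10_000), 0.0, 1.0), name="score"
)

# q=10 places the bin edges at the deciles of the scores.
# labels=False returns integer class ids 0..9; retbins=True also returns the edges.
labels, bin_edges = pd.qcut(scores, q=10, labels=False, retbins=True)

# Number of samples per class, and the score range each class covers.
class_counts = labels.value_counts().sort_index()
print(class_counts)
print(bin_edges)
```

Note that the bins near the middle of the distribution will be much narrower in score than the ones in the tails, so the class ids stay ordered but no longer correspond to equal score intervals.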

As far as the imbalance is concerned, you may want to try oversampling the minority classes. This helps keep the model from being biased towards the majority classes.
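A minimal sketch of random oversampling with plain pandas, assuming equal-width score bins as the classes (the DataFrame, its column names, and the synthetic scores are hypothetical, for illustration only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: image ids with roughly Gaussian scores in [0, 1].
df = pd.DataFrame({
    "image_id": np.arange(10_000),
    "score": np.clip(rng.normal(0.5, 0.1, size=10_000), 0.0, 1.0),
})

# Ten equal-width classes over [0, 1]; the middle classes dominate.
df["label"] = pd.cut(
    df["score"], bins=np.linspace(0.0, 1.0, 11), labels=False, include_lowest=True
)

# Size of the largest class; every other class is topped up to this size.
target = df["label"].value_counts().max()

parts = []
for _, group in df.groupby("label"):
    n_extra = target - len(group)
    if n_extra > 0:
        # Draw extra rows from the same class, with replacement.
        extra = group.sample(n=n_extra, replace=True, random_state=0)
        group = pd.concat([group, extra])
    parts.append(group)

balanced = pd.concat(parts, ignore_index=True)
print(balanced["label"].value_counts().sort_index())
```

If you go this route, it is usually safer to oversample only the training split, after the train/validation split, so duplicated images do not end up in both.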

Hope this helps.

Sagar Dawda answered Nov 25 '22 00:11