Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Neural Network Ordinal Classification for Age

I have created a simple neural network (Python, Theano) to estimate a persons age based on their spending history from a selection of different stores. Unfortunately, it is not particularly accurate.

The accuracy might be hurt by the fact that the network has no knowledge of ordinality. For the network there is no relationship between the age classifications. It is currently selecting the age with the highest probability from the softmax output layer.

I have considered changing the output classification to an average of the weighted probability for each age.

E.g Given age probabilities: (Age 10 : 20%, Age 20 : 20%, Age 30: 60%)

Rather than output: Age 30 (Highest probability)
Weighted Average: Age 24 (10*0.2+20*0.2+30*0.6 weighted average)

This solution feels sub optimal. Is there a better was to implement ordinal classification in neural networks, or is there a better machine learning method that can be implemented? (E.g logistic regression)

like image 231
A. Dev Avatar asked Jul 14 '16 13:07

A. Dev


People also ask

What is ordinal classification?

Ordinal classification is a form of multiclass classification for which there is an inherent order between the classes, but not a meaningful numeric difference between them. The performance of such classifiers is usually assessed by measures appropriate for nominal classes or for regression.

How do you do ordinal regression in Pytorch?

To perform ordinal regression, we need to expand the targets list into a [batch_size, num_labels] tensor, according to our previous encoding, and return the mean squared error loss between predictions and the expanded target: Code for the loss function, which first encodes the target label and then calculates MSE.

How do you do regression in CNN?

Implementing a CNN for regression prediction is as simple as: Removing the fully-connected softmax classifier layer typically used for classification. Replacing it a fully-connected layer with a single node along with a linear activation function.


1 Answers

This problem came up in a previous Kaggle competition (this thread references the paper I mentioned in the comments).

The idea is that, say you had 5 age groups, where 0 < 1 < 2 < 3 < 4, instead of one-hot encoding them and using a softmax objective function, you can encode them into K-1 classes and use a sigmoid objective. So, as an example, your encodings would be

[0] -> [0, 0, 0, 0]
[1] -> [1, 0, 0, 0]
[2] -> [1, 1, 0, 0]  
[3] -> [1, 1, 1, 0]
[4] -> [1, 1, 1, 1]

Then the net will learn the orderings.

like image 104
o-90 Avatar answered Sep 20 '22 15:09

o-90