I have created a simple neural network (Python, Theano) to estimate a person's age based on their spending history across a selection of different stores. Unfortunately, it is not particularly accurate.
Accuracy may be hurt by the fact that the network has no knowledge of ordinality: as far as the network is concerned, there is no relationship between the age classes. It currently selects the age with the highest probability from the softmax output layer.
I have considered changing the output from the single most likely class to a probability-weighted average of the ages.
E.g. given age probabilities (Age 10: 20%, Age 20: 20%, Age 30: 60%):
Rather than output Age 30 (highest probability),
output the weighted average: Age 24 (10*0.2 + 20*0.2 + 30*0.6).
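In code, that weighted average is just the expected value of the ages under the softmax output. A trivial NumPy sketch of the example above (the age values are from my example, not real data):

```python
import numpy as np

ages = np.array([10, 20, 30])
probs = np.array([0.2, 0.2, 0.6])   # softmax output from the example above
print(np.dot(ages, probs))          # 24.0, the probability-weighted average age
```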
This solution feels suboptimal. Is there a better way to implement ordinal classification in neural networks, or is there a better machine learning method that could be used instead (e.g. logistic regression)?
Ordinal classification is a form of multiclass classification for which there is an inherent order between the classes, but not a meaningful numeric difference between them. The performance of such classifiers is usually assessed by measures appropriate for nominal classes or for regression.
To perform ordinal regression, we need to expand the targets into a [batch_size, num_labels] tensor according to the ordinal encoding described below, and return the mean squared error between the predictions and the expanded targets. In other words, the loss function first encodes the target label and then calculates the MSE.
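A minimal sketch of such a loss in plain NumPy (not the questioner's Theano code; the helper names are made up, and I use num_labels - 1 threshold columns to match the cumulative encoding shown further down, whereas some write-ups use num_labels columns):

```python
import numpy as np

def expand_ordinal_targets(labels, num_labels):
    """Expand integer class labels into cumulative 0/1 target rows.

    Label k becomes k ones followed by zeros, e.g. with 5 labels
    2 -> [1, 1, 0, 0]. Returns a [batch_size, num_labels - 1] array.
    """
    labels = np.asarray(labels)
    thresholds = np.arange(num_labels - 1)
    return (thresholds[None, :] < labels[:, None]).astype(np.float32)

def ordinal_mse_loss(predictions, labels, num_labels):
    """Mean squared error between sigmoid outputs and expanded targets."""
    targets = expand_ordinal_targets(labels, num_labels)
    return float(np.mean((predictions - targets) ** 2))

# Example: 5 age groups, a batch of 2 samples with true labels 2 and 4
preds = np.array([[0.9, 0.7, 0.2, 0.1],
                  [0.8, 0.6, 0.6, 0.4]])
print(ordinal_mse_loss(preds, [2, 4], num_labels=5))
```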
Implementing a CNN for regression prediction is as simple as removing the fully-connected softmax classifier layer typically used for classification, and replacing it with a fully-connected layer with a single node and a linear activation function.
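A minimal sketch of that change using Keras (the architecture, input shape, and layer sizes are placeholders, not the questioner's Theano network; only the final layer matters here):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical small CNN; the important part is the last layer:
# a single unit with a linear activation instead of a softmax classifier.
model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),          # assumed input shape
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="linear"),    # regression output, e.g. predicted age
])
model.compile(optimizer="adam", loss="mse")
```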
This problem came up in a previous Kaggle competition (this thread references the paper I mentioned in the comments).
The idea is that, say you have 5 age groups ordered 0 < 1 < 2 < 3 < 4; instead of one-hot encoding them and using a softmax objective function, you can encode them into K-1 binary targets and use a sigmoid objective. So, as an example, your encodings would be
[0] -> [0, 0, 0, 0]
[1] -> [1, 0, 0, 0]
[2] -> [1, 1, 0, 0]
[3] -> [1, 1, 1, 0]
[4] -> [1, 1, 1, 1]
Then the net will learn the orderings.
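As a concrete illustration, here is a small NumPy sketch of that encoding and of turning sigmoid outputs back into an age group by counting how many outputs exceed 0.5 (the 0.5 threshold, the helper names, and the age values are assumptions, not from the thread):

```python
import numpy as np

AGE_GROUPS = [10, 20, 30, 40, 50]   # hypothetical mapping of class index -> age
K = len(AGE_GROUPS)

def encode(label):
    """Class k -> k ones followed by zeros (length K - 1)."""
    return (np.arange(K - 1) < label).astype(np.float32)

def decode(sigmoid_outputs, threshold=0.5):
    """Predicted class = number of sigmoid outputs above the threshold."""
    return int(np.sum(np.asarray(sigmoid_outputs) > threshold))

print(encode(2))                                   # [1. 1. 0. 0.]
print(AGE_GROUPS[decode([0.9, 0.8, 0.3, 0.1])])    # 30
```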