
Significance of the auxiliary output in a multi-input, multi-output deep network model

I am referring to the Keras documentation to build a network that takes multiple inputs in the form of embeddings along with some other important features. But I don't understand the exact effect of the auxiliary loss when a main loss is already defined.

Here we insert the auxiliary loss, allowing the LSTM and Embedding layer to be trained smoothly even though the main loss will be much higher in the model.

As mentioned in the documentation, I am assuming it helps to train the Embedding layer (and any other layer defined before the auxiliary output) smoothly. My question is: how do I decide the weight for the auxiliary loss?

We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify different loss_weights or loss for each different output, you can use a list or a dictionary.
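For reference, here is a condensed version of the model from that section of the Keras functional API guide (the layer sizes and output names follow the guide; exact values are illustrative):

    from keras.layers import Input, Embedding, LSTM, Dense, concatenate
    from keras.models import Model

    # Headline as a sequence of up to 100 word indices from a 10,000-word vocabulary.
    main_input = Input(shape=(100,), dtype='int32', name='main_input')
    x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
    lstm_out = LSTM(32)(x)

    # Auxiliary head: a loss attached directly to the LSTM output.
    auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

    # The extra features enter here and are merged with the LSTM output.
    auxiliary_input = Input(shape=(5,), name='aux_input')
    x = concatenate([lstm_out, auxiliary_input])
    x = Dense(64, activation='relu')(x)
    main_output = Dense(1, activation='sigmoid', name='main_output')(x)

    model = Model(inputs=[main_input, auxiliary_input],
                  outputs=[main_output, auxiliary_output])

    # The main loss gets weight 1.0 and the auxiliary loss 0.2;
    # a list or a dictionary keyed by output name both work here.
    model.compile(optimizer='rmsprop',
                  loss={'main_output': 'binary_crossentropy',
                        'aux_output': 'binary_crossentropy'},
                  loss_weights={'main_output': 1.0, 'aux_output': 0.2})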

I would really appreciate it if someone could explain how to decide the loss weights, and how a higher or lower auxiliary loss weight affects model training and prediction.

asked Apr 04 '17 by Nilesh Birari




1 Answer

This is a really interesting issue. The idea of auxiliary classifiers is not as uncommon as one may think; it's used e.g. in the Inception architecture. In this answer I'll try to give you a few intuitions about why this tweak might actually help training:

  1. It helps the gradient pass down to lower layers: one may think that a loss defined for an auxiliary classifier is conceptually similar to the main loss, because both of them measure how good our model is. Because of that, we may assume that the gradients w.r.t. the lower layers should be similar for both of these losses. The vanishing gradient phenomenon is still an issue, even with techniques like Batch Normalization, so any additional help might improve your training performance (see the sketch after this list).

  2. It makes low-level features more accurate: while we are training our network, the information about how good the model's low-level features are, and how to change them, must travel through all the other layers of your network. This can not only make the gradient vanish; because the operations performed during neural-net computations can be really complex, it can also render the information about your lower-level features irrelevant. This matters especially in the early stage of training, when most of your features are rather random (due to random initialization) and the direction in which your weights are pushed might be semantically bizarre. Auxiliary outputs can overcome this problem, because in this setup your lower-level features are forced to be meaningful from the earliest part of training.

  3. It might be considered an intelligent form of regularization: you are putting a meaningful constraint on your model, which might prevent overfitting, especially on small datasets.
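To make intuitions 1 and 2 concrete, here is a minimal sketch of an auxiliary classifier attached halfway up a deep stack, the same trick Inception uses (all layer names, sizes, and the 10-class output are made up for illustration):

    from keras.layers import Input, Dense
    from keras.models import Model

    inp = Input(shape=(128,))
    x = inp
    aux_output = None
    for i in range(8):
        x = Dense(64, activation='relu', name='dense_%d' % i)(x)
        if i == 3:
            # Auxiliary classifier attached after the 4th layer: gradients
            # from this head reach dense_0..dense_3 through only 4 layers
            # instead of 8, which both counteracts vanishing gradients
            # (point 1) and forces the lower features to become
            # discriminative early in training (point 2).
            aux_output = Dense(10, activation='softmax', name='aux_output')(x)

    main_output = Dense(10, activation='softmax', name='main_output')(x)
    model = Model(inputs=inp, outputs=[main_output, aux_output])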

From what I wrote above, one may infer some hints about how to set the auxiliary loss weight:

  1. It's good to have it bigger at the beginning of training.
  2. It should help pass information through your network, but it also shouldn't disturb the training process. So the rule of thumb that the deeper the aux output sits, the bigger its loss weight should be, is reasonable in my opinion.
  3. If your dataset is not too big, or the training time is not too long, you may try to actually tune it using some kind of hyperparameter optimization (see the sketch below).
  4. You should remember that your main loss is the most important; even though the aux outputs might help, their loss weights should be relatively smaller than the main loss weight.
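As a sketch of hint 3, a simple sweep over candidate weights could look like the following. Note the assumptions: build_model() is a hypothetical factory returning a freshly initialized two-output model like the one in the documentation, and x_seq, x_extra, y_main, y_aux are placeholders for your training arrays.

    # Hypothetical helper: build_model() must return a new, untrained model
    # each iteration so the runs are comparable.
    best_weight, best_val = None, float('inf')
    for w in [0.05, 0.1, 0.2, 0.5]:
        model = build_model()
        model.compile(optimizer='rmsprop',
                      loss={'main_output': 'binary_crossentropy',
                            'aux_output': 'binary_crossentropy'},
                      loss_weights={'main_output': 1.0, 'aux_output': w})
        history = model.fit({'main_input': x_seq, 'aux_input': x_extra},
                            {'main_output': y_main, 'aux_output': y_aux},
                            validation_split=0.2, epochs=10, verbose=0)
        # Per hint 4, select on the *main* head only - the auxiliary
        # output is a training aid, not the quantity you care about.
        val = min(history.history['val_main_output_loss'])
        if val < best_val:
            best_weight, best_val = w, val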
answered Oct 18 '22 by Marcin Możejko