I think I do not understand multiple-output networks.
Although I understand how the implementation is made and I successfully trained one model like this, I don't understand how a multiple-output deep learning network is trained. I mean, what is happening inside the network during training?
Take for example this network from the Keras functional API guide:
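A sketch of that model, reconstructed from the guide's multi-input/multi-output example (exact layer sizes may differ from the original listing):

from keras.layers import Input, Embedding, LSTM, Dense, concatenate
from keras.models import Model

main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)  # -> embedding_1
lstm_out = LSTM(32)(x)                                                        # -> lstm_1

# auxiliary branch: its own output, taken straight from the LSTM
aux_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

# main branch continues, merging in a second input
aux_input = Input(shape=(5,), name='aux_input')
x = concatenate([lstm_out, aux_input])
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input, aux_input], outputs=[main_output, aux_output])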
You can see the two outputs (aux_output and main_output). How does backpropagation work here?
My intuition was that the model does two backpropagations, one for each output, and that each backpropagation updates the weights of the layers preceding its exit. But it appears that's not true: from here (SO), I got the information that there is only one backpropagation despite the multiple outputs; the loss used is weighted according to the outputs.
But still, I don't get how the network and its auxiliary branch are trained: how are the auxiliary branch weights updated when that branch is not connected directly to the main output? Is the part of the network between the root of the auxiliary branch and the main output affected by the weighting of the loss? Or does the weighting influence only the part of the network that is connected to the auxiliary output?
Also, I'm looking for good articles about this subject. I have already read the GoogLeNet/Inception articles (v1, v2-v3), as those networks use auxiliary branches.
Keras calculations are graph based and use only one optimizer.
The optimizer is also a part of the graph, and in its calculations it gets the gradients of the whole group of weights. (Not two groups of gradients, one for each output, but one group of gradients for the entire model).
Mathematically, it's not really complicated, you have a final loss function made of:
loss = (main_weight * main_loss) + (aux_weight * aux_loss) #you choose the weights in model.compile
All defined by you, plus a series of other possible weighting terms (sample weights, class weights, regularizer terms, etc.).
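For illustration, with hypothetical numbers: if main_weight = 1.0 and aux_weight = 0.2, and on some batch main_loss = 0.40 and aux_loss = 0.90, then loss = 1.0 * 0.40 + 0.2 * 0.90 = 0.58, and all gradients are taken of that single scalar.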
Where:
main_loss is a function_of(main_true_output_data, main_model_output)
aux_loss is a function_of(aux_true_output_data, aux_model_output)
And the gradients are just ∂(loss)/∂(weight_i), for all weights.
Once the optimizer has the gradients, it performs its optimization step once.
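To make that concrete, here is a sketch of what one training step amounts to. This is not Keras's internal code, just the equivalent computation written with tf.GradientTape; the 1.0/0.2 loss weights and binary cross-entropy losses are assumed values:

import tensorflow as tf

main_weight, aux_weight = 1.0, 0.2   # assumed, same role as loss_weights
optimizer = tf.keras.optimizers.RMSprop()

def train_step(model, inputs, main_y, aux_y):
    with tf.GradientTape() as tape:
        main_out, aux_out = model(inputs, training=True)
        main_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(main_y, main_out))
        aux_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(aux_y, aux_out))
        # the single scalar loss that connects to the entire graph
        loss = main_weight * main_loss + aux_weight * aux_loss
    # one group of gradients for the entire model, aux branch included
    grads = tape.gradient(loss, model.trainable_variables)
    # one optimization step
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss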
Questions:

"How are the auxiliary branch weights updated, as it is not connected directly to the main output?"

You have one dataset for main_output and another dataset for aux_output. You must pass them both to fit, as in model.fit(inputs, [main_y, aux_y], ...). Then main_loss takes main_y and main_out; and aux_loss takes aux_y and aux_out. The two meet only in the total loss = (main_weight * main_loss) + (aux_weight * aux_loss). The model calculates this loss once, and this function connects to the entire model, so the aux term will affect lstm_1 and embedding_1 in backpropagation.

"Is the part of the network which is between the root of the auxiliary branch and the main output concerned by the weighting of the loss? Or does the weighting influence only the part of the network that is connected to the auxiliary output?"
The weights are plain mathematics. You define them in compile:
model.compile(optimizer=one_optimizer,
              # you choose each loss
              loss={'main_output': main_loss, 'aux_output': aux_loss},
              # you choose each weight
              loss_weights={'main_output': main_weight, 'aux_output': aux_weight},
              metrics=...)
And the loss function will use them in loss = (weight1 * loss1) + (weight2 * loss2). The rest is the mathematical calculation of ∂(loss)/∂(weight_i) for each weight. For a weight in the shared trunk (e.g. inside lstm_1 or embedding_1), that derivative receives contributions from both loss terms, each scaled by its loss weight; for a weight that only feeds the auxiliary output, only the aux term contributes.
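Putting compile and fit together, a minimal usage sketch; the concrete values (rmsprop, binary_crossentropy, the 1.0/0.2 weights, and the main_x/aux_x/main_y/aux_y arrays) are assumptions for illustration:

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy',
                    'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1.0, 'aux_output': 0.2})

# one dataset per output, keyed by the input/output layer names
model.fit({'main_input': main_x, 'aux_input': aux_x},
          {'main_output': main_y, 'aux_output': aux_y},
          epochs=10, batch_size=32)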