How is teacher-forcing implemented for the Transformer training?

In this part of TensorFlow's tutorial here, they mention that the model is trained with teacher forcing. To my knowledge, teacher forcing involves feeding the target output into the model so that it converges faster. The real target is tar_real, but as far as I can see it is only used to calculate the loss and accuracy. How does this code implement teacher forcing?

Thanks in advance.

Tony asked Jul 18 '19 17:07

1 Answer

Each train_step takes in inp and tar objects from the dataset in the training loop. Teacher forcing is indeed used since the correct example from the dataset is always used as input during training (as opposed to the "incorrect" output from the previous training step):

  1. tar is split into tar_inp and tar_real (offset by one position)
  2. inp and tar_inp are used as input to the model
  3. the model produces an output, which is compared with tar_real to calculate the loss
  4. the model's output is discarded (it is not fed back in as input)
  5. repeat for the next batch
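The offset split in step 1 can be sketched in a few lines. The token IDs below are made up for illustration; in the tutorial the same slicing is applied to batched tensors (tar[:, :-1] and tar[:, 1:]):

```python
# Hypothetical token IDs for one target sequence.
tar = [1, 5, 9, 12, 2]   # [START, w1, w2, w3, END]

tar_inp = tar[:-1]   # decoder input:       [1, 5, 9, 12]
tar_real = tar[1:]   # ground-truth labels: [5, 9, 12, 2]

print(tar_inp)   # [1, 5, 9, 12]
print(tar_real)  # [5, 9, 12, 2]
```

At every decoder position the model is thus asked to predict the *next* ground-truth token given the *previous* ground-truth tokens, which is exactly teacher forcing.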

"Teacher forcing is a procedure ... in which during training the model receives the ground truth output y(t) as input at time t+1." (Page 372, Deep Learning, 2016)
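To see why that matters, here is a toy contrast between teacher forcing and free-running decoding. The "model" is just a stand-in function that overshoots the true next token by one; all names and numbers are invented for illustration:

```python
def toy_model(prev_token):
    # Stand-in for a slightly wrong learned predictor:
    # it overshoots the true next token by 1.
    return prev_token + 2

target = [10, 11, 12, 13]  # hypothetical ground-truth sequence

# Teacher forcing: the input at step t+1 is the ground truth y(t),
# so each prediction error stays isolated.
tf_preds = [toy_model(y) for y in target[:-1]]
print(tf_preds)  # [12, 13, 14] -> each off by exactly 1

# Free running: the input at step t+1 is the model's own output,
# so errors compound step by step.
fr_preds, x = [], target[0]
for _ in range(len(target) - 1):
    x = toy_model(x)
    fr_preds.append(x)
print(fr_preds)  # [12, 14, 16] -> off by 1, 2, 3
```

The compounding error in the free-running case is what teacher forcing avoids during training, which is why it typically converges faster.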

Source: https://github.com/tensorflow/tensorflow/issues/30852#issuecomment-513528114

Sarath R Nair answered Nov 29 '22 03:11