How is teacher-forcing implemented for the Transformer training?

In this part of TensorFlow's tutorial here, they mention that the model is trained with teacher forcing. To my knowledge, teacher forcing involves feeding the target output into the model so that it converges faster. The real target is tar_real, but as far as I can see it is only used to calculate the loss and accuracy. How does this code implement teacher forcing?

Thanks in advance.

Tony asked Jul 18 '19 17:07

1 Answer

Each train_step takes in inp and tar objects from the dataset in the training loop. Teacher forcing is indeed used since the correct example from the dataset is always used as input during training (as opposed to the "incorrect" output from the previous training step):

  1. tar is split into tar_inp and tar_real (offset by one position)
  2. inp and tar_inp are used as input to the model
  3. the model produces an output, which is compared with tar_real to calculate the loss
  4. the model's output is discarded (it is not fed back in as input)
  5. repeat for the next batch
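The offset split in step 1 can be sketched in a few lines. The token IDs below are made up for illustration; in the tutorial the same slicing is applied to batched tensors (tar[:, :-1] and tar[:, 1:]):

```python
# Hypothetical token IDs for one target sequence.
tar = [1, 5, 9, 12, 2]   # [START, w1, w2, w3, END]

tar_inp = tar[:-1]   # decoder input:       [1, 5, 9, 12]
tar_real = tar[1:]   # ground-truth labels: [5, 9, 12, 2]

print(tar_inp)   # [1, 5, 9, 12]
print(tar_real)  # [5, 9, 12, 2]
```

At every decoder position the model is thus asked to predict the *next* ground-truth token given the *previous* ground-truth tokens, which is exactly teacher forcing.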

"Teacher forcing is a procedure ... in which during training the model receives the ground truth output y(t) as input at time t+1." (Page 372, Deep Learning, 2016)
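To see why that matters, here is a toy contrast between teacher forcing and free-running decoding. The "model" is just a stand-in function that overshoots the true next token by one; all names and numbers are invented for illustration:

```python
def toy_model(prev_token):
    # Stand-in for a slightly wrong learned predictor:
    # it overshoots the true next token by 1.
    return prev_token + 2

target = [10, 11, 12, 13]  # hypothetical ground-truth sequence

# Teacher forcing: the input at step t+1 is the ground truth y(t),
# so each prediction error stays isolated.
tf_preds = [toy_model(y) for y in target[:-1]]
print(tf_preds)  # [12, 13, 14] -> each off by exactly 1

# Free running: the input at step t+1 is the model's own output,
# so errors compound step by step.
fr_preds, x = [], target[0]
for _ in range(len(target) - 1):
    x = toy_model(x)
    fr_preds.append(x)
print(fr_preds)  # [12, 14, 16] -> off by 1, 2, 3
```

The compounding error in the free-running case is what teacher forcing avoids during training, which is why it typically converges faster.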

Source: https://github.com/tensorflow/tensorflow/issues/30852#issuecomment-513528114

Sarath R Nair answered Nov 29 '22 03:11