I would like to train an encoder-decoder model, configured as below, for a translation task. Could someone guide me on how to set up a training pipeline for such a model? Any links or code snippets that would help me understand would be appreciated.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)
Encoder-Decoder LSTM for Sequence Prediction
The first step is to configure the problem. Next, we define the models and compile the training model. We can then generate a training dataset of 100,000 examples and train the model. Once the model is trained, we can evaluate it.
The encoder sits at the input end; it reads the source sequence and compresses it into a fixed-size representation known as the context vector. This context vector is the input to the decoder, which generates the output sequence until it reaches the end token.
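Below is a minimal sketch of that LSTM pipeline in Keras, using a toy "reverse the sequence" task; the task, sizes, and sample count (reduced from 100,000 here) are placeholder assumptions, not a prescribed configuration:
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

n_features, n_units, n_steps = 51, 128, 6

# Encoder: reads the source sequence and compresses it into its final states
# (the fixed-size "context vector")
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(n_units, return_state=True)(encoder_inputs)

# Decoder: starts from the encoder states and emits the target sequence
decoder_inputs = Input(shape=(None, n_features))
decoder_out, _, _ = LSTM(n_units, return_sequences=True, return_state=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_out = Dense(n_features, activation='softmax')(decoder_out)

model = Model([encoder_inputs, decoder_inputs], decoder_out)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Generate a toy dataset (random one-hot sequences; target = reversed source)
def one_hot(seq):
    out = np.zeros((len(seq), n_features))
    out[np.arange(len(seq)), seq] = 1.0
    return out

n_samples = 1000
X = np.random.randint(1, n_features, size=(n_samples, n_steps))
src = np.array([one_hot(x) for x in X])
tgt = np.array([one_hot(x[::-1]) for x in X])
# Teacher forcing: decoder input is the target shifted right by one step
tgt_in = np.concatenate([np.zeros((n_samples, 1, n_features)), tgt[:, :-1]], axis=1)

model.fit([src, tgt_in], tgt, epochs=1, batch_size=32)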
You can use the same tokenizer for all of the various BERT models that Hugging Face provides. Because BERT can accept at most 512 tokens at a time, we set the truncation parameter to True. The add_special_tokens parameter tells the tokenizer to add BERT's special tokens, such as [CLS] at the start and [SEP] at the end.
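For example (a small illustration; the sentence is just a placeholder):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Truncate anything beyond BERT's 512-token limit and add [CLS]/[SEP] automatically
encoded = tokenizer(
    "Hello, my dog is cute",
    truncation=True,
    max_length=512,
    add_special_tokens=True,  # this is the default, shown here explicitly
    return_tensors='pt')
print(encoded['input_ids'])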
Encoder-Decoder Architecture
The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce an output sentence (the translation).
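Illustratively, you can run the two halves separately to see this hand-off (this loads the same bert2bert checkpoints used in the minimal example further below, purely to inspect the feature flow):
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

src = tokenizer("Hello, my dog is cute", return_tensors='pt')
with torch.no_grad():
    # Encoder produces one feature vector per source token
    features = model.encoder(**src).last_hidden_state
    # Decoder attends to those features via cross-attention to produce logits
    out = model.decoder(input_ids=src.input_ids,
                        encoder_hidden_states=features,
                        encoder_attention_mask=src.attention_mask)
print(out.logits.shape)  # (1, seq_len, vocab_size)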
Encoder-decoder models are used in the same way as any other model in Transformers. They accept batches of tokenized text as vocabulary indices (i.e., you need a tokenizer that is suitable for your sequence-to-sequence task). When you feed the model the input (input_ids) and the desired output (decoder_input_ids and labels), you get a loss value that you can optimize during training. Note that if the sentences in the batch have different lengths, you also need to do masking. Here is a minimal example from the EncoderDecoderModel documentation:
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tie a pretrained BERT encoder to a pretrained BERT decoder (bert2bert)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)

# The same ids are used as input, decoder input and labels just to obtain a loss
outputs = model(
    input_ids=input_ids, decoder_input_ids=input_ids, labels=input_ids,
    return_dict=True)
loss = outputs.loss
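For batches with sentences of different lengths, pad the inputs and pass attention masks. A rough sketch, reusing the tokenizer and model from the snippet above (the sentence pairs are placeholders):
src_texts = ["Hello, my dog is cute", "A second, somewhat longer example sentence."]
tgt_texts = ["Hallo, mein Hund ist süß", "Ein zweiter Beispielsatz."]

enc = tokenizer(src_texts, padding=True, truncation=True, return_tensors='pt')
dec = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors='pt')

# Replace padding in the labels with -100 so it is ignored by the loss
labels = dec.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,
    decoder_attention_mask=dec.attention_mask,
    labels=labels)
outputs.loss.backward()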
If you do not want to write the training loop yourself, you can use the dataset processing (DataCollatorForSeq2Seq) and training (Seq2SeqTrainer) utilities from Transformers, and follow the Seq2Seq example on GitHub.
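A rough sketch of what that Trainer-based pipeline could look like; the toy in-memory dataset, the 'source'/'target' column names, and the hyperparameters are assumptions you would replace with your own parallel corpus and settings:
from datasets import Dataset
from transformers import (EncoderDecoderModel, BertTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

# The EncoderDecoderModel needs these set explicitly for training/generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Tiny placeholder corpus; use your own translation dataset here
raw = Dataset.from_dict({
    'source': ['Hello, my dog is cute', 'How are you?'],
    'target': ['Hallo, mein Hund ist süß', 'Wie geht es dir?'],
})

def preprocess(batch):
    model_inputs = tokenizer(batch['source'], truncation=True, max_length=512)
    labels = tokenizer(batch['target'], truncation=True, max_length=512)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=['source', 'target'])

# Pads inputs and labels dynamically per batch (label padding becomes -100)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir='bert2bert-translation',
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()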