
How do I train an encoder-decoder model for a translation task using Hugging Face Transformers?

I would like to train an encoder-decoder model, configured as below, for a translation task. Could someone guide me on how to set up a training pipeline for such a model? Any links or code snippets that would help me understand would be appreciated.

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()

config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)

# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)
Asked Jun 18 '20 by Mitesh Mutha

People also ask

What are the steps to design an encoder decoder?

Encoder-Decoder LSTM for Sequence Prediction: the first step is to configure the problem. Next, we must define the models and compile the training model. Then we can generate a training dataset of 100,000 examples and train the model. Once the model is trained, we can evaluate it. A minimal sketch of these steps is shown below.
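The following is a minimal Keras sketch of the "define and compile" step described above; the feature and hidden sizes are illustrative assumptions, not values from the original answer.

from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_features = 51   # assumed vocabulary/feature size
n_units = 128     # assumed hidden size

# Encoder: reads the input sequence and returns its final states
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(n_units, return_state=True)(encoder_inputs)

# Decoder: generates the output sequence, initialised with the encoder states
decoder_inputs = Input(shape=(None, n_features))
decoder_outputs = LSTM(n_units, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(n_features, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')

After compiling, you would generate the training pairs, call model.fit, and then evaluate on held-out examples.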

What is encoder and decoder in machine translation?

The encoder sits at the feeding end; it reads the input sequence and compresses it into a fixed-size representation known as the context vector. This context vector acts as the input to the decoder, which generates the output sequence until it reaches the end token.
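A small PyTorch sketch of this idea (purely illustrative; the sizes and random tensors are assumptions): the encoder's final hidden state is the fixed-size context vector that conditions the decoder.

import torch
import torch.nn as nn

hidden_size = 256                           # assumed size of the context vector
encoder = nn.GRU(input_size=128, hidden_size=hidden_size, batch_first=True)
decoder = nn.GRU(input_size=128, hidden_size=hidden_size, batch_first=True)

src = torch.randn(1, 10, 128)               # (batch, source length, embedding dim)
tgt = torch.randn(1, 12, 128)               # (batch, target length, embedding dim)

_, context = encoder(src)                   # fixed-size summary of the whole input
decoder_output, _ = decoder(tgt, context)   # decoder conditioned on the context vector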

How do you use the hugging face BERT model?

You can use the same tokenizer for all of the various BERT models that Hugging Face provides. As BERT can accept at most 512 tokens at a time, you must set the truncation parameter to True. The add_special_tokens parameter tells the tokenizer to add BERT's special tokens, such as the starting [CLS] and the ending [SEP].
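A short sketch of that tokenizer usage (the example sentence is made up):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
encoded = tokenizer(
    "A long document that may exceed BERT's limit ...",
    truncation=True,          # cut inputs longer than max_length
    max_length=512,           # BERT accepts at most 512 tokens
    add_special_tokens=True,  # prepend [CLS] and append [SEP]
)
print(encoded['input_ids'][:5])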

What is the difference between Transformer encoder and decoder?

The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce the output sentence (the translation).
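A minimal PyTorch sketch of that split (sizes are illustrative assumptions): the encoder turns the source sequence into feature vectors, and the decoder attends over those features to produce the target sequence.

import torch
import torch.nn as nn

d_model = 512
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead=8), num_layers=6)
decoder = nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model, nhead=8), num_layers=6)

src = torch.randn(10, 1, d_model)   # (source length, batch, d_model)
tgt = torch.randn(12, 1, d_model)   # (target length, batch, d_model)

memory = encoder(src)               # features extracted from the input sentence
output = decoder(tgt, memory)       # decoder uses the features to generate the output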


1 Answer

Encoder-decoder models are used in the same way as any other model in Transformers. The model accepts batches of tokenized text as vocabulary indices (i.e., you need a tokenizer that is suitable for your sequence-to-sequence task). When you feed the model the input (input_ids) and the desired output (decoder_input_ids and labels), you get a loss value that you can optimize during training. Note that if the sentences in a batch have different lengths, you need to do masking too. This is a minimal example from the EncoderDecoderModel documentation:

from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Build a bert2bert model: encoder and decoder are both initialized from bert-base-uncased
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

# Tokenize a single sentence and add a batch dimension
input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)

# Passing input ids, decoder input ids and labels yields a training loss
outputs = model(
    input_ids=input_ids, decoder_input_ids=input_ids, labels=input_ids,
    return_dict=True)
loss = outputs.loss
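To illustrate the masking point, here is a hedged sketch that pads a small batch of made-up source/target sentences and passes the resulting attention masks to the model (it reuses the tokenizer and model defined above); padded label positions are set to -100 so they are ignored by the loss.

src_texts = ["Hello, my dog is cute", "A second, slightly longer example sentence"]
tgt_texts = ["Hallo, mein Hund ist süß", "Ein zweiter Beispielsatz"]

enc = tokenizer(src_texts, padding=True, return_tensors="pt")
dec = tokenizer(tgt_texts, padding=True, return_tensors="pt")

labels = dec.input_ids.clone()
labels[dec.attention_mask == 0] = -100   # ignore padded positions in the loss

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,
    decoder_attention_mask=dec.attention_mask,
    labels=labels,
    return_dict=True)
loss = outputs.loss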

If you do not want to write the training loop yourself, you can use dataset processing (DataCollatorForSeq2Seq) and training (Seq2SeqTrainer) utilities from Transformers. You can follow the Seq2Seq example on GitHub.
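A hedged sketch of how those utilities fit together; train_dataset is a placeholder for a tokenized dataset whose examples already contain input_ids, attention_mask and labels, and the output directory and hyperparameters are assumptions.

from transformers import (DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Pads inputs and labels dynamically per batch
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir="bert2bert-translation",   # assumed output directory
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,          # placeholder: your tokenized dataset
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()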

Answered Oct 08 '22 by Jindřich