I would like to train an encoder-decoder model, configured as below, for a translation task. Could someone guide me on how to set up a training pipeline for such a model? Any links or code snippets that would help me understand would be appreciated.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
# Initializing a BERT bert-base-uncased style configuration
config_encoder = BertConfig()
config_decoder = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
# Initializing a Bert2Bert model from the bert-base-uncased style configurations
model = EncoderDecoderModel(config=config)
Encoder-Decoder LSTM for Sequence Prediction
The first step is to configure the problem. Next, we define the models and compile the training model. We can then generate a training dataset of 100,000 examples and train the model. Once the model is trained, we can evaluate it.
The encoder sits at the input end; it reads the source sequence and compresses it into a fixed-size representation known as the context vector. This context vector is the input to the decoder, which generates the output sequence until it reaches the end token.
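Below is a minimal sketch of that LSTM pipeline in Keras, using a toy "reverse the sequence" task; the task, sizes, and sample count (reduced from 100,000 here) are placeholder assumptions, not a prescribed configuration:
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

n_features, n_units, n_steps = 51, 128, 6

# Encoder: reads the source sequence and compresses it into its final states
# (the fixed-size "context vector")
encoder_inputs = Input(shape=(None, n_features))
_, state_h, state_c = LSTM(n_units, return_state=True)(encoder_inputs)

# Decoder: starts from the encoder states and emits the target sequence
decoder_inputs = Input(shape=(None, n_features))
decoder_out, _, _ = LSTM(n_units, return_sequences=True, return_state=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_out = Dense(n_features, activation='softmax')(decoder_out)

model = Model([encoder_inputs, decoder_inputs], decoder_out)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Generate a toy dataset (random one-hot sequences; target = reversed source)
def one_hot(seq):
    out = np.zeros((len(seq), n_features))
    out[np.arange(len(seq)), seq] = 1.0
    return out

n_samples = 1000
X = np.random.randint(1, n_features, size=(n_samples, n_steps))
src = np.array([one_hot(x) for x in X])
tgt = np.array([one_hot(x[::-1]) for x in X])
# Teacher forcing: decoder input is the target shifted right by one step
tgt_in = np.concatenate([np.zeros((n_samples, 1, n_features)), tgt[:, :-1]], axis=1)

model.fit([src, tgt_in], tgt, epochs=1, batch_size=32)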
You can use the same tokenizer for all of the various BERT models that Hugging Face provides. Because BERT can accept at most 512 tokens at a time, we set the truncation parameter to True. The add_special_tokens parameter tells the tokenizer to add BERT's special tokens, such as [CLS] at the start and [SEP] at the end.
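For example (a small illustration; the sentence is just a placeholder):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Truncate anything beyond BERT's 512-token limit and add [CLS]/[SEP] automatically
encoded = tokenizer(
    "Hello, my dog is cute",
    truncation=True,
    max_length=512,
    add_special_tokens=True,  # this is the default, shown here explicitly
    return_tensors='pt')
print(encoded['input_ids'])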
Encoder-Decoder Architecture
The transformer uses an encoder-decoder architecture. The encoder extracts features from an input sentence, and the decoder uses those features to produce an output sentence (the translation).
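Illustratively, you can run the two halves separately to see this hand-off (this loads the same bert2bert checkpoints used in the minimal example further below, purely to inspect the feature flow):
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

src = tokenizer("Hello, my dog is cute", return_tensors='pt')
with torch.no_grad():
    # Encoder produces one feature vector per source token
    features = model.encoder(**src).last_hidden_state
    # Decoder attends to those features via cross-attention to produce logits
    out = model.decoder(input_ids=src.input_ids,
                        encoder_hidden_states=features,
                        encoder_attention_mask=src.attention_mask)
print(out.logits.shape)  # (1, seq_len, vocab_size)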
Encoder-decoder models are used in the same way as any other model in Transformers. They accept batches of tokenized text as vocabulary indices (i.e., you need a tokenizer that is suitable for your sequence-to-sequence task). When you feed the model the input (input_ids) and the desired output (decoder_input_ids and labels), you get a loss value that you can optimize during training. Note that if the sentences in the batch have different lengths, you also need to do masking. Here is a minimal example from the EncoderDecoderModel documentation:
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tie a pretrained BERT encoder to a pretrained BERT decoder (bert2bert)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)

# The same ids are used as input, decoder input and labels just to obtain a loss
outputs = model(
    input_ids=input_ids, decoder_input_ids=input_ids, labels=input_ids,
    return_dict=True)
loss = outputs.loss
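For batches with sentences of different lengths, pad the inputs and pass attention masks. A rough sketch, reusing the tokenizer and model from the snippet above (the sentence pairs are placeholders):
src_texts = ["Hello, my dog is cute", "A second, somewhat longer example sentence."]
tgt_texts = ["Hallo, mein Hund ist süß", "Ein zweiter Beispielsatz."]

enc = tokenizer(src_texts, padding=True, truncation=True, return_tensors='pt')
dec = tokenizer(tgt_texts, padding=True, truncation=True, return_tensors='pt')

# Replace padding in the labels with -100 so it is ignored by the loss
labels = dec.input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100

outputs = model(
    input_ids=enc.input_ids,
    attention_mask=enc.attention_mask,
    decoder_input_ids=dec.input_ids,
    decoder_attention_mask=dec.attention_mask,
    labels=labels)
outputs.loss.backward()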
If you do not want to write the training loop yourself, you can use the dataset processing (DataCollatorForSeq2Seq) and training (Seq2SeqTrainer) utilities from Transformers, and follow the Seq2Seq example on GitHub.
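A rough sketch of what that Trainer-based pipeline could look like; the toy in-memory dataset, the 'source'/'target' column names, and the hyperparameters are assumptions you would replace with your own parallel corpus and settings:
from datasets import Dataset
from transformers import (EncoderDecoderModel, BertTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')

# The EncoderDecoderModel needs these set explicitly for training/generation
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Tiny placeholder corpus; use your own translation dataset here
raw = Dataset.from_dict({
    'source': ['Hello, my dog is cute', 'How are you?'],
    'target': ['Hallo, mein Hund ist süß', 'Wie geht es dir?'],
})

def preprocess(batch):
    model_inputs = tokenizer(batch['source'], truncation=True, max_length=512)
    labels = tokenizer(batch['target'], truncation=True, max_length=512)
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=['source', 'target'])

# Pads inputs and labels dynamically per batch (label padding becomes -100)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

training_args = Seq2SeqTrainingArguments(
    output_dir='bert2bert-translation',
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()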