 

How to get immediate next word probability using GPT2 model?

I was trying the Hugging Face GPT-2 model. I have seen the run_generation.py script, which generates a sequence of tokens given a prompt. I am aware that we can use GPT-2 for NLG.

In my use case, I wish to determine the probability distribution for (only) the immediate next word following the given prompt. Ideally this distribution would be over the entire vocab.

For example, given the prompt "How are ", it should return a probability distribution in which words like "you" or "they" have high probabilities and all other vocabulary words have very low probabilities.

How can I do this using Hugging Face Transformers? If it is not possible with Hugging Face, is there any other transformer model that can do this?

asked Jul 11 '20 by Gaurang Tandon

People also ask

What Tokenizer does GPT-2 use?

Tokenization. GPT-2 uses byte-pair encoding, or BPE for short. BPE is a way of splitting words into sub-word units for tokenization.
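For illustration, a minimal sketch (assuming the transformers library and the standard "gpt2" checkpoint) of how the GPT-2 BPE tokenizer splits a string:

from transformers import GPT2Tokenizer

# Sketch: load the pretrained GPT-2 tokenizer and inspect how it splits
# a string into BPE sub-word tokens.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokens = tokenizer.tokenize("Tokenization splits up words")
print(tokens)                                   # sub-word pieces produced by BPE
print(tokenizer.convert_tokens_to_ids(tokens))  # their ids in the GPT-2 vocabulary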

Is GPT-2 a decoder?

GPT-2 is decoder-only: it does not use the encoder part of the original transformer architecture, and there are no encoder-attention blocks. The decoder blocks are otherwise equivalent to encoder blocks, except for the masking in the multi-head attention: the decoder is only allowed to attend to the preceding words in the sequence.

What is a GPT-2 model?

GPT-2 is short for "Generative Pretrained Transformer 2". The model is open source, has over 1.5 billion parameters in its largest version, and is trained to generate the next sequence of text for a given sentence.




1 Answer

You can have a look at how the generation script works with the probabilities.

GPT2LMHeadModel (as well as the other "LMHead" models) returns a tensor that contains, for each input position, the unnormalized scores (logits) for what the next token might be. I.e., the logits at the last position are the scores for the token that immediately follows the prompt (assuming input_ids is a tensor with token indices from the tokenizer):

outputs = model(input_ids)
next_token_logits = outputs[0][:, -1, :]  # logits for the token following the prompt

You get the probability distribution by normalizing the logits with a softmax. The indices in the last dimension of next_token_logits correspond to indices in the vocabulary that you get from the tokenizer object.
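Putting this together, a minimal end-to-end sketch for the prompt from the question (assuming the standard "gpt2" checkpoint; any larger GPT-2 checkpoint works the same way):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Sketch: get the probability distribution over the whole vocabulary for the
# token that immediately follows the prompt.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("How are", return_tensors="pt")     # shape (1, seq_len)

with torch.no_grad():
    outputs = model(input_ids)

next_token_logits = outputs[0][:, -1, :]                      # unnormalized scores, shape (1, vocab_size)
next_token_probs = torch.softmax(next_token_logits, dim=-1)   # distribution over the vocabulary

# Look at the five most likely next tokens
top_probs, top_ids = torch.topk(next_token_probs, k=5, dim=-1)
for prob, token_id in zip(top_probs[0], top_ids[0]):
    print(repr(tokenizer.decode(token_id.item())), float(prob))

For a prompt like "How are", tokens such as " you" should come out near the top of this list.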

Selecting the last logits becomes tricky when you use a batch size bigger than 1 and sequences of different lengths. In that case, you need to pass attention_mask in the model call to mask out the padding tokens and then select the logits at the last non-padding position of each sequence. It is much easier to either use a batch size of 1 or a batch of equally long sequences.
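A hedged sketch of the batched case (the padding and gather details here are my own illustration, not part of the original answer):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Sketch: a batch of prompts with different lengths. GPT-2 has no padding
# token by default, so the EOS token is reused for padding; attention_mask
# tells the model which positions are real tokens.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

batch = tokenizer(["How are", "The weather today is"], return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])

logits = outputs[0]                                        # (batch, seq_len, vocab_size)
last_positions = batch["attention_mask"].sum(dim=1) - 1    # index of last non-padding token per sequence
index = last_positions.view(-1, 1, 1).expand(-1, 1, logits.size(-1))
next_token_logits = torch.gather(logits, 1, index).squeeze(1)   # (batch, vocab_size)
next_token_probs = torch.softmax(next_token_logits, dim=-1)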

You can use any autoregressive model in Transformers: there is DistilGPT-2 (a distilled version of GPT-2), CTRL (which is basically GPT-2 trained with some additional control "commands"), the original GPT (under the name openai-gpt), and XLNet (designed for contextual embeddings, but it can be used for generation in arbitrary order). There are probably more; you can browse the Hugging Face Model Hub.
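Swapping the checkpoint name is typically all that is needed; a sketch using the generic Auto classes (the checkpoint identifiers shown are the usual Hub names):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: the next-token logic above stays the same for other causal LM
# checkpoints; only the checkpoint name changes.
checkpoint = "distilgpt2"          # e.g. "openai-gpt" or "ctrl" would also work
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)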

answered Oct 24 '22 by Jindřich