
How to get the probability of a particular token (word) in a sentence given the context

I'm trying to calculate the probability, or any kind of score, for words in a sentence using NLP. I've tried this approach with the GPT-2 model using the Hugging Face Transformers library, but I couldn't get satisfactory results due to the model's unidirectional nature, which didn't seem to predict within context for me. So I was wondering whether there is a way to calculate the above using BERT, since it's bidirectional.

I've found this related post, which I came across the other day, but I didn't see any answer there that would be useful for me either.

I hope I'll be able to get some ideas or a solution for this. Any help is appreciated. Thank you.

asked May 14 '20 by Dilrukshi Perera

People also ask

What does the probability of each word depend on?

The probability of each word depends on the n-1 words before it. For a trigram model (n = 3), for example, each word's probability depends on the 2 words immediately before it. This probability is estimated as the number of times the n-gram appears in the training set, divided by the number of times its preceding (n-1)-gram appears.
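
As a minimal sketch, that count-ratio estimate looks like this in Python (the toy corpus and the trigram_prob helper are made up for illustration):

from collections import Counter

def trigram_prob(tokens, w1, w2, w3):
    # Unsmoothed MLE estimate: count(w1 w2 w3) / count(w1 w2)
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

tokens = "i do not like green eggs and ham".split()
print(trigram_prob(tokens, "do", "not", "like"))  # 1.0 in this tiny corpus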

How do you increase the probability of each word in the evaluation text?

The better our n-gram model is, the higher the probability it assigns, on average, to each word in the evaluation text. 1. We build an NgramCounter class that takes in a tokenized text file and stores the counts of all n-grams in that text.
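
The NgramCounter class itself isn't shown in this excerpt; here is a minimal sketch of what such a class might look like (the interface is an assumption, not taken from a specific library):

from collections import Counter

class NgramCounter:
    # Stores counts of every n-gram of order 1..n found in a token list.
    def __init__(self, tokens, n):
        self.counts = {
            k: Counter(zip(*(tokens[i:] for i in range(k))))
            for k in range(1, n + 1)
        }

    def count(self, ngram):
        return self.counts[len(ngram)][tuple(ngram)]

counter = NgramCounter("i do not like green eggs".split(), 3)
print(counter.count(("not", "like")))  # 1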

How do you calculate the probability of a complete word sequence?

The probability of a complete word sequence is calculated using the chain rule of probability. With a bigram model, for example:

P(I do not like green eggs and ham) = P(I | eos) * P(do | I) * P(not | do) * P(like | not) * P(green | like) * P(eggs | green) * P(and | eggs) * P(ham | and) * P(eos | ham)
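
As a sketch, that product can be computed directly in Python; the bigram_prob argument below is a placeholder for any estimator (such as a count ratio like the trigram example above):

import math

def sentence_prob(tokens, bigram_prob):
    # Chain rule with a bigram model: multiply P(w_i | w_{i-1}),
    # padding with "eos" markers at both ends as in the example above.
    padded = ["eos"] + tokens + ["eos"]
    return math.prod(bigram_prob(prev, cur) for prev, cur in zip(padded, padded[1:]))

print(sentence_prob("I do not like green eggs and ham".split(),
                    lambda prev, cur: 0.5))  # dummy estimator for illustration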





1 Answer

BERT is trained as a masked language model, i.e., it is trained to predict tokens that were replaced by a [MASK] token.

import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = BertForMaskedLM.from_pretrained("bert-base-cased")

# encode() adds [CLS] and [SEP], so the [MASK] ends up at position 2
input_idx = tok.encode(f"The {tok.mask_token} were the best rock band ever.")
with torch.no_grad():  # no gradients needed for inference
    logits = bert(torch.tensor([input_idx]))[0]

# Highest-scoring token id at each position; position 2 is the [MASK]
prediction = logits[0].argmax(dim=1)
print(tok.convert_ids_to_tokens(prediction[2].item()))

It prints token no. 11581, which is:

Beatles
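
Since the question asks for a probability rather than just the top prediction, one way (continuing the snippet above; this part is not in the original answer) is to apply a softmax over the logits at the [MASK] position and look up a candidate word:

probs = torch.softmax(logits[0, 2], dim=-1)     # distribution over the vocabulary at position 2
beatles_id = tok.convert_tokens_to_ids("Beatles")
print(probs[beatles_id].item())                 # BERT's probability for "Beatles" in this context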

The tricky thing is that words might be split into multiple subwords. You can simulate that by adding multiple [MASK] tokens, but then you have the problem of how to reliably compare the scores of predictions of different lengths. I would probably average the probabilities, but maybe there is a better way; a sketch of that idea follows.
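
Here is a sketch of that averaging idea, masking the word with one [MASK] per subword; the word_score helper is illustrative, not part of the answer's code:

def word_score(tok, bert, prefix, word, suffix):
    # Mask the word with one [MASK] per subword, then average the
    # probabilities BERT assigns to each subword at its own position.
    pieces = tok.tokenize(word)
    masked = f"{prefix} {' '.join([tok.mask_token] * len(pieces))} {suffix}"
    ids = tok.encode(masked)
    with torch.no_grad():
        probs = torch.softmax(bert(torch.tensor([ids]))[0], dim=-1)
    mask_pos = [i for i, t in enumerate(ids) if t == tok.mask_token_id]
    piece_ids = tok.convert_tokens_to_ids(pieces)
    scores = [probs[0, p, i] for p, i in zip(mask_pos, piece_ids)]
    return (sum(scores) / len(scores)).item()

print(word_score(tok, bert, "The", "Beatles", "were the best rock band ever."))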

answered Sep 24 '22 by Jindřich