I'm trying to calculate the probability, or some kind of score, for words in a sentence using NLP. I've tried this approach with the GPT-2 model using the Hugging Face Transformers library, but I couldn't get satisfactory results: because the model is unidirectional, it didn't seem to predict within context. So I was wondering whether there is a way to calculate this using BERT, since it's bidirectional.
I found this related post the other day, but it didn't have any answer that was useful for me either.
I hope to get some ideas or a solution for this. Any help is appreciated. Thank you.
The probability of each word depends on the n-1 words before it. For a trigram model (n = 3), for example, each word's probability depends on the 2 words immediately before it. This probability is estimated as the fraction of times the n-gram appears among all occurrences of its preceding (n-1)-gram in the training set.
The better our n-gram model is, the higher the probability it assigns, on average, to each word in the evaluation text. To get these counts, we can build an NgramCounter class that takes in a tokenized text file and stores the counts of all n-grams in that text, as sketched below.
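Here is a minimal sketch of such a counter, assuming whitespace-tokenized input; the class and method names (NgramCounter, count_file, probability) are illustrative, not from any particular library:

from collections import Counter

class NgramCounter:
    """Counts n-grams and their (n-1)-gram prefixes in a whitespace-tokenized text file."""

    def __init__(self, n):
        self.n = n
        self.ngram_counts = Counter()   # counts of full n-grams
        self.prefix_counts = Counter()  # counts of (n-1)-gram prefixes

    def count_file(self, path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                # Pad each line with boundary markers so boundary words get contexts too.
                tokens = ["[EOS]"] * (self.n - 1) + line.split() + ["[EOS]"]
                for i in range(len(tokens) - self.n + 1):
                    ngram = tuple(tokens[i:i + self.n])
                    self.ngram_counts[ngram] += 1
                    self.prefix_counts[ngram[:-1]] += 1

    def probability(self, ngram):
        # Relative-frequency estimate: count(n-gram) / count(its (n-1)-gram prefix).
        prefix = tuple(ngram[:-1])
        if self.prefix_counts[prefix] == 0:
            return 0.0
        return self.ngram_counts[tuple(ngram)] / self.prefix_counts[prefix]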
The probability of a complete word sequence is calculated using the chain rule of probability. For the sentence "I do not like green eggs and ham" under a bigram model:
P(I do not like green eggs and ham) = P(I | [EOS]) * P(do | I) * P(not | do) * P(like | not) * P(green | like) * P(eggs | green) * P(and | eggs) * P(ham | and) * P([EOS] | ham)
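In code, that chain-rule score can be accumulated one conditional probability at a time. This sketch assumes the hypothetical counter above and works in log space to avoid numerical underflow:

import math

def sentence_log_probability(tokens, counter, n=2):
    """Chain-rule score of a sentence under an n-gram model (in log space)."""
    padded = ["[EOS]"] * (n - 1) + tokens + ["[EOS]"]
    log_p = 0.0
    for i in range(n - 1, len(padded)):
        p = counter.probability(padded[i - n + 1:i + 1])
        log_p += math.log(p) if p > 0 else float("-inf")  # unseen n-gram -> zero probability
    return log_p

# e.g. sentence_log_probability("I do not like green eggs and ham".split(), counter)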
BERT is trained as a masked language model, i.e., it is trained to predict tokens that were replaced by a [MASK] token.
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = BertForMaskedLM.from_pretrained("bert-base-cased")

input_idx = tok.encode(f"The {tok.mask_token} were the best rock band ever.")
logits = bert(torch.tensor([input_idx]))[0]  # shape: (1, sequence_length, vocab_size)
prediction = logits[0].argmax(dim=1)         # most likely token id at every position
# Position 2 is the [MASK] (position 0 is [CLS], position 1 is "The").
print(tok.convert_ids_to_tokens(prediction[2].numpy().tolist()))
It prints token no. 11581 which is:
Beatles
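Continuing from the snippet above, if you want an actual probability for a candidate word at the masked position rather than just the argmax, one option (my own addition, not part of the original answer) is to softmax the logits at that position and look up the candidate's token id:

probs = torch.softmax(logits[0, 2], dim=-1)          # distribution over the vocabulary at the [MASK] position
candidate_id = tok.convert_tokens_to_ids("Beatles")  # works only if the word is a single token in BERT's vocab
print(probs[candidate_id].item())                    # probability BERT assigns to "Beatles" in this context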
The tricky thing is that words might be split into multiple subwords. You can simulate that by adding multiple [MASK] tokens, but then you have the problem of how to reliably compare the scores of predictions of different lengths. I would probably average the probabilities, but maybe there is a better way.
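As a rough sketch of that averaging idea (my own construction, not an established recipe), reusing tok and bert from above: mask every subword of the candidate word, then average the probabilities the model assigns to each gold subword at its own [MASK] position:

import torch

def word_score(prefix, word, suffix, tok, bert):
    """Average per-subword probability of `word` when all of its subwords are masked."""
    subword_ids = tok.encode(word, add_special_tokens=False)
    masked_text = prefix + " ".join([tok.mask_token] * len(subword_ids)) + suffix
    input_ids = tok.encode(masked_text, return_tensors="pt")
    with torch.no_grad():
        logits = bert(input_ids)[0]
    mask_positions = (input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
    probs = torch.softmax(logits[0, mask_positions], dim=-1)  # one distribution per [MASK]
    gold = probs[torch.arange(len(subword_ids)), torch.tensor(subword_ids)]
    return gold.mean().item()  # simple average; a geometric mean (mean of logs) is another option

# e.g. word_score("The ", "Beatles", " were the best rock band ever.", tok, bert)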