I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without labeled outputs. However, during pre-training in models such as BERT or GPT-3, it seems to me that there is an output. For example, in BERT, some of the tokens in the input sequence are masked, and the model then tries to predict those tokens. Since we already know what the masked words originally were, we can compare them with the predictions to compute a loss. Isn't this basically supervised learning?
Pre-trained language models typically rely on learning objectives that can be derived directly from the structure of the training data. As pointed out by @solitone, techniques like masked and causal language modelling let the model derive supervision signals from the pre-training data itself, without any human annotation, which contributes greatly to the success of PLMs and LLMs (see the sketch below).
A more accurate term used in the literature is self-supervised learning, since unsupervised learning traditionally refers to techniques like clustering.
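To make the "labels come from the data itself" point concrete, here is a minimal sketch of how a masked-language-modelling training pair can be constructed from raw text. The whitespace tokenizer, toy vocabulary, 15% masking rate, and the `-100` ignore index are illustrative simplifications in the style of BERT-like pre-training, not BERT's actual pipeline.

```python
import random

# Build a BERT-style masked-LM training pair from raw text.
# Note that the "labels" are just the original tokens: no human annotation is needed.

random.seed(0)

text = "the model learns to predict missing words"
tokens = text.split()  # stand-in for a real subword tokenizer

MASK = "[MASK]"
IGNORE = -100  # positions that do not contribute to the loss
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

# Choose ~15% of positions to mask (at least one, for this toy example).
num_to_mask = max(1, int(0.15 * len(tokens)))
mask_positions = set(random.sample(range(len(tokens)), num_to_mask))

inputs, labels = [], []
for i, tok in enumerate(tokens):
    if i in mask_positions:
        inputs.append(MASK)
        labels.append(vocab[tok])   # the target comes from the data itself
    else:
        inputs.append(tok)
        labels.append(IGNORE)       # unmasked positions are ignored by the loss

print("input :", inputs)
print("labels:", labels)
```

During training, a cross-entropy loss would be computed only at the masked positions. So mechanically it looks like supervised learning, but because the targets are generated automatically from the unlabeled corpus, it is usually called self-supervised rather than supervised.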