Why are models such as BERT or GPT-3 considered unsupervised learning during pre-training when there is an output (label)

I am not very experienced with unsupervised learning, but my general understanding is that in unsupervised learning, the model learns without there being an output. However, during pre-training in models such as BERT or GPT-3, it seems to me that there is an output. For example, in BERT, some of the tokens in the input sequence are masked. Then, the model will try to predict those words. Since we already know what those masked words originally were, we can compare that with the prediction to find the loss. Isn't this basically supervised learning?
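To make it concrete, here is a toy sketch of what I mean (the token ids, the [MASK] id, and the random logits standing in for BERT are all made up for illustration):

    import torch
    import torch.nn.functional as F

    vocab_size = 10                                  # toy vocabulary
    input_ids = torch.tensor([[3, 7, 2, 5, 9]])      # original token ids
    labels = input_ids.clone()

    # Pretend position 2 was picked for masking (BERT masks ~15% at random).
    mask = torch.tensor([[False, False, True, False, False]])
    MASK_ID = 0                                      # hypothetical [MASK] id
    masked_inputs = input_ids.masked_fill(mask, MASK_ID)

    # Only masked positions contribute to the loss; -100 is the ignore index.
    labels[~mask] = -100

    # Random logits stand in for what a real model would produce from masked_inputs.
    logits = torch.randn(1, 5, vocab_size)

    loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
                           ignore_index=-100)
    print(loss)  # the "label" is just the original token we hid ourselves

The loss compares the prediction at the masked position with the token we already had, so to me it looks exactly like supervised learning.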

danielkim9 asked Sep 17 '25
1 Answer

Pre-trained language models typically leverage learning objectives that can be derived from the structure of the training data itself. As pointed out by @solitone, techniques like masked/causal language modelling let the model obtain supervision signals directly from the pre-training data, which has contributed greatly to the success of PLMs and LLMs.
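To see why that supervision comes "for free", here is a minimal causal language modelling sketch (toy token ids and random logits standing in for a real decoder): the targets are just the input sequence shifted by one position, so the labels fall out of the raw text with no human annotation.

    import torch
    import torch.nn.functional as F

    vocab_size = 10
    tokens = torch.tensor([[4, 1, 8, 3, 6]])   # token ids taken from raw text

    inputs = tokens[:, :-1]                    # the model reads tokens 0..n-1
    targets = tokens[:, 1:]                    # ...and must predict tokens 1..n

    # Random logits stand in for what a real decoder would produce from inputs.
    logits = torch.randn(1, inputs.size(1), vocab_size)

    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    print(loss)

Nobody labelled anything here: the targets were constructed mechanically from the corpus, which is the sense in which this pre-training is not supervised in the classical way.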

A more accurate term, often used in the literature, is self-supervised learning, since unsupervised learning traditionally refers to techniques such as clustering.

Martin Weyssow answered Sep 19 '25