
What is a multi-headed model? And what exactly is a 'head' in a model?

What is a multi-headed model in deep learning?

The only explanation I have found so far is this: every model can be thought of as a backbone plus a head, and if you pre-train the backbone and attach a randomly initialized head, you can fine-tune it, which is supposed to be a good idea.
Can someone please provide a more detailed explanation?

spacer.34 asked May 06 '19 11:05



1 Answer

The explanation you found is accurate. Depending on what you want to predict from your data, you need a suitable backbone network and a certain number of prediction heads.

For a basic classification network, for example, you can view ResNet, AlexNet, VGGNet, Inception, ... as the backbone and the fully connected layer as the sole prediction head.
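To make the backbone-plus-single-head view concrete, and to tie it to the fine-tuning idea from the question, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: `W_backbone` is a random matrix standing in for pre-trained weights (a real backbone would be the convolutional layers of a network such as ResNet), the data is synthetic, and the sizes and learning rate are arbitrary. Only the new head is trained; the backbone stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: fixed weights that are never updated.
# (Assumption: in practice these would be loaded from a trained network.)
W_backbone = rng.normal(size=(8, 4))


def backbone(x):
    """Map raw inputs to features (one linear layer + ReLU as a placeholder)."""
    return np.maximum(x @ W_backbone, 0.0)


# Freshly initialized ("random") head for a new 2-class task.
W_head = rng.normal(scale=0.1, size=(4, 2))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_grad(W, feats, labels):
    """Mean cross-entropy of the head and its gradient w.r.t. the head weights."""
    probs = softmax(feats @ W)
    n = len(labels)
    loss = -np.log(probs[np.arange(n), labels]).mean()
    probs[np.arange(n), labels] -= 1.0  # d(loss)/d(logits) = probs - one_hot
    grad = feats.T @ probs / n
    return loss, grad


# Synthetic data: 32 samples with 8 inputs each, label depends on the first input.
x = rng.normal(size=(32, 8))
y = (x[:, 0] > 0).astype(int)

# The backbone is frozen, so its features can be computed once and reused.
feats = backbone(x)

initial_loss, _ = loss_and_grad(W_head, feats, y)
for _ in range(200):
    _, grad = loss_and_grad(W_head, feats, y)
    W_head -= 0.05 * grad  # gradient step on the head only
final_loss, _ = loss_and_grad(W_head, feats, y)

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Freezing the backbone is only one option; fine-tuning often also updates the backbone, typically with a smaller learning rate.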

A good example of a problem where you need multiple heads is localization, where you not only want to classify what is in the image but also want to localize the object (find the coordinates of the bounding box around it).

The image below shows the general architecture:

[Image: general architecture]

The backbone network ("convolution and pooling") is responsible for extracting a feature map from the image that contains higher-level, summarized information. Each head takes this feature map as input and predicts its own output.
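The shared-backbone idea for localization can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a real detector: the backbone is a single linear layer with ReLU standing in for "convolution and pooling", all weights are random, and the names `W_backbone`, `W_class`, `W_box` and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMAGE_DIM, FEATURE_DIM, NUM_CLASSES = 64, 16, 3

# Backbone stand-in: one linear layer + ReLU. A real backbone would be
# the convolution and pooling layers of a network such as ResNet.
W_backbone = rng.normal(size=(IMAGE_DIM, FEATURE_DIM))

# Two heads that read the same features.
W_class = rng.normal(size=(FEATURE_DIM, NUM_CLASSES))  # classification head
W_box = rng.normal(size=(FEATURE_DIM, 4))              # box head: x, y, w, h


def backbone(images):
    """Map flattened images to a shared feature representation."""
    return np.maximum(images @ W_backbone, 0.0)


def forward(images):
    """Run the backbone once, then feed its output to every head."""
    features = backbone(images)
    class_logits = features @ W_class  # what is in the image
    boxes = features @ W_box           # where it is
    return class_logits, boxes


images = rng.normal(size=(5, IMAGE_DIM))  # batch of 5 fake flattened images
class_logits, boxes = forward(images)
print(class_logits.shape, boxes.shape)  # (5, 3) (5, 4)
```

The point of the design is that the expensive feature extraction runs once per image, while each head stays small and task-specific.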

The loss that you optimize during training is usually a weighted sum of the individual losses of the prediction heads.
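As a small sketch of such a weighted sum for the localization example: cross-entropy for the classification head plus mean squared error for the box head. The choice of these two losses, the weights `w_class` and `w_box`, and all numbers are assumptions for illustration; in practice the weights are hyperparameters you tune.

```python
import math


def cross_entropy(probs, true_class):
    """Classification loss: negative log-probability of the correct class."""
    return -math.log(probs[true_class])


def box_mse(pred_box, true_box):
    """Regression loss: mean squared error over the box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / len(true_box)


def total_loss(probs, true_class, pred_box, true_box, w_class=1.0, w_box=0.5):
    """Weighted sum of the per-head losses (weights are hypothetical)."""
    return (w_class * cross_entropy(probs, true_class)
            + w_box * box_mse(pred_box, true_box))


# One made-up training example: predicted class probabilities and box (x, y, w, h).
loss = total_loss(
    probs=[0.7, 0.2, 0.1], true_class=0,
    pred_box=[0.5, 0.5, 0.2, 0.2], true_box=[0.4, 0.5, 0.3, 0.2],
)
print(loss)
```

Because both terms flow back through the same backbone, the weights control how much each task shapes the shared features.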

SaiBot answered Sep 23 '22 16:09