How to determine which Merge mode (add / average / multiply / dot / concat) to use?

Question

After testing the scripts [babi_rnn.py] and [babi_memnn.py], the question of [how to determine which Merge mode (add / average / multiply / dot / concat) to use] came up many times in my mind.

For example, in LSTM modeling, it seems easy to understand why [concat] is used to merge, say, the time-sequence layer outputs of two branches.

However, it is not as easy for me to understand why [add] is used to merge the two branches in [babi_rnn.py]. In [babi_memnn.py], the [add], [dot] and [concat] merge modes are all used.

So, are there any suggestions for choosing which merge function to use in different scenarios?

Ricky Han · Accepted Answer

These Merge functions fall into 3 categories.

add and avg are linear combinations. They are used to simply combine several distinct components, because gradients flow cleanly through addition and subtraction. A common use case is adding (+) several criteria together to obtain the loss function for a neural network that trains on multiple tasks jointly.
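As a rough illustration of what these two merges compute, here is a minimal NumPy sketch. The arrays `a` and `b` are hypothetical branch outputs invented for this example; they are not taken from babi_rnn.py.

```python
import numpy as np

# Two hypothetical branch outputs with identical shape (batch=2, features=3).
a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = np.array([[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]])

added = a + b            # what an "add" merge computes
averaged = (a + b) / 2   # what an "average" merge computes

# Both results keep the input shape, so the branches must already agree in shape.
print(added.shape, averaged.shape)
```

Because the output is a plain sum, the gradient with respect to each branch is just the upstream gradient (scaled by 1/2 for the average), which is one way to read "gradient flows nicely".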

Another example is L2 regularization, where a weight penalty is added to the data loss.

L2 regularization penalizes large weights: the bigger the weights, the higher the loss.
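Written out, the usual form of this penalty looks like the following. The symbols $L_{\text{data}}$ (the unregularized loss), $\lambda$ (the penalty strength) and $w_i$ (the individual weights) are notation assumed here for illustration, not taken from the original answer.

$L_{\text{total}} = L_{\text{data}} + \lambda \sum_i w_i^2$

The two terms are combined by plain addition, which is the same pattern as the multi-task loss above.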

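multiply is element-wise, and dot is that element-wise product followed by a sum along the chosen axes. In Keras, you can specify the axes when using dot. The dot product is used to measure how similar two vectors are to each other. Note: dot is a reduction. It collapses the contracted axis, and by the Cauchy-Schwarz inequality its magnitude is at most the product of the two input norms, so for unit vectors it never exceeds 1. Geometrically, it is the length of the projection of one vector onto the other, multiplied by the length of the other.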
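The relationship between the two operations can be sketched with NumPy as follows. The vectors are made-up values, and the cosine normalization is an extra step added here to show the similarity reading; it is not something the merge mode does unless you ask for it.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -5.0, 6.0])

multiplied = a * b        # element-wise: same shape as the inputs
dotted = np.dot(a, b)     # multiply, then sum: a single scalar

# Dividing by the norms gives cosine similarity, which always lies in [-1, 1].
norm_product = np.linalg.norm(a) * np.linalg.norm(b)
cosine = dotted / norm_product

print(multiplied, dotted, cosine)
```

Note that `multiplied` keeps one value per feature, while `dotted` collapses the whole feature axis into one number.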

concat does not discard any input. The concatenated vector can then be fed into a hidden layer, which learns its own weight for every element. Concatenation by itself does not compute any interaction between elements; that is left to the layers that follow. One common practice is to concatenate the hidden states and outputs of stacked RNNs and feed the result into a Dense layer, so that several RNNs doing different tasks are combined much like inputs to a feedforward network.
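A minimal NumPy sketch of concatenation followed by a dense layer is shown below. The shapes, the random weights `W` and the `tanh` activation are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(2, 3))   # branch 1: batch of 2, 3 features
b = rng.normal(size=(2, 5))   # branch 2: batch of 2, 5 features

# Concatenate along the feature axis; every input value is kept as-is.
merged = np.concatenate([a, b], axis=-1)   # shape (2, 8)

# Hypothetical dense layer: this is where the two branches start to interact.
W = rng.normal(size=(8, 4))
bias = np.zeros(4)
hidden = np.tanh(merged @ W + bias)        # shape (2, 4)

print(merged.shape, hidden.shape)
```

Unlike add, the two branches may have different feature sizes here; only the batch dimension has to match.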

To sum up, each Merge operation has a different use case. The Luong attention paper, for example, proposes three scoring functions. Depending on your model, you can pick the one that works best for you.

$\mathrm{score}(h_t, \bar h_s) = \begin{cases} h_t^\top \bar h_s & \text{dot} \\ h_t^\top \mathbf{W}_a \bar h_s & \text{general} \\ v_a^\top \tanh\left(\mathbf{W}_a [h_t ; \bar h_s]\right) & \text{concat} \end{cases}$
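The three scores can be sketched in NumPy as below. This follows the Luong et al. formulation, in which the concat score applies a tanh before the projection by $v_a$. The dimension `d` and the random parameters are placeholders, and separate names `W_general` and `W_concat` are used because the two matrices have different shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
h_t = rng.normal(size=d)   # current target hidden state
h_s = rng.normal(size=d)   # one source hidden state

# Hypothetical learned parameters, randomly initialized for illustration.
W_general = rng.normal(size=(d, d))
W_concat = rng.normal(size=(d, 2 * d))
v_a = rng.normal(size=d)

score_dot = h_t @ h_s                       # no parameters
score_general = h_t @ W_general @ h_s       # bilinear form
score_concat = v_a @ np.tanh(W_concat @ np.concatenate([h_t, h_s]))

print(score_dot, score_general, score_concat)
```

Each variant returns one scalar per pair of states; they differ in how many parameters sit between the two vectors.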

Tags:

python

merge

keras

zshtom
