Yesterday I came across this question and noticed for the first time that the weights of the linear layer nn.Linear need to be transposed before applying matmul.
Code for applying the weights:
output = input.matmul(weight.t())
What is the reason for this?
Why are the weights not stored in the transposed shape to begin with, so they don't need to be transposed every time the layer is applied?
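To make the layout concrete, here is a minimal sketch (assuming a recent PyTorch version) showing that nn.Linear stores its weight as (out_features, in_features) and that its forward pass matches the manual matmul with the transposed weight:

import torch
import torch.nn as nn

# nn.Linear stores its weight with shape (out_features, in_features)
layer = nn.Linear(in_features=5, out_features=3)
print(layer.weight.shape)  # torch.Size([3, 5])

x = torch.randn(2, 5)  # batch of 2 input vectors

# the module output matches the manual matmul with the transposed weight
manual = x.matmul(layer.weight.t()) + layer.bias
print(torch.allclose(layer(x), manual))  # True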
I found an answer here: Efficient forward pass in nn.Linear #2159
It seems there is no deep technical reason behind this choice; it is largely historical. However, the transpose operation does not slow down the computation either.
According to the issue mentioned above, the transpose is (almost) free in terms of computation during the forward pass, while during the backward pass leaving out the transpose would actually make the computation less efficient with the current implementation.
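The forward-pass cost is negligible because weight.t() only returns a view: no data is copied, the tensor just gets swapped shape and strides, and the underlying matmul kernel reads the same memory accordingly. A quick check (again assuming PyTorch):

import torch

w = torch.randn(3, 5)
wt = w.t()

# .t() returns a view: same storage, only shape and strides are swapped
print(wt.shape)                       # torch.Size([5, 3])
print(wt.data_ptr() == w.data_ptr())  # True -> no copy was made
print(w.stride(), wt.stride())        # (5, 1) (1, 5)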
The last post in that issue sums it up quite nicely:
It's historical weight layout, changing it is backward-incompatible. Unless there is some BIG benefit in terms of speed or convenience, we wont break userland.
https://github.com/pytorch/pytorch/issues/2159#issuecomment-390068272