In some deep learning models that analyse temporal data (e.g. audio or video), we use a "time-distributed dense" (TDD) layer. This is a fully-connected (dense) layer that is applied separately to every time-step.

In Keras this can be done using the TimeDistributed wrapper, which is actually slightly more general. In PyTorch it's been an open feature request for a couple of years.
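For reference, a minimal Keras sketch of such a layer (the shapes here, 100 time-steps of 20 features each, are just illustrative):

from tensorflow import keras
from tensorflow.keras import layers

# The same Dense(30) is applied independently to each of the 100 time-steps.
inputs = keras.Input(shape=(100, 20))                      # (time, features) per example
outputs = layers.TimeDistributed(layers.Dense(30))(inputs)
model = keras.Model(inputs, outputs)
model.summary()                                            # output shape: (None, 100, 30)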
How can we implement time-distributed dense manually in PyTorch?
Specifically for time-distributed dense (and not time-distributed anything else), we can hack it by using a convolutional layer.
Look at the diagram you've shown of the TDD layer. We can re-imagine it as a convolutional layer, where the convolutional kernel has a "width" (in time) of exactly 1, and a "height" that matches the full height of the tensor. If we do this, while also making sure that our kernel is not allowed to move beyond the edge of the tensor, it should work:
self.tdd = nn.Conv2d(1, num_of_output_channels, (num_of_input_channels, 1))
You may need to do some rearrangement of tensor axes. The "input channels" for this line of code are in fact coming from the "freq" axis (the "image's y axis") of your tensor, and the "output channels" will indeed be arranged on the "channel" axis. (The "y axis" of the output will be a singleton dimension of height 1.)
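As a rough sketch of that rearrangement (the spectrogram-like shape (batch, freq, time) and the sizes here are assumptions, not taken from your question):

import torch
import torch.nn as nn

batch, freq, time = 4, 40, 100
num_of_output_channels = 64

x = torch.randn(batch, freq, time)
x = x.unsqueeze(1)                          # -> (batch, 1, freq, time): one "image" channel

# Kernel spans the full "freq" height and exactly one time-step in width,
# so each time frame gets its own dense projection.
tdd = nn.Conv2d(1, num_of_output_channels, kernel_size=(freq, 1))
y = tdd(x)                                  # -> (batch, num_of_output_channels, 1, time)
y = y.squeeze(2)                            # -> (batch, num_of_output_channels, time)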
As pointed out in the discussion you referred to:
Meanwhile this #1935 will make TimeDistributed/Bottle unnecessary for Linear layers.
For a TDD layer, you can simply apply the linear layer directly to the input: nn.Linear operates on the last dimension, so each time slice is transformed independently.
In [1]: import torch
In [2]: m = torch.nn.Linear(20, 30)
In [3]: input = torch.randn(128, 5, 20)
In [4]: output = m(input)
In [5]: print(output.size())
torch.Size([128, 5, 30])
The following is a short illustration of the computation:
In [1]: import torch

In [2]: m = torch.nn.Linear(2, 3, bias=False)
   ...:
   ...: for name, param in m.named_parameters():
   ...:     print(name)
   ...:     print(param)
   ...:
weight
Parameter containing:
tensor([[-0.3713, -0.1113],
        [ 0.2938,  0.4709],
        [ 0.2791,  0.5355]], requires_grad=True)

In [3]: input = torch.stack([torch.ones(3, 2), 2 * torch.ones(3, 2)], dim=0)
   ...: print(input)
tensor([[[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[2., 2.],
         [2., 2.],
         [2., 2.]]])

In [4]: m(input)
Out[4]:
tensor([[[-0.4826,  0.7647,  0.8145],
         [-0.4826,  0.7647,  0.8145],
         [-0.4826,  0.7647,  0.8145]],

        [[-0.9652,  1.5294,  1.6291],
         [-0.9652,  1.5294,  1.6291],
         [-0.9652,  1.5294,  1.6291]]], grad_fn=<UnsafeViewBackward>)
More details of how nn.Linear operates can be seen from torch.matmul. Note that you may need to add a non-linearity such as torch.tanh() afterwards to get exactly the same layer as Dense() in Keras, which accepts the non-linearity as a keyword argument (activation='tanh').
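For example, a minimal sketch of a time-distributed dense layer with a tanh activation, assuming an input of shape (batch, time, features) like the one above:

import torch
import torch.nn as nn

# Roughly equivalent to Keras's TimeDistributed(Dense(30, activation='tanh')).
tdd = nn.Sequential(
    nn.Linear(20, 30),   # applied independently to every time-step (acts on the last axis)
    nn.Tanh(),           # the non-linearity Keras folds into Dense(..., activation='tanh')
)

x = torch.randn(128, 5, 20)
print(tdd(x).size())     # torch.Size([128, 5, 30])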
For TimeDistributed with, e.g., CNN layers, the snippet from the PyTorch forum may be useful.
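A rough sketch of that general reshape trick (this wrapper is illustrative, not an official PyTorch API): fold the time axis into the batch axis, apply the wrapped module, then unfold.

import torch
import torch.nn as nn

class TimeDistributed(nn.Module):
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):                                   # x: (batch, time, ...)
        b, t = x.shape[:2]
        y = self.module(x.reshape(b * t, *x.shape[2:]))     # merge batch and time
        return y.reshape(b, t, *y.shape[1:])                # split them back out

# Example: apply the same small CNN to every frame of a video-like tensor.
frames = torch.randn(8, 5, 3, 32, 32)                       # (batch, time, C, H, W)
td_conv = TimeDistributed(nn.Conv2d(3, 16, kernel_size=3, padding=1))
print(td_conv(frames).shape)                                # torch.Size([8, 5, 16, 32, 32])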