In some deep learning models that analyse temporal data (e.g. audio or video), we use a "time-distributed dense" (TDD) layer. This is a fully-connected (dense) layer that is applied separately to every time-step.

In Keras this can be done using the TimeDistributed wrapper, which is actually slightly more general. In PyTorch it's been an open feature request for a couple of years.
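For reference, a minimal Keras sketch of such a layer (the shapes here, 100 time-steps of 20 features each, are just illustrative):

from tensorflow import keras
from tensorflow.keras import layers

# The same Dense(30) is applied independently to each of the 100 time-steps.
inputs = keras.Input(shape=(100, 20))                      # (time, features) per example
outputs = layers.TimeDistributed(layers.Dense(30))(inputs)
model = keras.Model(inputs, outputs)
model.summary()                                            # output shape: (None, 100, 30)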
How can we implement time-distributed dense manually in PyTorch?
Specifically for time-distributed dense (and not time-distributed anything else), we can hack it by using a convolutional layer.
Look at the diagram you've shown of the TDD layer. We can re-imagine it as a convolutional layer, where the convolutional kernel has a "width" (in time) of exactly 1, and a "height" that matches the full height of the tensor. If we do this, while also making sure that our kernel is not allowed to move beyond the edge of the tensor, it should work:
self.tdd = nn.Conv2d(1, num_of_output_channels, (num_of_input_channels, 1))
You may need to do some rearrangement of tensor axes. The "input channels" for this line of code are in fact coming from the "freq" axis (the "image's y axis") of your tensor, and the "output channels" will indeed be arranged on the "channel" axis. (The "y axis" of the output will be a singleton dimension of height 1.)
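As a rough sketch of that rearrangement (the spectrogram-like shape (batch, freq, time) and the sizes here are assumptions, not taken from your question):

import torch
import torch.nn as nn

batch, freq, time = 4, 40, 100
num_of_output_channels = 64

x = torch.randn(batch, freq, time)
x = x.unsqueeze(1)                          # -> (batch, 1, freq, time): one "image" channel

# Kernel spans the full "freq" height and exactly one time-step in width,
# so each time frame gets its own dense projection.
tdd = nn.Conv2d(1, num_of_output_channels, kernel_size=(freq, 1))
y = tdd(x)                                  # -> (batch, num_of_output_channels, 1, time)
y = y.squeeze(2)                            # -> (batch, num_of_output_channels, time)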
As pointed out in the discussion you referred to:
Meanwhile this #1935 will make TimeDistributed/Bottle unnecessary for Linear layers.
For a TDD layer, you can simply apply the linear layer directly to the input: nn.Linear operates on the last dimension, so each time slice is transformed independently.
In [1]: import torch
In [2]: m = torch.nn.Linear(20, 30)
In [3]: input = torch.randn(128, 5, 20)
In [4]: output = m(input)
In [5]: print(output.size())
torch.Size([128, 5, 30])
The following is a short illustration of the computation:
In [1]: import torch

In [2]: m = torch.nn.Linear(2, 3, bias=False)
   ...:
   ...: for name, param in m.named_parameters():
   ...:     print(name)
   ...:     print(param)
   ...:
weight
Parameter containing:
tensor([[-0.3713, -0.1113],
        [ 0.2938,  0.4709],
        [ 0.2791,  0.5355]], requires_grad=True)

In [3]: input = torch.stack([torch.ones(3, 2), 2 * torch.ones(3, 2)], dim=0)
   ...: print(input)
tensor([[[1., 1.],
         [1., 1.],
         [1., 1.]],

        [[2., 2.],
         [2., 2.],
         [2., 2.]]])

In [4]: m(input)
Out[4]:
tensor([[[-0.4826,  0.7647,  0.8145],
         [-0.4826,  0.7647,  0.8145],
         [-0.4826,  0.7647,  0.8145]],

        [[-0.9652,  1.5294,  1.6291],
         [-0.9652,  1.5294,  1.6291],
         [-0.9652,  1.5294,  1.6291]]], grad_fn=<UnsafeViewBackward>)
More details of how nn.Linear operates can be seen from torch.matmul. Note that you may need to add a non-linearity such as torch.tanh() afterwards to get exactly the same layer as Dense() in Keras, which accepts the non-linearity as a keyword argument (activation='tanh').
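For example, a minimal sketch of a time-distributed dense layer with a tanh activation, assuming an input of shape (batch, time, features) like the one above:

import torch
import torch.nn as nn

# Roughly equivalent to Keras's TimeDistributed(Dense(30, activation='tanh')).
tdd = nn.Sequential(
    nn.Linear(20, 30),   # applied independently to every time-step (acts on the last axis)
    nn.Tanh(),           # the non-linearity Keras folds into Dense(..., activation='tanh')
)

x = torch.randn(128, 5, 20)
print(tdd(x).size())     # torch.Size([128, 5, 30])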
For TimeDistributed with, e.g., CNN layers, the snippet from the PyTorch forum may be useful.
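A rough sketch of that general reshape trick (this wrapper is illustrative, not an official PyTorch API): fold the time axis into the batch axis, apply the wrapped module, then unfold.

import torch
import torch.nn as nn

class TimeDistributed(nn.Module):
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):                                   # x: (batch, time, ...)
        b, t = x.shape[:2]
        y = self.module(x.reshape(b * t, *x.shape[2:]))     # merge batch and time
        return y.reshape(b, t, *y.shape[1:])                # split them back out

# Example: apply the same small CNN to every frame of a video-like tensor.
frames = torch.randn(8, 5, 3, 32, 32)                       # (batch, time, C, H, W)
td_conv = TimeDistributed(nn.Conv2d(3, 16, kernel_size=3, padding=1))
print(td_conv(frames).shape)                                # torch.Size([8, 5, 16, 32, 32])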