How to use the PyTorch Transformer with multi-dimensional sequence-to-sequence?

Tags: python, machine-learning, pytorch, transformer, sequence-to-sequence, transformer-model

I'm trying to do seq2seq with a Transformer model. My input and output have the same shape (torch.Size([499, 128])), where 499 is the sequence length and 128 is the number of features.

My input looks like: [image not available]

My output looks like: [image not available]

My training loop is:

    for batch in tqdm(dataset):
        optimizer.zero_grad()
        x, y = batch

        x = x.to(DEVICE)
        y = y.to(DEVICE)

        pred = model(x, torch.zeros(x.size()).to(DEVICE))

        loss = loss_fn(pred, y)
        loss.backward()
        optimizer.step()

My model is:

    import math
    from typing import final
    import torch
    import torch.nn as nn

    class Reconstructor(nn.Module):
        def __init__(self, input_dim, output_dim, dim_embedding, num_layers=4, nhead=8, dim_feedforward=2048, dropout=0.5):
            super(Reconstructor, self).__init__()

            self.model_type = 'Transformer'
            self.src_mask = None
            self.pos_encoder = PositionalEncoding(d_model=dim_embedding, dropout=dropout)
            self.transformer = nn.Transformer(d_model=dim_embedding, nhead=nhead, dim_feedforward=dim_feedforward, num_encoder_layers=num_layers, num_decoder_layers=num_layers)
            self.decoder = nn.Linear(dim_embedding, output_dim)
            self.decoder_act_fn = nn.PReLU()

            self.init_weights()

        def init_weights(self):
            initrange = 0.1
            nn.init.zeros_(self.decoder.weight)
            nn.init.uniform_(self.decoder.weight, -initrange, initrange)

        def forward(self, src, tgt):

            pe_src = self.pos_encoder(src.permute(1, 0, 2))  # (seq, batch, features)
            transformer_output = self.transformer_encoder(pe_src)
            decoder_output = self.decoder(transformer_output.permute(1, 0, 2)).squeeze(2)
            decoder_output = self.decoder_act_fn(decoder_output)
            return decoder_output

My output has a shape of torch.Size([32, 499, 128]), where 32 is the batch size, 499 is the sequence length and 128 is the number of features. However, every time step in the output has the same values:

    tensor([[[0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017],
             [0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017],
             [0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017],
             ...,
             [0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017],
             [0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017],
             [0.0014, 0.0016, 0.0017,  ..., 0.0018, 0.0021, 0.0017]]],
           grad_fn=<PreluBackward>)

What am I doing wrong? Thank you so much for any help.


asked Nov 17 '20 14:11

Shamoon

1 Answer

There are several things to check. Since you get the same output for different inputs, I suspect that some layer zeroes out all of its inputs. Check the outputs of the PositionalEncoding and of the Transformer's encoder block to make sure they are not constant. But before that, make sure your inputs actually differ (for example, try injecting noise).
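Both checks can be sketched with PyTorch forward hooks. This is a minimal sketch under stated assumptions: the helper names (`spread_over_time`, `capture_outputs`, `output_depends_on_input`) are hypothetical, and the hooks assume each inspected submodule returns a single tensor (modules such as `nn.MultiheadAttention` return a tuple and would need extra handling).

```python
import torch
import torch.nn as nn


def spread_over_time(t: torch.Tensor, seq_dim: int) -> float:
    """Largest absolute deviation of any time step from the first time step.
    A value near 0 means the tensor is constant along the sequence axis."""
    first = t.narrow(seq_dim, 0, 1)  # keep the dimension so broadcasting works
    return (t - first).abs().max().item()


def capture_outputs(model: nn.Module, names: list, *inputs) -> dict:
    """Run one forward pass and return the outputs of the named submodules.
    Assumes each named submodule returns a single tensor."""
    modules = dict(model.named_modules())
    captured = {}
    handles = []
    for name in names:
        def hook(module, args, output, name=name):
            captured[name] = output.detach()
        handles.append(modules[name].register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in handles:  # always detach the hooks, even if forward fails
            h.remove()
    return captured


def output_depends_on_input(model: nn.Module, x: torch.Tensor, *extra_inputs,
                            noise_std: float = 0.1, atol: float = 1e-6) -> bool:
    """Noise-injection check: does perturbing x change the model output?
    Switches the model to eval mode so dropout does not cause false positives."""
    model.eval()
    with torch.no_grad():
        a = model(x, *extra_inputs)
        b = model(x + noise_std * torch.randn_like(x), *extra_inputs)
    return (a - b).abs().max().item() > atol
```

For a model shaped like the posted one, something like `out = capture_outputs(model, ["pos_encoder"], x, torch.zeros_like(x))` followed by `spread_over_time(out["pos_encoder"], seq_dim=0)` would show whether the positional-encoding output is already constant along time (`seq_dim=0` because that `forward` permutes to sequence-first).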

Additionally, from what I see in the pictures, your input and output are speech signals, sampled (I guess) at 22.05 kHz, so there should be ~10k features, but you say you have only 128. This is another place to check. Next, the number 499 presumably represents time slices (frames). Make sure your slices are in a reasonable range (20-50 ms, usually 30 ms). If that is the case, 500 slices of 30 ms each add up to 15 seconds, which is much longer than what your example shows. Finally, you are masking off a third of a second of speech in your input, which I believe is too much.
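The duration estimate above, written out. This sketch assumes the 22.05 kHz sample rate the answer guesses and non-overlapping frames (hop length equal to frame length); neither is stated in the question, and overlapping frames would make the clip shorter. The function names are hypothetical.

```python
# Frame arithmetic from the answer. The 22.05 kHz sample rate and the
# non-overlapping frames are assumptions, not facts from the question.
SAMPLE_RATE_HZ = 22_050


def samples_per_frame(frame_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Raw audio samples covered by one frame of `frame_ms` milliseconds."""
    return sample_rate_hz * frame_ms / 1000


def clip_seconds(num_frames: int, frame_ms: int) -> float:
    """Total duration, assuming frames do not overlap (hop length == frame length)."""
    return num_frames * frame_ms / 1000


for ms in (20, 30, 50):
    print(f"{ms} ms frame: {samples_per_frame(ms):.1f} samples, "
          f"499 frames = {clip_seconds(499, ms):.2f} s")
```

At 30 ms per frame, 499 frames cover about 15 s, matching the answer's estimate, and a single 30 ms frame spans roughly 660 raw samples at this sample rate.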

I think it would be useful to look at the Wav2vec and Wav2vec 2.0 papers, which tackle self-supervised training in the speech recognition domain using a Transformer encoder, with great success.
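Because input and output have the same length here, an encoder-only stack is a natural fit (the papers above likewise build on a Transformer encoder). Note also that the posted `forward` calls `self.transformer_encoder`, while `__init__` only defines `self.transformer`, an `nn.Transformer`, which expects both a source and a target sequence. Below is a minimal sketch of an encoder-only reconstructor; it is not the asker's or the papers' architecture. The class names, the input projection, and all hyperparameters are assumptions; `batch_first=True` requires PyTorch 1.9 or later, and the positional encoding assumes an even `d_model`.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Sine/cosine positional encoding for batch-first input (batch, seq, d_model).
    Assumes d_model is even."""

    def __init__(self, d_model: int, max_len: int = 5000, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
        pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(x + self.pe[:, : x.size(1)])


class EncoderReconstructor(nn.Module):
    """Hypothetical encoder-only model mapping (batch, seq, input_dim)
    to (batch, seq, output_dim)."""

    def __init__(self, input_dim=128, output_dim=128, d_model=256, nhead=8,
                 num_layers=4, dim_feedforward=1024, dropout=0.1):
        super().__init__()
        self.input_proj = nn.Linear(input_dim, d_model)  # features -> model width
        self.pos_encoder = SinusoidalPositionalEncoding(d_model, dropout=dropout)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.output_proj = nn.Linear(d_model, output_dim)  # model width -> features

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        h = self.pos_encoder(self.input_proj(src))
        h = self.encoder(h)
        return self.output_proj(h)
```

With a model like this, the training loop would call `pred = model(x)` directly instead of passing `torch.zeros(x.size())` as a target.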


answered Oct 19 '22 01:10

igrinis

