PyTorch - create padded tensor from sequences of variable length

Tags:

python

pytorch

I am looking for a good (efficient and preferably simple) way to create a padded tensor from sequences of variable length/shape. The best way I can imagine so far is a naive approach like this:

import torch

seq = [1, 2, 3]           # sequence of variable length
max_len = 5               # maximum length of a sequence
t = torch.zeros(max_len)  # pre-filled with the padding value (0)
for i, e in enumerate(seq):
    t[i] = e              # copy the sequence element by element
print(t)

Output:

tensor([ 1.,  2.,  3.,  0.,  0.])

Is there a better way to do so?

I haven't found anything yet, but I suspect there must be a better way.

I'm thinking of some function that extends the sequence tensor to the desired shape with the desired padding, or something that creates the padded tensor directly from the sequence. Other approaches are welcome too, of course.
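For comparison, the element-by-element loop above can at least be replaced by a single slice assignment. This is a minimal sketch, assuming len(seq) ≤ max_len; the pad_value variable is illustrative and not part of the original code.

```python
import torch

seq = [1, 2, 3]   # sequence of variable length
max_len = 5       # target length after padding
pad_value = 0.0   # illustrative padding value (assumption)

t = torch.full((max_len,), pad_value)            # tensor of length max_len filled with pad_value
t[:len(seq)] = torch.tensor(seq, dtype=t.dtype)  # copy the whole sequence into the front at once
print(t)  # tensor([1., 2., 3., 0., 0.])
```

If seq is longer than max_len, the slice assignment raises a shape-mismatch error instead of silently truncating.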

asked by MBT on Nov 27 '22 13:11



2 Answers

Make your variable-length sequence a torch.Tensor and use torch.nn.functional.pad:

import torch
import torch.nn.functional as F

seq = torch.tensor([1., 2., 3.])  # sequence of variable length (float tensor)
# pad=(0, 2): no padding on the left, 2 elements on the right, filled with 0
print(F.pad(seq, pad=(0, 2), mode='constant', value=0))

Output:

tensor([1., 2., 3., 0., 0.])
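The question asks for a fixed max_len rather than a fixed amount of padding, so the right-hand padding has to be computed from the sequence length. A minimal sketch of such a wrapper around F.pad; pad_to_length is a hypothetical helper name, and it assumes a 1D input.

```python
import torch
import torch.nn.functional as F


def pad_to_length(seq: torch.Tensor, max_len: int, value: float = 0.0) -> torch.Tensor:
    """Right-pad a 1D tensor with `value` up to `max_len` elements (hypothetical helper)."""
    missing = max_len - seq.size(0)  # how many padding elements are needed on the right
    if missing < 0:
        raise ValueError(f"sequence of length {seq.size(0)} exceeds max_len={max_len}")
    return F.pad(seq, pad=(0, missing), mode="constant", value=value)


print(pad_to_length(torch.tensor([1., 2., 3.]), 5))  # tensor([1., 2., 3., 0., 0.])
print(pad_to_length(torch.tensor([1., 2.]), 5, value=-1.0))
```

Raising on overly long sequences is a design choice here; you could truncate with seq[:max_len] instead if that suits your data better.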

The arguments of F.pad are:

  • input: the input tensor, i.e. your variable-length sequence.
  • pad: a tuple of m elements, where m is even and m/2 ≤ the number of input dimensions. The pairs apply starting from the last dimension: in the 1D case the first element is the amount of padding on the left of your sequence and the second the amount on the right.
  • mode: 'constant' fills the padding with value; other modes such as 'replicate' (repeat the border values) and 'reflect' (mirror the values) are also available.
  • value: the fill value if you choose a constant padding.
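To illustrate the ordering of the pad tuple on a 2D tensor (constant mode only; the shapes and values below are just an example):

```python
import torch
import torch.nn.functional as F

x = torch.ones(2, 3)

# (left, right) applies to the LAST dimension: 1 column before, 2 after -> shape (2, 6)
y = F.pad(x, pad=(1, 2), mode="constant", value=0)

# (left, right, top, bottom): last dimension unchanged, one row of -1 added on top -> shape (3, 3)
z = F.pad(x, pad=(0, 0, 1, 0), mode="constant", value=-1)

print(y.shape, z.shape)  # torch.Size([2, 6]) torch.Size([3, 3])
```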
answered by iacolippo on Dec 05 '22 07:12



As an add-on to the answer already given by @iacolippo:

I just stumbled upon torch.nn.utils.rnn.pad_sequence. Since it works a bit differently from the solution by @iacolippo, I'm posting it here.

It takes a list of variable-length tensors and stacks them into a single tensor (a matrix for 1D sequences), padding every sequence to the length of the longest one.

Code example:

import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2])
c = torch.tensor([1])
# each row of the result is one sequence, right-padded with 0 to the longest length
print(torch.nn.utils.rnn.pad_sequence([a, b, c], batch_first=True))

Output (padded sequences):

tensor([[1, 2, 3],
        [1, 2, 0],
        [1, 0, 0]])
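pad_sequence only pads up to the longest sequence in the batch. If you need a fixed max_len as in the question, one option is to follow it with F.pad on the time dimension. A minimal sketch, assuming max_len is at least the length of the longest sequence:

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

max_len = 5  # desired fixed length (assumed >= the longest sequence)
seqs = [torch.tensor([1, 2, 3]), torch.tensor([1, 2]), torch.tensor([1])]

padded = pad_sequence(seqs, batch_first=True, padding_value=0)      # shape (3, 3): longest is 3
padded = F.pad(padded, pad=(0, max_len - padded.size(1)), value=0)  # extend last dim to max_len

print(padded)
```

With batch_first=True the time dimension is the last one for 1D sequences, which is why a two-element pad tuple is enough here.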

Signature of torch.nn.utils.rnn.pad_sequence:

torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0)

  • sequences (list[Tensor]) – list of variable length sequences.
  • batch_first (bool, optional) – output will be in B x T x * if True, or in T x B x * otherwise, where B is the batch size and T is the length of the longest sequence.
  • padding_value (float, optional) – value for padded elements. Default: 0.
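To illustrate the B x T x * versus T x B x * layouts with sequences whose elements are themselves vectors (the "*" trailing dimensions), here is a small example with two sequences of 2-dimensional feature vectors; the data is made up for illustration:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two sequences of 2-dim feature vectors, with lengths 3 and 1.
a = torch.ones(3, 2)
b = torch.full((1, 2), 2.0)

time_major = pad_sequence([a, b])  # default batch_first=False -> T x B x * = (3, 2, 2)
batch_major = pad_sequence([a, b], batch_first=True, padding_value=-1.0)  # B x T x * = (2, 3, 2)

print(time_major.shape)   # torch.Size([3, 2, 2])
print(batch_major.shape)  # torch.Size([2, 3, 2])
```

Only the time dimension is padded; the trailing feature dimension must already match across all sequences.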
answered by MBT on Dec 05 '22 06:12
