I am looking for a good (efficient and preferably simple) way to create a padded tensor from sequences of variable length / shape. The best way I can imagine so far is a naive approach like this:
import torch
seq = [1,2,3] # seq of variable length
max_len = 5 # maximum length of seq
t = torch.zeros(max_len) # zeros serve as the padding value
for i, e in enumerate(seq):
    t[i] = e # copy each element into the preallocated tensor
print(t)
Output:
tensor([ 1., 2., 3., 0., 0.])
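The element-wise loop can at least be replaced by a single slice assignment into the preallocated tensor. This is a minimal sketch that assumes `seq` is no longer than `max_len`. It is still the same manual approach, just without the Python loop:

```python
import torch

seq = [1, 2, 3]  # seq of variable length
max_len = 5      # maximum length of seq

# Preallocate the padded tensor (zeros act as the padding value) and
# copy the sequence into its first len(seq) positions in one step.
t = torch.zeros(max_len)
t[:len(seq)] = torch.tensor(seq, dtype=t.dtype)
print(t)  # tensor([1., 2., 3., 0., 0.])
```

For a non-zero padding value, `torch.full((max_len,), value)` can be used instead of `torch.zeros`.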
Is there a better way to do so?
I haven't found anything yet, but I guess there must be something better.
I'm thinking of some function to extend the sequence tensor to the desired shape with the desired padding. Or something to create the padded tensor directly from the sequence. But of course other approaches are welcome too.
Make your variable length sequence a torch.Tensor and use torch.nn.functional.pad:
import torch
import torch.nn.functional as F
seq = torch.Tensor([1,2,3]) # seq of variable length
print(F.pad(seq, pad=(0, 2), mode='constant', value=0))
Output:
tensor([1., 2., 3., 0., 0.])
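To pad to a given maximum length rather than by a fixed amount, the right-hand padding can be computed from the sequence length. The helper `pad_to_length` below is hypothetical (not part of PyTorch); it is a minimal sketch built only on `F.pad`. The last lines illustrate how the `pad` tuple is read for a 2D input: it starts at the last dimension.

```python
import torch
import torch.nn.functional as F

def pad_to_length(seq: torch.Tensor, max_len: int, value: float = 0.0) -> torch.Tensor:
    """Hypothetical helper: right-pad a 1D tensor with `value` up to `max_len` elements."""
    missing = max_len - seq.size(0)
    if missing < 0:
        raise ValueError("sequence is longer than max_len")
    # (0, missing): no padding on the left, `missing` elements on the right
    return F.pad(seq, pad=(0, missing), mode='constant', value=value)

seq = torch.tensor([1., 2., 3.])
print(pad_to_length(seq, 5))        # tensor([1., 2., 3., 0., 0.])
print(pad_to_length(seq, 5, -1.0))  # pads with -1 instead of 0

# For a 2D input the tuple starts at the LAST dimension:
# (left, right) for dim -1, then (top, bottom) for dim -2.
m = torch.ones(2, 2)
print(F.pad(m, pad=(0, 1, 0, 2)).shape)  # torch.Size([4, 3])
```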
Signature of F.pad:
- input: input tensor, i.e. your variable length sequence.
- pad: m-element tuple, where m/2 ≤ input dimensions and m is even. In the 1D case, the first element is how much padding to add to the left and the second element how much padding to add to the right of your sequence.
- mode: fill the padding with a constant, or by replicating the border or reflecting the values.
- value: the fill value if you choose constant padding.

As an add-on to the answer already given by @iacolippo:
I just stumbled over torch.nn.utils.rnn.pad_sequence. Since it works a bit differently from the solution by @iacolippo, I post it here.
It takes a list of tensors of variable length and combines them into a matrix, padding all sequences to the length of the longest given sequence.
Code example:
import torch
a = torch.tensor([1,2,3])
b = torch.tensor([1,2])
c = torch.tensor([1])
torch.nn.utils.rnn.pad_sequence((a,b,c), batch_first=True)
Output - padded sequences:
tensor([[ 1, 2, 3],
[ 1, 2, 0],
[ 1, 0, 0]])
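`pad_sequence` only pads up to the longest sequence in the batch. If a fixed length is needed (as with `max_len` in the question), one option is to follow it with `F.pad` on the time dimension. This is a minimal sketch combining the two answers; `max_len` is an assumed value and must be at least the longest sequence length:

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2])
c = torch.tensor([1])

max_len = 5  # assumed target length, may exceed the longest sequence

# pad_sequence pads only up to the longest sequence (here 3) ...
padded = pad_sequence([a, b, c], batch_first=True)  # shape (3, 3)
# ... so extend the time dimension (the last dim of this 2D batch) with F.pad.
padded = F.pad(padded, pad=(0, max_len - padded.size(1)), value=0)
print(padded.shape)  # torch.Size([3, 5])
```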
Signature of torch.nn.utils.rnn.pad_sequence:
torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0)
- sequences (list[Tensor]) – list of variable length sequences.
- batch_first (bool, optional) – output will be in B x T x * if True, or in T x B x * otherwise.
- padding_value (float, optional) – value for padded elements. Default: 0.
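To make the `B x T x *` vs. `T x B x *` layouts concrete, here is a small sketch with sequences whose elements are feature vectors. Only the first dimension may differ between sequences; the trailing dimensions (`*`) have to match. The sizes below are arbitrary example values:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sequences of lengths 4, 2 and 1; each step is a feature vector of size 3.
seqs = [torch.ones(4, 3), torch.ones(2, 3), torch.ones(1, 3)]

time_major = pad_sequence(seqs)  # default batch_first=False
batch_major = pad_sequence(seqs, batch_first=True, padding_value=-1.0)

print(time_major.shape)    # torch.Size([4, 3, 3])  -> T x B x *
print(batch_major.shape)   # torch.Size([3, 4, 3])  -> B x T x *
print(batch_major[1, 2:])  # padded steps of the second sequence, filled with -1
```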