The Keras documentation could be improved here. After reading through it, I still do not understand exactly what this function does: keras.preprocessing.sequence.pad_sequences
Could someone illuminate what this function does, and ideally provide an example?
Sequence Padding
The pad_sequences() function in the Keras deep learning library can be used to pad variable-length sequences. Whether the padding is applied to the beginning or the end of a sequence, called pre- or post-sequence padding, is controlled by the "padding" argument, as shown in the examples below.
Keras Preprocessing is the data preprocessing and data augmentation module of the Keras deep learning library. It provides utilities for working with image data, text data, and sequence data.
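For the examples below, pad_sequences is assumed to be imported from the sequence preprocessing module; the exact import path depends on whether you use standalone Keras or the Keras bundled with TensorFlow:

>>> from keras.preprocessing.sequence import pad_sequences
>>> # or, with TensorFlow 2.x:
>>> # from tensorflow.keras.preprocessing.sequence import pad_sequences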
pad_sequences is used to ensure that all sequences in a list have the same length. By default this is done by padding with 0 at the beginning of each sequence until every sequence is as long as the longest one.
For example:
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
array([[0, 1, 2, 3],
       [3, 4, 5, 6],
       [0, 0, 7, 8]], dtype=int32)
[3, 4, 5, 6] is the longest sequence, so the other sequences are padded with 0 at the beginning until their length matches that of [3, 4, 5, 6].
If you would rather pad at the end of the sequences, you can set padding='post'.
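For illustration, here is the same example with post-padding; the output shown is what you should expect from the standard Keras implementation:

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], padding='post')
array([[1, 2, 3, 0],
       [3, 4, 5, 6],
       [7, 8, 0, 0]], dtype=int32)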
If you want to specify the maximum length of each sequence you can use the maxlen argument. This will truncate all sequences longer than maxlen.
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)
Now each sequence has length 3 instead.
According to the documentation, you can control truncation with the truncating argument of pad_sequences. By default truncating is set to 'pre', which drops values from the beginning of sequences that are too long. If you would rather truncate from the end of a sequence, you can set it to 'post'.
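As a sketch with the same example sequences (the output is the expected result of the standard implementation), combining maxlen=3 with truncating='post' keeps the first three elements of the too-long sequence, while the short sequence is still pre-padded by default:

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3, truncating='post')
array([[1, 2, 3],
       [3, 4, 5],
       [0, 7, 8]], dtype=int32)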