
What does keras.preprocessing.sequence.pad_sequences do?

The Keras documentation could be improved here. After reading through it, I still do not understand exactly what keras.preprocessing.sequence.pad_sequences does.

Could someone illuminate what this function does, and ideally provide an example?

Koffiman asked Mar 22 '17 05:03



1 Answer

pad_sequences ensures that all sequences in a list have the same length. By default it does this by padding each sequence with 0 at the beginning until it is as long as the longest sequence.

For example:

>>> from keras.preprocessing.sequence import pad_sequences
>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]])
array([[0, 1, 2, 3],
       [3, 4, 5, 6],
       [0, 0, 7, 8]], dtype=int32)

[3, 4, 5, 6] is the longest sequence, so the other sequences are padded with 0 until their length matches that of [3, 4, 5, 6].

If you would rather pad at the end of the sequences, set padding='post'.
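
To make the difference between the two padding modes concrete, here is a minimal pure-Python sketch of the padding step. The helper `pad` is hypothetical and is not the Keras implementation: it returns plain lists rather than a NumPy array, and it only imitates what padding='pre' and padding='post' do when no maxlen is given.

```python
def pad(sequences, padding="pre", value=0):
    """Pad every sequence to the length of the longest one (illustrative only)."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    longest = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [value] * (longest - len(seq))
        # 'pre' puts the fill values in front, 'post' appends them
        padded.append(fill + list(seq) if padding == "pre" else list(seq) + fill)
    return padded


seqs = [[1, 2, 3], [3, 4, 5, 6], [7, 8]]
pre_padded = pad(seqs)                   # imitates the default behaviour
post_padded = pad(seqs, padding="post")  # imitates padding='post'
print(pre_padded)   # [[0, 1, 2, 3], [3, 4, 5, 6], [0, 0, 7, 8]]
print(post_padded)  # [[1, 2, 3, 0], [3, 4, 5, 6], [7, 8, 0, 0]]
```

With Keras itself, the second result corresponds to calling pad_sequences(seqs, padding='post').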

If you want every sequence to have a specific length, use the maxlen argument. Sequences longer than maxlen are truncated, and shorter ones are padded up to maxlen.

>>> pad_sequences([[1, 2, 3], [3, 4, 5, 6], [7, 8]], maxlen=3)
array([[1, 2, 3],
       [4, 5, 6],
       [0, 7, 8]], dtype=int32)

Now each sequence has length 3 instead.

According to the documentation, truncation is controlled by the truncating argument of pad_sequences. By default truncating is set to 'pre', which removes values from the beginning of the sequence. If you would rather remove values from the end, set truncating='post'.
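
The two truncation modes can be sketched in plain Python with slicing. The helper `truncate` below is hypothetical and only imitates what the truncating argument does to a single sequence longer than maxlen; it is not taken from the Keras source.

```python
def truncate(seq, maxlen, truncating="pre"):
    """Cut seq down to at most maxlen items (illustrative only)."""
    if maxlen < 1:
        raise ValueError("this sketch assumes maxlen >= 1")
    if truncating == "pre":
        # drop items from the beginning, keep the last maxlen items
        return list(seq[-maxlen:])
    if truncating == "post":
        # drop items from the end, keep the first maxlen items
        return list(seq[:maxlen])
    raise ValueError("truncating must be 'pre' or 'post'")


long_seq = [3, 4, 5, 6]
kept_end = truncate(long_seq, maxlen=3)                       # default 'pre'
kept_start = truncate(long_seq, maxlen=3, truncating="post")
print(kept_end)    # [4, 5, 6]
print(kept_start)  # [3, 4, 5]
```

In Keras terms, passing maxlen=3 together with truncating='post' would turn the row [3, 4, 5, 6] into [3, 4, 5] instead of [4, 5, 6].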

oscfri answered Sep 20 '22 07:09