How does choosing between pre and post zero padding of sequences impact results


I'm working on an NLP sequence labelling problem. My data consists of variable-length sequences (w_1, w_2, ..., w_k) with corresponding labels (l_1, l_2, ..., l_k) (in this case the task is named entity extraction).

I intend to solve the problem using recurrent neural networks. Since the sequences have variable length, I need to pad them (I want a batch size > 1). I can either pre-pad or post-pad them with zeros, i.e. turn every sequence into either (0, 0, ..., 0, w_1, w_2, ..., w_k) or (w_1, w_2, ..., w_k, 0, 0, ..., 0), so that all sequences have the same length.
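To make the two options concrete, here is a minimal pure-Python sketch. The helper `pad_sequences` and the example token ids are hypothetical and only for illustration; it assumes 0 is reserved as the padding value and that the label sequences would be padded the same way.

```python
from typing import List


def pad_sequences(seqs: List[List[int]], padding: str = "pre", value: int = 0) -> List[List[int]]:
    """Pad every sequence to the length of the longest one (hypothetical helper)."""
    if padding not in ("pre", "post"):
        raise ValueError("padding must be 'pre' or 'post'")
    max_len = max((len(s) for s in seqs), default=0)
    padded = []
    for s in seqs:
        fill = [value] * (max_len - len(s))
        # 'pre' puts the zeros before the tokens, 'post' puts them after
        padded.append(fill + list(s) if padding == "pre" else list(s) + fill)
    return padded


# Made-up token ids for three sequences of different lengths
batch = [[5, 8, 2], [7, 3], [4]]
pre = pad_sequences(batch, padding="pre")
post = pad_sequences(batch, padding="post")
print(pre)   # [[5, 8, 2], [0, 7, 3], [0, 0, 4]]
print(post)  # [[5, 8, 2], [7, 3, 0], [4, 0, 0]]
```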

How does the choice between pre- and post-padding impact results?

It seems like pre-padding is more common, but I can't find an explanation of why it would be better. Since RNNs share weights across time steps, it feels like an arbitrary choice to me.


asked Sep 19 '17 11:09

langkilde

2 Answers

Commonly with RNNs, we take the final output or hidden state and use it to make a prediction (or for whatever task we are trying to do).

If we feed a bunch of zeros to the RNN before taking the final output (i.e. 'post' padding as you describe), then the hidden state of the network at the final word in the sentence would likely get 'flushed out' to some extent by all the zero inputs that come after this word.

So intuitively, this might be why pre-padding is more popular/effective.
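A toy illustration of this 'flushing' effect, as a sketch rather than a real model: a scalar vanilla RNN with hand-picked weights, zero bias, and a zero initial state (all assumptions made up for this example). Under these assumptions the leading zeros leave the state untouched, while trailing zeros shrink it step by step. A trained network with biases or LSTM gates will not behave this cleanly.

```python
import math


def run_rnn(xs, w_x=1.0, w_h=0.5, b=0.0, h0=0.0):
    """Scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b). Returns the final state."""
    h = h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
    return h


seq = [0.9, -0.4, 0.7]   # made-up inputs standing in for embedded words
pad = [0.0] * 5          # five padding steps

h_plain = run_rnn(seq)        # no padding at all
h_pre = run_rnn(pad + seq)    # zeros first: state stays at 0 until the real inputs arrive
h_post = run_rnn(seq + pad)   # zeros last: state decays after the final real input

print("no padding:  ", h_plain)
print("pre-padding: ", h_pre)
print("post-padding:", h_post)
```

In this toy setup the pre-padded final state equals the unpadded one, while the post-padded final state is much closer to zero, i.e. most of the information about the sequence has been washed out by the time the final state is read.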

answered Oct 22 '22 19:10

nlml

This paper (https://arxiv.org/pdf/1903.07288.pdf) studied the effect of padding types on LSTMs and CNNs. The authors found that, for LSTMs, post-padding achieved substantially lower accuracy (nearly half) than pre-padding, whereas for CNNs there was no significant difference (post-padding was only slightly worse).

A simple, intuitive explanation for RNNs is that post-padding adds noise on top of what has been learned from the sequence through time, and there are no further timesteps for the RNN to recover from this noise. With pre-padding, however, the RNN is better able to adjust to the added noise of zeros at the beginning as it learns from the sequence through time.

I think the community needs more thorough experiments to arrive at detailed mechanistic explanations of how padding affects performance.

I always recommend using pre-padding over post-padding, even for CNNs, unless the problem specifically requires post-padding.

answered Oct 22 '22 19:10

JafetGado

Tags:

performance

machine-learning

recurrent-neural-network
