
Do we need to use beam search in the training process?

Using beam search in a seq2seq model's decoder generally gives better results, and there are several TensorFlow implementations of it. But with the softmax loss applied at each cell (time step), beam search can't be used during training. So is there some modified objective or optimization method for training with beam search?

asked May 28 '17 by Shamane Siriwardhana

People also ask

What is beam search in machine learning?

In computer science, beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set. Beam search is an optimization of best-first search that reduces its memory requirements.

How is beam search applied to machine translation?

The beam search strategy generates the translation word by word from left to right while keeping a fixed number (the beam) of active candidates at each time step. Increasing the beam size can improve translation performance at the expense of significantly reducing decoder speed.
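As an illustration of that strategy, here is a minimal beam-search decoder over a toy bigram model. The vocabulary and probabilities are invented purely for this sketch; a real system would take scores from a trained model instead:

```python
import math

# Toy next-token log-probabilities: given the previous token, each
# possible next token and its log-probability. "</s>" ends a sequence.
# These numbers are made up for illustration only.
LOGPROBS = {
    "<s>":  {"the": math.log(0.6), "a": math.log(0.4)},
    "the":  {"cat": math.log(0.5), "dog": math.log(0.5)},
    "a":    {"cat": math.log(0.9), "dog": math.log(0.1)},
    "cat":  {"</s>": 0.0},
    "dog":  {"</s>": 0.0},
}

def beam_search(beam_size, max_len=5):
    """Keep the `beam_size` highest-scoring partial hypotheses per step."""
    beams = [(0.0, ["<s>"])]           # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":    # finished hypotheses carry over
                candidates.append((score, tokens))
                continue
            for tok, lp in LOGPROBS[tokens[-1]].items():
                candidates.append((score + lp, tokens + [tok]))
        # prune: keep only the beam_size best candidates
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    return beams

for score, tokens in beam_search(beam_size=2):
    print(" ".join(tokens), round(math.exp(score), 2))
```

With beam size 2 the decoder keeps both "the …" and "a …" alive and ends up ranking "a cat" (probability 0.36) above "the cat" (0.30), even though "the" was the more likely first word.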

What is the tradeoff between the greedy search and beam search algorithms?

Beam Search makes two improvements over Greedy Search. With Greedy Search, we took just the single best word at each position; Beam Search expands this and takes the best 'N' words. And where Greedy Search considered each position in isolation, Beam Search scores whole partial sequences.

What is beam search decoding and why is it used in machine translation?

Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about.

How does beam search work?

Beam search is a heuristic search technique that always expands the W best nodes at each level. It progresses level by level, moving downwards only from the best W nodes at each level, and constructs its search tree using breadth-first search.


Which is better greedy search or beam width?

The hyperparameter 'N' is known as the beam width. Intuitively it makes sense that this gives us better results over Greedy Search, because what we are really interested in is the best complete sentence, and we might miss that if we picked only the best individual word at each position.
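To make the "best complete sentence" point concrete, here is a tiny worked comparison on an invented two-word model, where greedy search commits to the most likely first word and thereby misses the highest-probability sentence:

```python
# Made-up probabilities of complete two-word sentences (w1, w2).
P = {
    ("the", "cat"): 0.30, ("the", "dog"): 0.30,   # "the" totals 0.60
    ("a",   "cat"): 0.36, ("a",   "dog"): 0.04,   # "a"   totals 0.40
}

# Greedy: pick the best first word, then the best continuation of it.
first_word_prob = {"the": 0.60, "a": 0.40}
w1 = max(first_word_prob, key=first_word_prob.get)
greedy = max(p for pair, p in P.items() if pair[0] == w1)

# Beam search with width >= 2 (here, exhaustive search over this tiny
# space) recovers the best complete sentence instead.
best_pair, best_p = max(P.items(), key=lambda kv: kv[1])

print("greedy:", w1, greedy)          # "the", 0.3
print("beam  :", best_pair, best_p)   # ("a", "cat"), 0.36
```

Greedy locks in "the" (marginal probability 0.60) and can do no better than 0.30, while the best complete sentence, "a cat" at 0.36, starts with the individually less likely word.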

What is beam (business event analysis & modelling)?

Business Event Analysis & Modelling (BEAM) is an agile requirement gathering for Data Warehouses, with the goal of aligning requirement analysis with business processes rather than just reports. It has its roots in Agile Data Warehouse Design by Lawrence Corr and Jim Stagnitto [1].


2 Answers

As Oliver mentioned, in order to use beam search in the training procedure we have to use beam-search optimization, which is described in the paper Sequence-to-Sequence Learning as Beam-Search Optimization.

We can't use beam search in the training procedure with the current loss function, because the current loss is a log loss taken at each time step — a greedy approach. This is also clearly explained in the paper Sequence to Sequence Learning with Neural Networks; section 3.2 covers this case neatly.

The training objective in section 3.2 is to maximize the average log-probability of the correct translation T given the source sentence S:

1/|S| * Σ_{(T,S)∈S} log p(T|S)

"where S is the training set. Once training is complete, we produce translations by finding the most likely translation according to the LSTM:"

T̂ = argmax_T p(T|S)

So the original seq2seq architecture uses beam search only at test time. If we want to use beam search at training time, we have to use a different loss and optimization method, as in the paper above.
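That per-time-step log loss can be sketched as follows under teacher forcing (the token distributions are invented for illustration). Each step is scored against the gold prefix independently, which is why no search over hypotheses happens during training:

```python
import math

# Teacher-forced training loss: the sum over time steps of
# -log p(gold token | gold prefix). The softmax outputs below are
# made up; a real model would produce them from the encoder state.
gold = ["a", "cat", "</s>"]
step_probs = [                        # model's softmax at each step,
    {"the": 0.6, "a": 0.4},           # conditioned on the *gold* prefix
    {"cat": 0.9, "dog": 0.1},
    {"</s>": 1.0},
]

nll = -sum(math.log(p[tok]) for tok, p in zip(gold, step_probs))
print("teacher-forced NLL:", nll)     # equals -log p(gold sequence)
```

Because every step conditions on the gold prefix rather than on the model's own (beam-searched) predictions, the loss decomposes into independent per-step terms — this is the mismatch that beam-search optimization methods aim to remove.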

answered Nov 10 '22 by Shamane Siriwardhana


Sequence-to-Sequence Learning as Beam-Search Optimization is a paper that describes the steps necessary to use beam search in the training process: https://arxiv.org/abs/1606.02960

The following issue contains a script that can perform the beam search, but it does not contain any of the training logic: https://github.com/tensorflow/tensorflow/issues/654

answered Nov 10 '22 by Oliver