If we use beam search in a seq2seq model, it gives better results. There are several TensorFlow implementations. But because the loss is computed from the softmax at each cell, beam search can't be used during training. So is there a modified loss or optimization function that allows beam search at training time?
In computer science, beam search is a heuristic search algorithm that explores a graph by expanding the most promising node in a limited set. Beam search is an optimization of best-first search that reduces its memory requirements.
The beam search strategy generates the translation word by word from left to right while keeping a fixed number (the beam) of active candidates at each time step. Increasing the beam size can improve translation performance, at the expense of significantly reducing decoder speed.
Beam Search makes two improvements over Greedy Search. First, where Greedy Search takes just the single best word at each position, Beam Search expands this and keeps the best 'N' words. Second, where Greedy Search considers each position in isolation, Beam Search scores whole partial sequences.
Beam search is the go-to method for decoding auto-regressive machine translation models. While it yields consistent improvements in terms of BLEU, it is only concerned with finding outputs with high model likelihood, and is thus agnostic to whatever end metric or score practitioners care about.
Beam search is a heuristic search technique that always expands the W best nodes at each level. It progresses level by level, moving downwards only from the best W nodes at each level, and constructs its search tree using breadth-first search.
The hyperparameter 'N' is known as the beam width. Intuitively, this gives better results than Greedy Search: what we are really interested in is the best complete sentence, and we might miss it if we picked only the best individual word at each position.
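To make the decoding procedure concrete, here is a minimal, library-agnostic beam search sketch. `beam_search`, `step_fn`, and the toy transition table are all illustrative inventions for this answer, not part of any TensorFlow API:

```python
import heapq
import math

def beam_search(start, step_fn, beam_width, max_len):
    """Keep the `beam_width` highest-scoring partial sequences per step.

    step_fn(seq) returns (token, prob) continuations; a token of None
    marks end-of-sequence.
    """
    # each beam entry: (cumulative log prob, sequence, finished?)
    beams = [(0.0, [start], False)]
    for _ in range(max_len):
        candidates = []
        for logp, seq, done in beams:
            if done:
                candidates.append((logp, seq, True))
                continue
            for tok, p in step_fn(seq):
                if tok is None:
                    candidates.append((logp + math.log(p), seq, True))
                else:
                    candidates.append((logp + math.log(p), seq + [tok], False))
        # prune: keep only the best `beam_width` hypotheses
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(done for _, _, done in beams):
            break
    return beams[0][1], beams[0][0]

# toy "model": a fixed next-token distribution for each prefix
table = {
    ("<s>",): [("a", 0.6), ("b", 0.4)],
    ("<s>", "a"): [(None, 1.0)],                  # "a" ends, total prob 0.6
    ("<s>", "b"): [("c", 0.9), (None, 0.1)],
    ("<s>", "b", "c"): [(None, 1.0)],             # "b c" total 0.4 * 0.9 = 0.36
}
best_seq, best_logp = beam_search("<s>", lambda s: table[tuple(s)],
                                  beam_width=2, max_len=5)
```

Note that the beam ranks whole partial sequences by cumulative log probability, which is exactly what the per-position greedy choice cannot do.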
As Oliver mentioned, in order to use beam search in the training procedure we have to use beam-search optimization, as described in the paper Sequence-to-Sequence Learning as Beam-Search Optimization.
We can't use beam search in the training procedure with the current loss function, because that loss is a log loss taken at each time step, which is a greedy objective. This is clearly laid out in the paper Sequence to Sequence Learning with Neural Networks; Section 3.2 covers exactly this case:
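As a toy illustration of that per-time-step log loss (the probabilities below are made up; the point is only that the objective decomposes step by step under teacher forcing, so training never ranks whole sequences against each other):

```python
import math

# hypothetical probabilities the model assigns to the single reference
# token at each decoder time step (teacher forcing)
step_probs = [0.9, 0.7, 0.8]

# standard seq2seq training loss: negative log-likelihood, summed per step
loss = -sum(math.log(p) for p in step_probs)
# because each step is scored independently against the reference token,
# a whole-sequence procedure like beam search has nothing to plug into
```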
"where S is the training set. Once training is complete, we produce tr anslations by finding the most likely translation according to the LSTM:"
So the original seq2seq architecture uses beam search only at test time. If we want to use beam search at training time, we have to use a different loss and optimization method, as in the paper.
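The paper's training objective is margin-based rather than per-step. A heavily simplified sketch of that idea follows; the function name and score values are hypothetical, and the real method additionally handles the case where the gold sequence falls off the beam during search:

```python
def margin_violation_loss(gold_score, best_violator_score, margin=1.0):
    """Penalize the model when some beam hypothesis scores within
    `margin` of (or above) the gold sequence's score."""
    return max(0.0, margin - (gold_score - best_violator_score))

# gold comfortably ahead of every beam candidate: zero loss
no_loss = margin_violation_loss(3.0, 1.0)
# gold only barely ahead: positive loss pushes the scores apart
some_loss = margin_violation_loss(1.0, 0.8)
```

Unlike the per-step log loss, this objective compares whole-sequence scores, so beam search can participate in training.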
Sequence-to-Sequence Learning as Beam-Search Optimization is a paper that describes the steps necessary to use beam search in the training process: https://arxiv.org/abs/1606.02960
The following issue contains a script that can perform beam search, but it does not contain any of the training logic: https://github.com/tensorflow/tensorflow/issues/654