I have a question about BLEU score calculation for machine translation. I realized there may be different variants of BLEU: the code I found reports five values, namely BLEU-1, BLEU-2, BLEU-3, BLEU-4, and finally BLEU, which seems to be an exponential average of the previous four. It is still not clear to me what the difference between them is. Do you have any ideas? Thanks
P.S. At first I thought this question was more theoretical in nature and posted it on Meta Stack Exchange. A moderator closed it, commenting that it was a Stack Overflow type question, so please don't punish me again. =)
BLEU scores are between 0 and 1. A score of 0.6 or 0.7 is considered about the best you can achieve: even two human translators would likely produce different variants of the same sentence and would rarely achieve a perfect match.
A comparison between BLEU scores is only justified when the results are computed on the same test set, with the same language pair and the same MT engine. A BLEU score from a different test set is bound to be different.
For a BLEU score, an error is just that: an error. In real life, a word placed incorrectly within a sentence can change its entire meaning, but BLEU does not take the gravity of errors into account. Generally speaking, the score is not suitable (and was never intended) for judging the quality of individual translations.
source: http://www.statmt.org/book/slides/08-evaluation.pdf
I hadn't heard of BLEU-1 and BLEU-2 before, but I guess they refer to the 1-gram, 2-gram, 3-gram and 4-gram precisions in the BLEU formula, i.e. precision[n] = BLEU-n.
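If that guess were right, each individual n-gram precision could be read off directly. Here is a minimal sketch using NLTK's sentence_bleu (assuming NLTK is installed; the example sentences are invented), where putting all the weight on a single n isolates that n-gram precision:

from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "is", "on", "the", "mat"]]  # list of tokenized reference sentences
candidate = ["the", "cat", "sat", "on", "the", "mat"]   # tokenized candidate translation

# All weight on n=1 yields the individual 1-gram precision (here 5/6);
# all weight on n=2 yields the individual 2-gram precision (here 3/5).
print(sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))
print(sentence_bleu(reference, candidate, weights=(0, 1, 0, 0)))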
Actually, the BLEU-n values in your question don't use the n-gram scores only: BLEU-n computes the 1-gram through n-gram precisions and combines them with equal weights into a single cumulative score. See the "Cumulative N-Gram Scores" section at this link for more info.
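To make that concrete: the final score is BP * exp(sum_n w_n * log(p_n)), a weighted geometric mean of the n-gram precisions p_n, which is the "exponential average" mentioned in the question. A minimal sketch of the cumulative scores with NLTK (assuming NLTK is installed; sentences invented; smoothing is needed for BLEU-4 here because the toy candidate shares no 4-gram with the reference):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]
candidate = ["the", "cat", "sat", "on", "the", "mat"]
smooth = SmoothingFunction().method1  # adds a small epsilon to zero n-gram counts

print(sentence_bleu(reference, candidate, weights=(1, 0, 0, 0)))        # cumulative BLEU-1
print(sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0)))    # cumulative BLEU-2
print(sentence_bleu(reference, candidate, weights=(1/3, 1/3, 1/3, 0)))  # cumulative BLEU-3
print(sentence_bleu(reference, candidate,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=smooth))                          # cumulative BLEU-4

Cumulative BLEU-4 with equal weights of 0.25 is exactly what sentence_bleu computes by default, which is why it is usually reported as "the" BLEU score.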