 

Calculate BLEU score in Python

Tags:

python

nltk

I have a test sentence and a reference sentence. How can I write a Python script that measures the similarity between these two sentences using the BLEU metric from automatic machine translation evaluation?

Alapan Kuila asked Sep 04 '15 10:09



1 Answer

The BLEU score consists of two parts: modified (clipped) n-gram precision and a brevity penalty. Details can be found in the original BLEU paper (Papineni et al., 2002). In NLTK you can use the nltk.translate.bleu_score module (older NLTK releases had it under nltk.align). For example:

from nltk.translate.bleu_score import sentence_bleu

hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
# sentence_bleu takes a list of references, since there may be several
BLEUscore = sentence_bleu([reference], hypothesis)
print(BLEUscore)
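To make the two parts mentioned above concrete, here is a minimal from-scratch sketch in plain Python (no NLTK). It assumes pre-tokenized lists and applies no smoothing, and it follows the usual definitions: BLEU = BP * exp(sum of w_n * log p_n), where p_n is the clipped n-gram precision and BP = 1 if the hypothesis length c exceeds the closest reference length r, else exp(1 - r/c). The function names are hypothetical and this is not NLTK's implementation, though for the example above it should give the same value.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision: each hypothesis n-gram is credited at most
    as often as it occurs in any single reference."""
    hyp_counts = ngram_counts(hypothesis, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def brevity_penalty(references, hypothesis):
    """1 if the hypothesis is longer than the closest reference, else exp(1 - r/c)."""
    c = len(hypothesis)
    if c == 0:
        return 0.0
    # closest reference length; ties are broken in favor of the shorter one
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu_sketch(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25)):
    """Sentence-level BLEU without smoothing; weights[k] applies to (k+1)-grams."""
    precisions = [modified_precision(references, hypothesis, n)
                  for n in range(1, len(weights) + 1)]
    # the geometric mean is 0 as soon as any n-gram precision is 0
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(references, hypothesis) * math.exp(log_mean)


hypothesis = ['It', 'is', 'a', 'cat', 'at', 'room']
reference = ['It', 'is', 'a', 'cat', 'inside', 'the', 'room']
print(sentence_bleu_sketch([reference], hypothesis))  # about 0.4548
```

With several references, pass them all in the list (e.g. sentence_bleu_sketch([ref1, ref2], hypothesis), where ref1 and ref2 are placeholder token lists): clipping then uses the maximum count of each n-gram over all references.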

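Note that sentence_bleu uses n = 4 by default, i.e. uniform weights (0.25, 0.25, 0.25, 0.25) over unigrams up to 4-grams. If your sentences are shorter than 4 tokens, there are no 4-grams to match, so you need to lower the maximum n-gram order by passing fewer weights. Older NLTK versions raise ZeroDivisionError: Fraction(0, 0) in this case, while newer versions emit a warning and return a score of (nearly) 0. For example, to use only unigrams and bigrams: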

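from nltk.translate.bleu_score import sentence_bleu

hypothesis = ["open", "the", "file"]
reference = ["open", "file"]
# the reference has only 2 tokens, so the highest usable n-gram order is 2:
# split the weight equally between unigrams and bigrams
# caveat: neither hypothesis bigram ("open the", "the file") occurs in the reference,
# so recent NLTK versions still warn and return a near-zero score here; passing
# smoothing_function=SmoothingFunction().method1 (also in nltk.translate.bleu_score) avoids this
BLEUscore = sentence_bleu([reference], hypothesis, weights=(0.5, 0.5))
print(BLEUscore)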
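Rather than hard-coding the weights, one option is to derive them from the sentence lengths. The helper below is a hypothetical sketch (not part of NLTK): it caps the n-gram order at 4, at the hypothesis length and at the length of the shortest reference, then splits the weight uniformly, which reproduces the (0.5, 0.5) used above.

```python
def uniform_weights(references, hypothesis, max_n=4):
    """Hypothetical helper: equal weights for 1-grams up to the highest
    n-gram order that every sentence is long enough to contain."""
    n = min([max_n, len(hypothesis)] + [len(ref) for ref in references])
    n = max(n, 1)  # always keep at least unigrams
    return (1.0 / n,) * n


print(uniform_weights([["open", "file"]], ["open", "the", "file"]))  # (0.5, 0.5)
```

The result can be passed as the weights argument, e.g. sentence_bleu([reference], hypothesis, weights=uniform_weights([reference], hypothesis)).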
ccy answered Oct 17 '22 04:10
