BLEU is a model for evaluating the quality of machine translation: given a machine-translated output and human reference translations, it automatically produces a score, and a higher score indicates a better translation.
To judge how good the output of a machine translation (MT) system is, we first observe what good translations have in common, as in the following example:
Example 1:
Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.
Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.
From the example above we can see that the candidate sentences (Candidate) and the reference sentences (Reference) share a number of identical fragments, for example:
Candidate 1 - Reference 1: “It is a guide to action”, “ensures that the military”, “commands”
Candidate 1 - Reference 2: “which”, “always”, “of the party”
Candidate 1 - Reference 3: “always”
By contrast, Candidate 2 shares far fewer fragments with the reference sentences.
We can therefore count, for each n-gram of the candidate sentence, the number of times it appears in the reference sentences, and sum these counts. From the analysis above, the more matches there are, the larger this sum, and the better the candidate sentence.
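As an illustrative sketch (not the author's implementation at the end of this post), this naive matching count could look as follows; `naive_matches` is a hypothetical helper that counts a candidate n-gram occurrence as matched whenever it appears anywhere in a reference:

```python
def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def naive_matches(candidate, references, n=1):
    """Naive (unclipped) matching: count each candidate n-gram occurrence
    as a match if it appears anywhere in any reference sentence."""
    ref_grams = set()
    for ref in references:
        ref_grams.update(ngrams(ref, n))
    cand = ngrams(candidate, n)
    matched = sum(1 for g in cand if g in ref_grams)
    return matched, len(cand)
```

For Example 1 this gives 17 of 18 matched unigrams for Candidate 1 and 8 of 14 for Candidate 2, agreeing with the intuition that Candidate 1 is the better translation; the next example shows where this naive count fails.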
Example 2:
Candidate: the the the the the the the.
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
Consider Example 2 above. Under the n-gram method, the 1-gram “the” from the candidate sentence (Candidate) appears many times in the reference sentences, so the summed score is also high. By the n-gram evaluation, this Candidate would be a very good translation, but that is clearly not the case.
We can therefore modify the n-gram model:
First, compute the maximum number of times a word occurs in any single reference sentence;
Then, clip the count of each (distinct) word in the candidate sentence by its maximum count in the reference sentences:
$Count_{clip}=\min(Count,\ Max\_Ref\_Count)$
Finally, sum these clipped counts and divide by the total number of words in the candidate sentence.
For example, in Example 2, the count of “the” is clipped to min(7, 2) = 2, giving a score of 2/7.
In Example 1:
Candidate 1 scores 17/18;
Candidate 2 scores 8/14.
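The clipped counting above can be sketched as follows (an illustrative reimplementation with tokens lowercased so that e.g. “The” and “the” match; `modified_precision` is a name of my choosing):

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision: Count_clip = min(Count, Max_Ref_Count)."""
    cand_counts = Counter(ngrams(candidate, n))
    clipped = 0
    for gram, cnt in cand_counts.items():
        # maximum number of times this n-gram occurs in any single reference
        max_ref = max(Counter(ngrams(ref, n))[gram] for ref in references)
        clipped += min(cnt, max_ref)
    return clipped, sum(cand_counts.values())
```

With the sentences of Example 2 lowercased, the count of “the” is clipped to min(7, 2) = 2, giving 2/7; for Example 1 the same function reproduces the 17/18 and 8/14 scores above.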
When evaluating over a long text (the whole corpus), we compute:
$$p_n=\frac{\sum_{C\in\{Candidates\}}\sum_{n\text{-}gram\in C}Count_{clip}(n\text{-}gram)}{\sum_{C'\in\{Candidates\}}\sum_{n\text{-}gram'\in C'}Count(n\text{-}gram')}$$
where
$Candidates$: the machine-translated sentences;
$Count()$: the number of times an n-gram occurs in the Candidates;
$Count_{clip}()$: the clipped count of an n-gram in the Candidates, limited by its maximum count in any single Reference.
The plain n-gram precision penalizes words in the candidate sentence that do not appear in the reference sentences;
the modified n-gram precision additionally penalizes words that occur more often in the candidate sentence than in the reference sentences.
$$BP=\begin{cases}1 & \text{if } c > r\\ e^{1-r/c} & \text{if } c \le r\end{cases}$$
$c$: the length of the Candidate corpus;
$r$: the effective Reference length: the sum, over the corpus, of the best-matching reference sentence lengths for each candidate sentence.
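The brevity penalty can be transcribed directly (`brevity_penalty` is a hypothetical helper name, not part of the implementation below):

```python
import math

def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r/c)."""
    if c > r:
        return 1.0
    return math.exp(1 - r / c)
```

For instance, a candidate corpus of length c = 10 against an effective reference length r = 15 is penalized by exp(1 − 1.5) ≈ 0.61, while a candidate longer than the reference passes unpenalized.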
$$BLEU=BP\cdot \exp\left(\sum_{n=1}^N w_n \log p_n\right)$$
Taking the logarithm, this becomes:
$$\log BLEU=\min\left(1-\frac{r}{c},\ 0\right)+\sum_{n=1}^N w_n\log p_n$$
$w_n$: the weights, usually uniform with $w_n=\frac{1}{N}$.
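Putting the pieces together: assuming the modified precisions p_1..p_N and the lengths c, r are already computed, the final score is a brevity-penalized geometric mean of the precisions (a sketch with names of my choosing):

```python
import math

def bleu(p, c, r):
    """BLEU = BP * exp(sum_n w_n * log p_n), with uniform weights w_n = 1/N."""
    bp = 1.0 if c > r else math.exp(1 - r / c)
    n = len(p)
    return bp * math.exp(sum(math.log(p_i) for p_i in p) / n)
```

With uniform weights, exp of the mean of log p_n is exactly the geometric mean of the precisions, so a single very small p_n drags the whole score down.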
''' BLEU (BiLingual Evaluation Understudy)
@Author: baowj
@Date: 2020/9/16
@Email: bwj_678@qq.com
'''
import numpy as np


class BLEU():
    def __init__(self, n_gram=1):
        super().__init__()
        self.n_gram = n_gram

    def evaluate(self, candidates, references):
        '''Compute the BLEU scores.
        @param candidates [[str]]: machine-translated sentences
        @param references [[str]]: reference sentences
        @return bleu: the BLEU score of each candidate
        '''
        bleu = np.zeros(len(candidates))
        for k, candidate in enumerate(candidates):
            BP = 1
            r, c = 0, 0
            count = np.zeros(self.n_gram)
            count_clip = np.zeros(self.n_gram)
            for j, candidate_sent in enumerate(candidate):
                # iterate over the sentences of this candidate
                reference_sents = [reference[j] for reference in references]
                count_index_ = 0
                for i in range(self.n_gram):
                    count_, n_grams = self.extractNgram(candidate_sent, i + 1)
                    count[i] += count_
                    count_clip_, count_index_ = self.countClip(reference_sents, i + 1, n_grams)
                    count_clip[i] += count_clip_
                c += len(candidate_sent)
                r += len(reference_sents[count_index_])
            p = count_clip / count
            rc = r / c
            if rc >= 1:
                # candidate is shorter than the effective reference length
                BP = np.exp(1 - rc)
            p[p == 0] = 1e-100  # avoid log(0)
            p = np.log(p)
            bleu[k] = BP * np.exp(np.average(p))
        return bleu

    def extractNgram(self, candidate, n):
        '''Extract the n-grams of a sentence.
        @param candidate [str]: a machine-translated sentence
        @param n int: the n-gram order
        @return count int: the number of n-grams
        @return n_grams set(): the set of n-grams
        '''
        count = 0
        n_grams = set()
        if len(candidate) - n + 1 > 0:
            count += len(candidate) - n + 1
        for i in range(len(candidate) - n + 1):
            n_gram = ' '.join(candidate[i:i+n])
            n_grams.add(n_gram)
        return (count, n_grams)

    def countClip(self, references, n, n_gram):
        '''Count how many candidate n-grams appear in the references.
        @param references [[str]]: reference sentences
        @param n int: the n-gram order
        @param n_gram set(): the candidate's n-grams
        @return count: the maximum number of matches in any single reference
        @return index: the index of the reference with the most matches
        '''
        max_count = 0
        index = 0
        for j, reference in enumerate(references):
            count = 0
            for i in range(len(reference) - n + 1):
                if ' '.join(reference[i:i+n]) in n_gram:
                    count += 1
            if max_count < count:
                max_count = count
                index = j
        return (max_count, index)


if __name__ == '__main__':
    bleu_ = BLEU(4)
    candidates = [['It is a guide to action which ensures that the military always obeys the commands of the party'],
                  ['It is to insure the troops forever hearing the activity guidebook that party direct']]
    candidates = [[s.split() for s in candidate] for candidate in candidates]
    references = [['It is a guide to action that ensures that the military will forever heed Party commands'],
                  ['It is the guiding principle which guarantees the military forces always being under the command of the Party'],
                  ['It is the practical guide for the army always to heed the directions of the party']]
    references = [[s.split() for s in reference] for reference in references]
    print(bleu_.evaluate(candidates, references))