
The BLEU Algorithm and a Python Implementation


Introduction


BLEU is a model for evaluating the quality of machine translation. Given a machine-translated sentence and human reference translations, it automatically produces a score; the higher the score, the better the translation.

Building the model

To evaluate the output of a machine translation (MT) system, we first look at what good translations have in common, as in the following example:

Example 1

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.

Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever
heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces
always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed the directions of the party.

n-grams

From the example above, we can see that the candidate sentences (Candidate) and the reference sentences (Reference) share some common fragments, for example:

Candidate 1 - Reference 1: "It is a guide to action", "ensures that the military", "commands"

Candidate 1 - Reference 2: "which", "always", "of the party"

Candidate 1 - Reference 3: "always"

Candidate 2, by contrast, shares far fewer fragments with the reference sentences.

Therefore, we can count how many times each n-gram of the candidate sentence appears in the reference sentences and sum these counts. From the analysis above, the more matches there are, the larger the sum, and the better the candidate sentence.
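This per-sentence matching can be sketched in a few lines of Python (a minimal illustration, not the article's implementation; the helper names `ngram_counts` and `unclipped_matches` are ours):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def unclipped_matches(candidate, references, n=1):
    """Sum, over the candidate's n-grams, how many of them appear
    anywhere in the references (no clipping yet)."""
    ref_grams = set()
    for ref in references:
        ref_grams |= set(ngram_counts(ref, n))
    cand = ngram_counts(candidate, n)
    return sum(c for g, c in cand.items() if g in ref_grams)
```

Note that a repeated candidate n-gram is counted every time it occurs, which is exactly the weakness the next section addresses.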


Modified n-grams

Example 2

Candidate: the the the the the the the.

Reference 1: The cat is on the mat.

Reference 2: There is a cat on the mat.

Consider Example 2. With the plain n-gram method, every 1-gram "the" of the candidate sentence appears many times in the reference sentences, so the summed score is high. By the n-gram criterion, the Candidate would be a very good translation, but this is clearly not the case.

So we modify the n-gram model:

  1. First, compute the maximum number of times a word occurs in any single reference sentence;

  2. Then, clip the count of each (distinct) word in the candidate sentence by that maximum reference count:

     $Count_{clip} = \min(Count,\ Max\_Ref\_Count)$

  3. Finally, sum these clipped counts and divide by the total number of words in the candidate sentence.

For instance, in Example 2:

  1. "the" occurs 2 times in Ref 1 and 1 time in Ref 2;
  2. the clipped count of "the" is therefore 2;
  3. the final value for "the" is 2/7.

In Example 1:

Candidate 1 scores 17/18.

Candidate 2 scores 8/14.
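The three steps and both worked results can be checked with a short sketch (our own helper, with tokens lowercased and punctuation stripped; not the article's implementation):

```python
from collections import Counter

def modified_precision(candidate, references, n=1):
    """Modified n-gram precision: clip each candidate n-gram count by its
    maximum count in any single reference, then divide the summed clipped
    counts by the total number of candidate n-grams."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    max_ref = Counter()
    for ref in references:
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        for g, c in ref_counts.items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped, sum(cand.values())
```

For Example 2 this returns clipped = 2 out of 7 unigrams, and for Candidate 1 of Example 1 it returns 17 out of 18.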


Modified n-grams on blocks of text

When evaluating long texts:

  1. First, compute the n-gram matches sentence by sentence;
  2. Then, add the clipped counts $Count_{clip}$ over all candidate sentences and divide by the total number of candidate n-grams in the test corpus, giving the corpus-level score $p_n$:

$p_n=\frac{\sum_{C\in\{Candidates\}}\sum_{n\text{-}gram\in C}Count_{clip}(n\text{-}gram)}{\sum_{C'\in\{Candidates\}}\sum_{n\text{-}gram'\in C'}Count(n\text{-}gram')}$

where

$Candidates$: the machine-translated sentences;

$Count()$: the number of times an n-gram occurs in the Candidates;

$Count_{clip}()$: the clipped count of a Candidate n-gram, based on its occurrences in the References.
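The corpus-level $p_n$ can be sketched directly from this formula (a self-contained illustration with our own helper names; `references[k]` holds the reference sentences for `candidates[k]`):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_pn(candidates, references, n=1):
    """Corpus-level modified precision p_n: clipped matches summed over all
    candidate sentences, divided by the total candidate n-gram count."""
    clipped_total, count_total = 0, 0
    for cand, refs in zip(candidates, references):
        cand_counts = ngram_counts(cand, n)
        max_ref = Counter()
        for ref in refs:
            for g, c in ngram_counts(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped_total += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        count_total += sum(cand_counts.values())
    return clipped_total / count_total
```

With a single-sentence corpus this reduces to the per-sentence modified precision of the previous section.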


Sentence length

The plain n-gram precision penalizes words in the candidate sentence that do not appear in the reference sentences;

the modified n-gram precision additionally penalizes words that occur more often in the candidate sentence than in the reference sentences.


BLEU

$BP=\begin{cases}1 & \text{if } c > r\\ e^{1-\frac{r}{c}} & \text{if } c \le r\end{cases}$

$c$: the length of the Candidate corpus;

$r$: the effective Reference length: the sum, over the Candidate sentences, of the length of the best-matching Reference sentence.


$BLEU=BP\cdot \exp\left(\sum_{n=1}^N w_n \log p_n\right)$

Taking the logarithm:

$\log BLEU=\min\left(1-\frac{r}{c},\ 0\right)+\sum_{n=1}^N w_n\log p_n$

$w_n$: the weights, usually $\frac{1}{N}$.
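Given the per-order precisions $p_1,\dots,p_N$ and the brevity penalty, the final combination with uniform weights $w_n=\frac{1}{N}$ is (hypothetical helper name, for illustration):

```python
import math

def bleu_from_precisions(bp, precisions):
    """BLEU = BP * exp(sum_n w_n * log p_n), with uniform weights w_n = 1/N.
    Assumes every p_n > 0 (zero precisions need smoothing)."""
    n = len(precisions)
    return bp * math.exp(sum(math.log(p) for p in precisions) / n)
```

Because the weighted sum of logs is a geometric mean, a single very small $p_n$ drags the whole score down.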


Code

''' BLEU (BiLingual Evaluation Understudy)
@Author: baowj
@Date: 2020/9/16
@Email: bwj_678@qq.com
'''
import numpy as np

class BLEU():
    def __init__(self, n_gram=1):
        super().__init__()
        self.n_gram = n_gram

    def evaluate(self, candidates, references):
        ''' Compute the BLEU score.
        @param candidates [[str]]: machine-translated sentences (token lists)
        @param references [[str]]: reference sentences (token lists)
        @return bleu: one BLEU score per candidate
        '''
        bleu = np.zeros(len(candidates))
        for k, candidate in enumerate(candidates):
            r, c = 0, 0
            count = np.zeros(self.n_gram)
            count_clip = np.zeros(self.n_gram)
            for j, candidate_sent in enumerate(candidate):
                # iterate over the sentences of this candidate
                reference_sents = [reference[j] for reference in references]
                best_index = 0
                for i in range(self.n_gram):
                    count_, n_grams = self.extractNgram(candidate_sent, i + 1)
                    count[i] += count_
                    count_clip_, best_index = self.countClip(reference_sents, i + 1, n_grams)
                    count_clip[i] += count_clip_
                # accumulate sentence lengths once per sentence, not once per n
                c += len(candidate_sent)
                r += len(reference_sents[best_index])
            p = count_clip / count
            # brevity penalty: BP = 1 if c > r, else exp(1 - r/c)
            rc = r / c
            BP = 1 if rc < 1 else np.exp(1 - rc)
            p[p == 0] = 1e-100  # avoid log(0) when an n-gram order has no match
            p = np.log(p)
            bleu[k] = BP * np.exp(np.average(p))
        return bleu

    def extractNgram(self, candidate, n):
        ''' Extract the n-grams of a sentence.
        @param candidate [str]: a machine-translated sentence (token list)
        @param n int: the n-gram order
        @return count int: the number of n-grams
        @return n_grams set(): the set of distinct n-grams
        '''
        count = 0
        n_grams = set()
        if len(candidate) - n + 1 > 0:
            count += len(candidate) - n + 1
        for i in range(len(candidate) - n + 1):
            n_gram = ' '.join(candidate[i:i+n])
            n_grams.add(n_gram)
        return (count, n_grams)

    def countClip(self, references, n, n_gram):
        ''' Count how often the given n-grams occur in the references.
        @param references [[str]]: the reference sentences (token lists)
        @param n int: the n-gram order
        @param n_gram set(): the candidate's n-grams

        @return:
        @count: the maximum number of matches in any single reference
        @index: the index of the reference with the most matches
        '''
        max_count = 0
        index = 0
        for j, reference in enumerate(references):
            count = 0
            for i in range(len(reference) - n + 1):
                if ' '.join(reference[i:i+n]) in n_gram:
                    count += 1
            if max_count < count:
                max_count = count
                index = j
        return (max_count, index)


if __name__ == '__main__':
    bleu_ = BLEU(4)
    candidates = [['It is a guide to action which ensures that the military always obeys the commands of the party'],
                 ['It is to insure the troops forever hearing the activity guidebook that party direct'],
    ]
    candidates = [[s.split() for s in candidate] for candidate in candidates]
    references = [['It is a guide to action that ensures that the military will forever heed Party commands'],
                  ['It is the guiding principle which guarantees the military forces always being under the command of the Party'],
                  ['It is the practical guide for the army always to heed the directions of the party']
    ]
    references = [[s.split() for s in reference] for reference in references]
    print(bleu_.evaluate(candidates, references))
Reference:

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proceedings of ACL 2002.
