
Implementing the BLEU evaluation metric in Python: weighted averaging over n-grams


Here we use the weighted geometric mean.

The weighted arithmetic mean that we commonly use in everyday life is
$$\frac{w_0 x_0 + w_1 x_1 + \cdots}{w_0 + w_1 + \cdots}$$
whereas the weighted geometric mean is
$$\sqrt[\sum_i w_i]{\prod_i x_i^{w_i}}$$
Building on this, we can rewrite it as follows (assuming every $x_i > 0$, so that the logarithm is defined):
$$
\begin{aligned}
\sqrt[\sum_i w_i]{\prod_i x_i^{w_i}}
&= e^{\ln\left(\sqrt[\sum_i w_i]{\prod_i x_i^{w_i}}\right)} \\
&= e^{\frac{\ln\left(\prod_i x_i^{w_i}\right)}{\sum_i w_i}} \\
&= e^{\frac{\sum_i w_i \ln x_i}{\sum_i w_i}}
\end{aligned}
$$
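As a quick numerical check of the derivation, the sketch below evaluates the direct form and the final log form for a few arbitrary positive values and weights chosen purely for illustration. The function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from math import exp, log, prod


def weighted_geometric_mean(xs, ws):
    # Direct form: the (sum w)-th root of prod x^w
    return prod(x ** w for x, w in zip(xs, ws)) ** (1 / sum(ws))


def weighted_geometric_mean_log(xs, ws):
    # Log form: exp(sum(w * ln x) / sum(w)); every x must be > 0
    return exp(sum(w * log(x) for x, w in zip(xs, ws)) / sum(ws))


xs = [0.5, 0.25, 0.125]  # 2^-1, 2^-2, 2^-3
ws = [1, 2, 3]           # weights need not sum to 1 here
print(weighted_geometric_mean(xs, ws))      # both ≈ 0.1984, i.e. 2^(-14/6)
print(weighted_geometric_mean_log(xs, ws))
```

Because the weights here sum to 6, the denominator $\sum_i w_i$ is what turns the weighted sum of logarithms back into a mean.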
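Most references write the n-gram weighted average as $\exp\left(\sum_{n=1}^{N} w_n \ln P_n\right)$, without the denominator, because the weights are normalized so that $\sum_{n=1}^{N} w_n = 1$ (typically $w_n = 1/N$). As for why it is written in this exponential form rather than directly as $\prod_{n=1}^{N} P_n^{w_n}$ (where $P_n$ is the n-gram precision), the reason is presumably to simplify computation: the product form requires many exponentiations and multiplications.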
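The sketch below illustrates both points with made-up precision values. With uniform weights $w_n = 1/N$ the weights sum to 1, so the log form needs no denominator and agrees with the product form. Separately, forming the product first can underflow: 100 factors of $10^{-5}$ give $10^{-500}$, which is below the smallest positive double, while a sum of logarithms stays representable.

```python
from math import exp, log, prod

# Hypothetical modified precisions P_1..P_4; uniform weights w_n = 1/N sum to 1
p = [0.6, 0.4, 0.25, 0.1]
w = [1 / len(p)] * len(p)

# With sum(w) == 1 there is no denominator: exp(sum w_n ln P_n)
log_form = exp(sum(wn * log(pn) for wn, pn in zip(w, p)))
product_form = prod(pn ** wn for wn, pn in zip(w, p))
print(log_form, product_form)  # equal up to floating-point rounding

# Unweighted product of 100 tiny factors underflows to 0.0 before the root is taken
tiny = [1e-5] * 100
direct = prod(tiny) ** (1 / len(tiny))
via_logs = exp(sum(log(x) for x in tiny) / len(tiny))
print(direct, via_logs)  # 0.0, then a value very close to 1e-05
```

Sentence-level BLEU precisions are rarely this small, so the underflow case is a general property of log-space computation rather than something BLEU usually runs into.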


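```python
from collections import Counter
from math import exp, log


def bleuMultiGram(candidate, reference, maxn, weight=None):
    '''
    BLEU score: brevity penalty times the weighted geometric mean
    of the 1..maxn-gram modified precisions
    :param weight: list of maxn weights; every weight is 1 if omitted
    :return: float
    '''
    if not weight:
        weight = [1] * maxn  # default: every weight is 1
    logSum = 0.0
    for i in range(1, maxn + 1):
        p = bleu(candidate, reference, i)
        if p == 0:
            return 0.0  # ln(0) is undefined; the geometric mean is 0
        logSum += weight[i - 1] * log(p)
    score = exp(logSum / sum(weight))  # e^(sum(w ln p) / sum(w))
    c = len(ngram(candidate, 1))  # candidate length in words
    r = len(ngram(reference, 1))  # reference length in words
    # brevity penalty: 1 if the candidate is longer, else e^(1 - r/c)
    bp = 1.0 if c > r else exp(1 - r / c)
    return bp * score


def bleu(candidate, reference, n=1):
    '''
    Modified (clipped) n-gram precision of the candidate
    :return: float
    '''
    candidateList = ngram(candidate, n)
    referenceList = ngram(reference, n)
    if not candidateList or not referenceList:
        return 0.0  # nothing to compare
    return NumOfIntersection(candidateList, referenceList) / len(candidateList)


def ngram(sentence, n=1):
    '''
    Split a space-separated sentence into its n-grams
    :param sentence: str
    :param n: int
    :return: list of str (empty if there are fewer than n words)
    '''
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def NumOfIntersection(candidate, reference):
    '''
    Number of elements the two lists share, counting duplicates:
    each element counts min(occurrences in candidate, occurrences in reference) times
    :return: int
    '''
    candidateCounter = Counter(candidate)
    referenceCounter = Counter(reference)
    cnt = 0
    for key in candidateCounter.keys() & referenceCounter.keys():
        cnt += min(candidateCounter[key], referenceCounter[key])
    return cnt


print(bleu("the the the the", "the cat is standing on the ground", 1))  # 0.5
print(bleuMultiGram("the cat is on the ground", "the cat is standing on the ground", 2))  # about 0.757
```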
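BLEU's brevity penalty is defined as follows: with candidate length $c$ and reference length $r$ (in words), BP is 1 when $c > r$ and $e^{1 - r/c}$ otherwise, so only candidates that are too short are penalized. A minimal standalone sketch, using a hypothetical helper `brevity_penalty` that is not part of the listing, shows how the penalty shrinks as the candidate gets shorter than the 7-word reference "the cat is standing on the ground":

```python
from math import exp


def brevity_penalty(c, r):
    """Hypothetical helper: BLEU brevity penalty.
    c: candidate length in words, r: reference length in words."""
    if c == 0:
        return 0.0  # an empty candidate earns no credit
    return 1.0 if c > r else exp(1 - r / c)


r = 7  # "the cat is standing on the ground"
for c in (4, 6, 7, 9):
    # prints one pair per line: 4 0.4724, 6 0.8465, 7 1.0, 9 1.0
    print(c, round(brevity_penalty(c, r), 4))
```

Note that the ratio in the exponent is $r/c$, not $c/r$: with $c \le r$ this keeps the exponent at or below 0, so BP never exceeds 1.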

Disclaimer: this article was contributed by a user; if you repost it, please credit the source: 【wpsshop】