Given a generated sequence "The cat sat on the mat" and two reference sequences, "The cat is on the mat" and "The bird sat on the bush", compute the BLEU-N and ROUGE-N scores for N = 1 and N = 2.
[Deep Learning] Sequence Generation Models (5): Worked Evaluation Examples: Computing the BLEU-N Score
Let $\mathbf{x}$ be a candidate sequence generated from the model distribution $p_{\theta}$, let $\mathbf{s}^{(1)}, \cdots, \mathbf{s}^{(K)}$ be a set of reference sequences sampled from the true data distribution, and let $\mathcal{W}$ be the set of N-grams extracted from the reference sequences. ROUGE-N is defined as:

$$\text{ROUGE-N}(\mathbf{x}) = \frac{\sum_{k=1}^{K} \sum_{w \in \mathcal{W}} \min\left(c_w(\mathbf{x}), c_w(\mathbf{s}^{(k)})\right)}{\sum_{k=1}^{K} \sum_{w \in \mathcal{W}} c_w(\mathbf{s}^{(k)})}$$

where $c_w(\mathbf{x})$ is the number of times the N-gram $w$ occurs in the generated sequence $\mathbf{x}$, and $c_w(\mathbf{s}^{(k)})$ is the number of times $w$ occurs in the reference sequence $\mathbf{s}^{(k)}$.
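
The definition can be read almost directly as code. The following is a minimal sketch for any N and any number of references K, assuming the sequences are already split into token lists. The names `ngram_counts` and `rouge_n` are illustrative and not part of the original post, and returning 0.0 when the references contain no N-grams is an assumption made here to avoid a division by zero.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Return c_w for every n-gram w in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n):
    """ROUGE-N of a candidate token list against K reference token lists."""
    cand_counts = ngram_counts(candidate, n)
    matched, total = 0, 0
    for reference in references:
        ref_counts = ngram_counts(reference, n)
        # n-grams absent from this reference add 0 to both sums, so looping
        # over its own n-grams is equivalent to summing over the whole set W.
        matched += sum(min(cand_counts[w], c) for w, c in ref_counts.items())
        total += sum(ref_counts.values())
    # Assumption: define the score as 0.0 when the references have no n-grams.
    return matched / total if total else 0.0


# Toy check: both candidate words appear among the four reference words.
print(rouge_n("a b".split(), ["a b c d".split()], 1))  # 0.5
```

Because the denominator counts reference N-grams only, the score behaves like a recall: it measures how much of the references the candidate covers.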
For N = 1, the unigram counts are:

$w$ | $c_w(\mathbf{x})$ | $c_w(\mathbf{s}^{(1)})$ | $c_w(\mathbf{s}^{(2)})$ | $\min(c_w(\mathbf{x}), c_w(\mathbf{s}^{(1)}))$ | $\min(c_w(\mathbf{x}), c_w(\mathbf{s}^{(2)}))$ |
---|---|---|---|---|---|
the | 2 | 2 | 2 | 2 | 2 |
cat | 1 | 1 | 0 | 1 | 0 |
is | 0 | 1 | 0 | 0 | 0 |
on | 1 | 1 | 1 | 1 | 1 |
mat | 1 | 1 | 0 | 1 | 0 |
bird | 0 | 0 | 1 | 0 | 0 |
sat | 1 | 0 | 1 | 0 | 1 |
bush | 0 | 0 | 1 | 0 | 0 |
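
As a check on the table above, the numerator is the sum of the two $\min$ columns (5 for $\mathbf{s}^{(1)}$, 4 for $\mathbf{s}^{(2)}$), and the denominator is the sum of the two reference-count columns (6 unigrams in each reference):

$$\text{ROUGE-1}(\mathbf{x}) = \frac{5 + 4}{6 + 6} = \frac{9}{12} = 0.75$$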
For N = 2, the bigram counts are:

$w$ | $c_w(\mathbf{x})$ | $c_w(\mathbf{s}^{(1)})$ | $c_w(\mathbf{s}^{(2)})$ | $\min(c_w(\mathbf{x}), c_w(\mathbf{s}^{(1)}))$ | $\min(c_w(\mathbf{x}), c_w(\mathbf{s}^{(2)}))$ |
---|---|---|---|---|---|
the cat | 1 | 1 | 0 | 1 | 0 |
cat is | 0 | 1 | 0 | 0 | 0 |
is on | 0 | 1 | 0 | 0 | 0 |
on the | 1 | 1 | 1 | 1 | 1 |
the mat | 1 | 1 | 0 | 1 | 0 |
the bird | 0 | 0 | 1 | 0 | 0 |
bird sat | 0 | 0 | 1 | 0 | 0 |
sat on | 1 | 0 | 1 | 0 | 1 |
the bush | 0 | 0 | 1 | 0 | 0 |
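
The same check applies to the bigram table: the $\min$ columns sum to 3 for $\mathbf{s}^{(1)}$ and 2 for $\mathbf{s}^{(2)}$, and each reference contains 5 bigrams:

$$\text{ROUGE-2}(\mathbf{x}) = \frac{3 + 2}{5 + 5} = \frac{5}{10} = 0.5$$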

    from collections import Counter

    main_string = 'the cat sat on the mat'
    string1 = 'the cat is on the mat'
    string2 = 'the bird sat on the bush'

    # ROUGE-1: count whole words (str.count would also match substrings inside other words)
    main_words = Counter(main_string.split(' '))
    words1 = Counter(string1.split(' '))
    words2 = Counter(string2.split(' '))
    words = set(words1) | set(words2)  # distinct unigrams in the references
    total_occurrences, matching_occurrences = 0, 0
    for word in words:
        matching_occurrences += min(main_words[word], words1[word]) + min(main_words[word], words2[word])
        total_occurrences += words1[word] + words2[word]
    print(matching_occurrences / total_occurrences)

    # ROUGE-2: the same computation over pairs of adjacent words
    def bigram_counts(string):
        split = string.split(' ')
        return Counter(split[i] + ' ' + split[i + 1] for i in range(len(split) - 1))

    main_bigrams = bigram_counts(main_string)
    bigrams1, bigrams2 = bigram_counts(string1), bigram_counts(string2)
    bigrams = set(bigrams1) | set(bigrams2)  # distinct bigrams in the references
    total_occurrences, matching_occurrences = 0, 0
    for bigram in bigrams:
        matching_occurrences += min(main_bigrams[bigram], bigrams1[bigram]) + min(main_bigrams[bigram], bigrams2[bigram])
        total_occurrences += bigrams1[bigram] + bigrams2[bigram]
    print(matching_occurrences / total_occurrences)

Output:
0.75
0.5