Natural Language Processing: Unsupervised Evaluation Metrics for Sentence Diversity (BERTScore Computation)

Reposted from: my personal blog

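While working on a project, I needed to evaluate and filter generated sentences by their diversity. I therefore surveyed some existing unsupervised metrics for sentence diversity, to keep as a reference.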

BERTScore

paper: BERTSCORE: EVALUATING TEXT GENERATION WITH BERT

Each token is matched to the token in the other sentence with which it has the largest inner product.

$$
R_{BERT} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^{T} \hat{x}_j, \quad P_{BERT} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} x_i^{T} \hat{x}_j, \quad F_{BERT} = 2\frac{P_{BERT}\cdot R_{BERT}}{P_{BERT} + R_{BERT}}
$$

Here $x$ is the reference sentence and $\hat{x}$ is the candidate sentence.
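The greedy matching can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-d vectors below are made up and stand in for contextual BERT embeddings, the function names are hypothetical, and the vectors are normalized to unit length so that the inner product equals cosine similarity.

```python
import math

def normalize(v):
    """Scale a vector to unit length so the inner product is a cosine similarity."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bert_score(ref_vecs, cand_vecs):
    """Greedy-matching precision, recall and F1 between two lists of token vectors."""
    ref = [normalize(v) for v in ref_vecs]
    cand = [normalize(v) for v in cand_vecs]
    # Recall: every reference token is matched to its most similar candidate token.
    recall = sum(max(dot(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: every candidate token is matched to its most similar reference token.
    precision = sum(max(dot(r, c) for r in ref) for c in cand) / len(cand)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy vectors (assumption): two reference tokens, two candidate tokens.
ref_vecs = [[1.0, 0.0], [0.0, 1.0]]
cand_vecs = [[1.0, 0.0], [1.0, 1.0]]
p, r, f = bert_score(ref_vecs, cand_vecs)
print(p, r, f)
```

Precision and recall differ only in which sentence supplies the outer sum, which is why a candidate that repeats one good token can score high on precision but low on recall.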

Importance Weighting

based on inverse document frequency

$$
\mathrm{idf}(w) = -\log \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}[w \in x^{(i)}]
$$
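As a rough illustration of the idf formula, the sketch below counts in how many of the $M$ reference sentences each word occurs. The tiny corpus and the function name `idf_table` are made up for this example, and words that never occur in the references are not handled.

```python
import math

def idf_table(references):
    """idf(w) = -log((1/M) * number of reference sentences that contain w)."""
    m = len(references)
    doc_freq = {}
    for sentence in references:
        for w in set(sentence):  # count each word at most once per sentence
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return {w: -math.log(df / m) for w, df in doc_freq.items()}

# Toy reference corpus (assumption), already tokenized.
refs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
    ["a", "bird", "sang"],
]
idf = idf_table(refs)
print(idf["the"], idf["cat"], idf["bird"])
```

Rare words such as "bird" receive a larger weight than frequent words such as "the", so they contribute more to the weighted score.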

rescaling

$$
\hat{R}_{BERT} = \frac{R_{BERT} - b}{1-b}
$$

b: empirical lower bound, calculated using Common Crawl monolingual datasets
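A small worked example of the rescaling. The baseline value 0.8 is purely hypothetical (the real $b$ is estimated empirically and depends on the model), and `rescale` is an illustrative name.

```python
def rescale(score, baseline):
    """Linearly map [baseline, 1] onto [0, 1]."""
    return (score - baseline) / (1.0 - baseline)

b = 0.8  # hypothetical lower bound, for illustration only
print(rescale(0.9, b))  # a raw score halfway between b and 1
print(rescale(0.8, b))  # a raw score equal to the baseline
```

Rescaling does not change the ranking of candidates. It only spreads the scores over a more readable range, since raw values tend to cluster near the top of the interval.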

Comparison

machine translation evaluation -> $F_{BERT}$

text generation in English -> 24-layer $RoBERTa_{large}$

non-English languages -> $BERT_{multi}$

BLEURT

paper: BLEURT: Learning Robust Metrics for Text Generation. ACL 2020

Architecture

BERT + linear head

pre-training scheme

random perturbations of Wikipedia sentences augmented with a diverse set of lexical and semantic-level supervision signals

  • mask-filling with BERT -> lexical alterations
  • backtranslation
  • randomly dropping out words -> to recognize void predictions and sentence truncation in NLG systems
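Of the three perturbations, random word dropping is the easiest to sketch. The snippet below is an assumption-laden illustration, not BLEURT's actual implementation: the function name, the drop probability and the rule of keeping at least one token are all choices made for this example.

```python
import random

def drop_words(tokens, drop_prob, rng):
    """Delete each token independently with probability drop_prob.

    If every token happens to be dropped, one randomly chosen token is kept
    so the result is never empty (a choice made for this sketch).
    """
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(tokens)]

rng = random.Random(0)  # fixed seed so the run is reproducible
sentence = "the quick brown fox jumps over the lazy dog".split()
perturbed = drop_words(sentence, drop_prob=0.3, rng=rng)
print(" ".join(perturbed))
```

Pairs of an original sentence and its perturbed version give the model synthetic examples of truncated or degenerate outputs.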

pre-training objective: weighted sum of the losses on the individual supervision signals above

BARTScore

paper: BARTSCORE: Evaluating Generated Text as Text Generation

ExplainaBoard: http://explainaboard.nlpedia.ai/leaderboard/task-meval/

evaluation perspectives:
  • Informativeness
  • Relevance
  • Fluency
  • Coherence
  • Factuality
  • Semantic Coverage
  • Adequacy
BARTScore

$$
BARTSCORE = \sum_{t=1}^{m} \omega_t \log p(y_t \mid y_{<t}, x, \theta)
$$
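The formula is a weighted sum of token log-probabilities, so it reduces to a few lines once the per-token values are available. In the sketch below the log-probabilities are made-up numbers. In practice they would come from a seq2seq model such as BART scoring $y$ given $x$. Uniform weights are assumed when none are passed, and the function name is illustrative.

```python
import math

def bart_score(token_log_probs, weights=None):
    """Weighted sum of log p(y_t | y_<t, x) over the target tokens."""
    if weights is None:
        weights = [1.0] * len(token_log_probs)  # uniform weights (assumption)
    return sum(w * lp for w, lp in zip(weights, token_log_probs))

# Made-up per-token probabilities for a 3-token target sequence.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
score = bart_score(log_probs)
print(score)
```

The score is never positive, and values closer to zero mean the model finds $y$ more likely given $x$. Swapping which text plays the role of $x$ and which plays $y$ changes what the score measures.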

using prompts to augment the metric

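I did not fully follow this paper: it starts by listing a whole set of evaluation perspectives, yet in the end there is only the single BARTScore. After a look at ExplainaBoard, my guess is that when the task or the input pair $\{x, y\}$ differs, BARTScore ends up measuring a different aspect of the sentence.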

MoverScore

paper: MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

MoverDistance

$$
WMD(x^n, y^n) := \min_{F \in \mathbb{R}_{+}^{|x^n| \times |y^n|}} \langle C, F \rangle, \quad \text{s.t. } F\mathbf{1} = f_{x^n},\ F^{T}\mathbf{1} = f_{y^n}
$$

$C_{ij} = d(x_i^n, y_j^n)$: the distance between the i-th n-gram of x and the j-th n-gram of y

$F$: transportation flow matrix, with $F_{ij}$ denoting the amount of flow traveling from the i-th n-gram $x_i^n$ in $x^n$ to the j-th n-gram $y_j^n$ in $y^n$.

$\langle C, F \rangle = \mathrm{sum}(C \odot F)$

$d(x_i^n, y_j^n)$: Euclidean distance

$$
f_{x_i^n} = \frac{1}{Z} \sum_{k=i}^{i+n-1} \mathrm{idf}(x_k)
$$

where $Z$ is a normalizing constant.
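To make the pieces of the optimization concrete, the sketch below builds the cost matrix $C$, checks the two marginal constraints for a given flow $F$, and evaluates the objective $\langle C, F \rangle$. It does not solve the minimization itself, which would need a linear-programming or optimal-transport solver. It only compares two hand-written feasible flows. All vectors, weights and function names are assumptions made for this example.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cost_matrix(x_vecs, y_vecs):
    """C[i][j] = Euclidean distance between the i-th vector of x and the j-th of y."""
    return [[euclidean(x, y) for y in y_vecs] for x in x_vecs]

def is_feasible(flow, fx, fy, tol=1e-9):
    """Check F >= 0, row sums equal to fx, and column sums equal to fy."""
    non_negative = all(f >= 0 for row in flow for f in row)
    rows_ok = all(abs(sum(row) - w) <= tol for row, w in zip(flow, fx))
    cols_ok = all(
        abs(sum(row[j] for row in flow) - w) <= tol for j, w in enumerate(fy)
    )
    return non_negative and rows_ok and cols_ok

def transport_cost(cost, flow):
    """<C, F> = sum of the element-wise product of C and F."""
    return sum(
        c * f for c_row, f_row in zip(cost, flow) for c, f in zip(c_row, f_row)
    )

# Toy embeddings (assumption): x and y contain the same two unigrams.
x_vecs = [[0.0, 0.0], [1.0, 0.0]]
y_vecs = [[0.0, 0.0], [1.0, 0.0]]
fx = [0.5, 0.5]  # normalized weights, e.g. from idf
fy = [0.5, 0.5]

C = cost_matrix(x_vecs, y_vecs)
aligned = [[0.5, 0.0], [0.0, 0.5]]  # each n-gram sends its mass to its twin
crossed = [[0.0, 0.5], [0.5, 0.0]]  # each n-gram sends its mass to the other one
print(transport_cost(C, aligned), transport_cost(C, crossed))
```

Both flows satisfy the constraints, but the aligned one costs 0 while the crossed one costs 1, so the minimization would select the aligned flow and report a distance of 0 for these identical sentences.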

vs BERTScore

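For a given word, BERTScore takes the similarity (inner product) to its single most similar word in the original sentence, whereas MoverScore takes a weighted sum of the inner products between this word and all the other words, with the weights (the $F$ in the formula) computed from idf.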

Embedding Average

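Take the average of the word vectors in the generated text and in the reference text as each text's vector representation, then compute the cosine similarity between the two vectors as the similarity between the generated text and the reference text: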

$$
\bar{e}_r = \frac{\sum_{\omega \in r} e_{\omega}}{\left| \sum_{\omega' \in r} e_{\omega'} \right|}
$$

$$
EA := \cos(\bar{e}_r, \bar{e}_{\hat{r}})
$$
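A minimal sketch of the two formulas, using made-up 2-d word vectors in place of real word embeddings (the function names are illustrative):

```python
import math

def sentence_vector(word_vecs):
    """Sum the word vectors and scale the result to unit length."""
    dim = len(word_vecs[0])
    total = [sum(v[k] for v in word_vecs) for k in range(dim)]
    norm = math.sqrt(sum(a * a for a in total))
    return [a / norm for a in total]

def embedding_average(ref_vecs, hyp_vecs):
    """Cosine similarity between the two sentence vectors."""
    e_ref = sentence_vector(ref_vecs)
    e_hyp = sentence_vector(hyp_vecs)
    # Both vectors have unit length, so the cosine is simply their dot product.
    return sum(a * b for a, b in zip(e_ref, e_hyp))

# Toy word vectors (assumption).
ref_vecs = [[1.0, 0.0], [0.0, 1.0]]
hyp_vecs = [[1.0, 0.0], [1.0, 0.0]]
ea = embedding_average(ref_vecs, hyp_vecs)
print(ea)
```

Because the word vectors are summed before comparison, word order is ignored and a few frequent words can dominate the sentence vector.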

Perplexity

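Widely used and fairly simple, but it is also affected by various properties of the sentence, such as its length and the presence of proper nouns.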
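For reference, perplexity is commonly computed as the exponential of the negative mean token log-probability. The sketch below uses that standard definition with made-up probabilities. The values would normally come from a language model, and the function name is illustrative.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Made-up token probabilities for two 4-token sentences.
common = [math.log(0.5)] * 4
with_rare_name = [math.log(0.5)] * 3 + [math.log(0.001)]
print(perplexity(common), perplexity(with_rare_name))
```

A single low-probability token, such as a rare proper noun, raises the perplexity of the whole sentence noticeably, which illustrates the sensitivity mentioned above.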

p.s.

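Many existing metrics for evaluating generated text were designed for the machine translation task: they compute the similarity or degree of match between the original sentence and the translated sentence.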

Some metrics for evaluating sentence similarity