We also propose a new pre-trained model called MacBERT, which replaces the original MLM task with the MLM as correction (Mac) task and mitigates the discrepancy between the pre-training and fine-tuning stages.
The contributions of this paper are listed as follows.
BERT consists of two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP).
MLM: Randomly masks some of the tokens in the input; the objective is to predict the original words based only on their context (a minimal sketch of this masking follows the two task descriptions).
NSP: Predicts whether sentence B is the next sentence of sentence A.
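As a hedged illustration of the MLM masking rule above, here is a minimal Python sketch of the standard 15% / 80-10-10 scheme; the token list and vocabulary are placeholders, not code from BERT itself.

```python
import random

# Minimal sketch of the standard BERT MLM masking rule (placeholder code,
# not BERT's implementation): 15% of tokens are selected; of those,
# 80% become [MASK], 10% become a random token, 10% stay unchanged.
def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15):
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                       # the model must recover the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_token            # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels

print(mlm_mask(["the", "cat", "sat", "on", "the", "mat"],
               vocab=["dog", "ran", "tree", "blue"]))
```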
Later, they further proposed a technique called Whole Word Masking (WWM) to optimize the original masking in the MLM task. In this setting, instead of randomly selecting WordPiece tokens to mask, all of the tokens corresponding to a whole word are masked at once.
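To make the contrast with random WordPiece masking concrete, the sketch below groups pieces into whole words using the standard "##" continuation convention and masks every piece of a selected word together; it is an illustration under that assumption, not the authors' implementation.

```python
import random

# Sketch of whole word masking: group WordPiece tokens into words
# ("##" marks a continuation piece), then mask whole words at once.
def wwm_mask(pieces, mask_prob=0.15, mask_token="[MASK]"):
    groups = []                     # each group is a list of piece indices forming one word
    for i, piece in enumerate(pieces):
        if piece.startswith("##") and groups:
            groups[-1].append(i)    # continuation of the previous word
        else:
            groups.append([i])      # start of a new word
    masked = list(pieces)
    for group in groups:
        if random.random() < mask_prob:
            for i in group:         # mask every piece of the selected word
                masked[i] = mask_token
    return masked

# "philammon" -> ["phil", "##am", "##mon"]: either all three pieces
# are masked together or none of them is.
print(wwm_mask(["phil", "##am", "##mon", "sang", "well"], mask_prob=0.5))
```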
Enhanced Representation through kNowledge IntEgration (ERNIE) is designed to optimize the masking process of BERT, adding entity-level masking and phrase-level masking. Unlike selecting random words from the input, entity-level masking masks named entities, which are often composed of several words. Phrase-level masking masks consecutive words, which is similar to the N-gram masking strategy.
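Phrase-level (N-gram-style) masking can be sketched roughly as follows; the span-length choice here is a placeholder, not ERNIE's exact sampling procedure.

```python
import random

# Rough sketch of phrase-level / N-gram masking over a word sequence:
# pick a random span length and mask that many consecutive words.
def ngram_mask(words, max_n=4, mask_token="[MASK]"):
    n = random.randint(1, max_n)                      # span length (placeholder distribution)
    start = random.randrange(0, max(1, len(words) - n + 1))
    masked = list(words)
    for i in range(start, min(start + n, len(words))):
        masked[i] = mask_token                        # mask the whole consecutive span
    return masked

print(ngram_mask(["new", "york", "is", "a", "big", "city"]))
```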
To alleviate this problem (the discrepancy between BERT's pre-training and fine-tuning stages), they proposed XLNet, which is based on Transformer-XL. XLNet mainly makes two modifications. The first is to maximize the expected likelihood over all permutations of the factorization order of the input, which they call the Permutation Language Model (PLM). The other is to change the autoencoding language model into an autoregressive one, similar to traditional statistical language models.
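For reference, the permutation language modeling objective roughly takes the form below, with $\mathcal{Z}_T$ the set of all permutations of a length-$T$ index sequence and $z_t$, $\mathbf{z}_{<t}$ the $t$-th element and the preceding elements of a sampled order $\mathbf{z}$ (notation paraphrased from the XLNet paper):

$$\max_{\theta}\;\mathbb{E}_{\mathbf{z}\sim\mathcal{Z}_T}\left[\sum_{t=1}^{T}\log p_{\theta}\!\left(x_{z_t}\mid \mathbf{x}_{\mathbf{z}_{<t}}\right)\right]$$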
We use a traditional Chinese Word Segmentation (CWS) tool to split the text into words. In this way, we can adopt whole word masking in Chinese and mask whole words instead of individual Chinese characters. MacBERT keeps the same pre-training tasks as BERT, with several modifications. We use LTP (Che et al., 2010) for Chinese word segmentation to identify word boundaries. Note that whole word masking only affects the selection of the tokens to mask in the pre-training stage; the input of BERT still uses the WordPiece tokenizer to split the text, which is identical to the original BERT.
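To show how CWS boundaries and character-level inputs interact, here is a hedged sketch: `segment()` stands in for an LTP-style segmenter (a placeholder, not LTP's real API), and masking decisions are made per CWS word while the input stays at the character level.

```python
import random

def segment(text):
    # Placeholder for an LTP-style CWS segmenter (not LTP's real API);
    # for this demo it simply returns a fixed segmentation of the example sentence.
    return ["使用", "语言", "模型", "来", "预测"]

# Whole word masking for Chinese: CWS word boundaries decide which characters
# are masked together, while the model input itself stays character-level.
def chinese_wwm(text, mask_prob=0.3, mask_token="[MASK]"):
    masked = []
    for word in segment(text):
        chars = list(word)                            # character-level (WordPiece) units
        if random.random() < mask_prob:
            masked.extend([mask_token] * len(chars))  # mask every character of the word
        else:
            masked.extend(chars)
    return masked

print(chinese_wwm("使用语言模型来预测"))
```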
For the MLM task, we perform the following modifications.
In summary, MacBERT builds on the various improved versions of BERT and keeps refining the training strategy: (1) for the MLM task, it uses whole word masking and N-gram masking, and replaces the masked words with similar words instead of [MASK] tokens, which reduces the gap between pre-training and fine-tuning; (2) it adopts ALBERT's SOP task. On downstream tasks, it outperforms the earlier BERT variants.
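Below is a minimal sketch of the "MLM as correction" idea summarized above: selected words are replaced with similar words rather than [MASK], so pre-training inputs resemble fine-tuning inputs; `similar_word()` is a hypothetical lookup (for example, a nearest neighbour in a word-embedding space), not the authors' toolchain.

```python
import random

def similar_word(word):
    # Hypothetical similar-word lookup (e.g., a nearest neighbour in a word
    # embedding space); a stand-in, not the toolchain used in the paper.
    table = {"语言": "文字", "模型": "框架", "预测": "推断"}
    return table.get(word, word)

# Sketch of the Mac (MLM as correction) idea: selected words are replaced with
# similar words instead of [MASK]; the label is still the original word, so the
# model learns to "correct" the corrupted input back to the original text.
def mac_corrupt(words, replace_prob=0.3):
    inputs, labels = [], []
    for w in words:
        if random.random() < replace_prob:
            inputs.append(similar_word(w))   # no [MASK] token ever appears in the input
            labels.append(w)                 # target: recover the original word
        else:
            inputs.append(w)
            labels.append(None)              # position not used in the loss
    return inputs, labels

print(mac_corrupt(["使用", "语言", "模型", "来", "预测"]))
```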