BERT (Devlin et al.) is a pioneering language model that is pretrained with a denoising autoencoding objective to produce state-of-the-art results on many NLP tasks. However, the original BERT model still leaves room for improvement with respect to its pretraining objectives, the data it is trained on, the duration of training, and so on. Facebook AI Research (FAIR) identified these issues and proposed an ‘optimized’ and ‘robust’ version of BERT.
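To make the ‘denoising autoencoding’ objective concrete: BERT is pretrained to reconstruct tokens that have been hidden behind a [MASK] placeholder. The snippet below is a minimal illustrative sketch (not from the original papers) using the Hugging Face transformers library to query a pretrained BERT for masked-token predictions.

```python
from transformers import pipeline

# Illustrative sketch: BERT's masked language modeling (denoising) objective.
# The model predicts the token most likely to fill the [MASK] position.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

predictions = unmasker("BERT is pretrained with a [MASK] language modeling objective.")
for p in predictions:
    print(f"{p['token_str']!r}  (score: {p['score']:.3f})")
```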
In this article we’ll discuss RoBERTa: A Robustly Optimized BERT Pretraining Approach, proposed by Liu et al. as an extension of the original BERT model. The prerequisite for this article is a general awareness of BERT’s architecture and its pretraining and fine-tuning objectives, which by default includes sufficient familiarity with the Transformer model (Vaswani et al.).
I have already covered Transformers and BERT in separate articles. Consider giving them a read if you’re interested.
RoBERTa
If I were to summar