LLMs, ERNIE 3.0 / ERNIE 3.0 Titan: Translation and Interpretation of "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"
and "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation"
Overview: The ERNIE 3.0 framework is designed for pre-training knowledge-enhanced large models that handle both language understanding and language generation tasks. The 10B-parameter model is trained on a 4TB corpus of plain text and a large-scale knowledge graph, and can be applied through zero-shot learning, few-shot learning, or fine-tuning. Experiments on a wide range of tasks demonstrate the effectiveness of ERNIE 3.0. Building on the ERNIE 3.0 framework, the authors then pre-trained ERNIE 3.0 Titan, a knowledge-enhanced language model with 260 billion parameters; validation shows that it achieves new state-of-the-art results. In addition, they propose a new method to control the generation results and keep them factually consistent with the real world, and they design an online distillation framework that produces distilled models of several sizes, given the computational overhead of large-scale pre-trained models.
>> Unified NLU + NLG pre-training: ERNIE 3.0 is a knowledge-enhanced pre-trained large model released by Baidu, with about 10B parameters. It implements a unified pre-training framework covering both natural language understanding and natural language generation, so the trained model can be easily tailored to understanding and generation tasks via zero-shot learning, few-shot learning, or fine-tuning (see the first sketch after this overview).
>> ERNIE 3.0 Titan has roughly 1.5x the parameters of GPT-3: ERNIE 3.0 Titan, released by Baidu together with Peng Cheng Laboratory, is the largest single (dense) Chinese model to date. It scales up and upgrades ERNIE 3.0 to 260B parameters, about 50% more than GPT-3's 175B.
>> Credible and controllable generation = self-supervised adversarial loss + controllable language modeling loss: during pre-training, ERNIE 3.0 Titan additionally uses a self-supervised adversarial loss and a controllable language modeling loss, which let it produce credible and controllable generations (see the second sketch below).
>> Online distillation framework to cut computational cost: to reduce computation overhead, ERNIE 3.0 Titan proposes an online distillation framework in which the teacher model teaches the student models while continuing its own training, so compute is used more efficiently (see the third sketch below). ERNIE 3.0 Titan outperforms state-of-the-art models on 68 NLP datasets.
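To make the unified NLU + NLG point concrete, here is a minimal, hedged sketch in PyTorch (the papers themselves use PaddlePaddle) of a shared universal stack topped by separate understanding and generation branches. The class name ToyErnie3, the layer counts, and the dimensions are toy assumptions for illustration, not ERNIE 3.0's actual architecture.

```python
import torch
import torch.nn as nn

class ToyErnie3(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_universal=4, n_task=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Shared "universal representation" layers used by both branches.
        self.universal = nn.ModuleList([make_layer() for _ in range(n_universal)])
        # Task-specific top layers: one stack for understanding (auto-encoding style),
        # one for generation (auto-regressive style).
        self.nlu_branch = nn.ModuleList([make_layer() for _ in range(n_task)])
        self.nlg_branch = nn.ModuleList([make_layer() for _ in range(n_task)])
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids, branch="nlu", causal=False):
        x = self.embed(token_ids)
        seq_len = token_ids.size(1)
        # Causal mask only for the auto-regressive (generation) path.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1) if causal else None
        for blk in self.universal:
            x = blk(x, src_mask=mask)
        for blk in (self.nlg_branch if branch == "nlg" else self.nlu_branch):
            x = blk(x, src_mask=mask)
        return self.lm_head(x)

model = ToyErnie3()
ids = torch.randint(0, 1000, (2, 16))
nlu_logits = model(ids, branch="nlu")                # masked-token style prediction
nlg_logits = model(ids, branch="nlg", causal=True)   # next-token prediction
print(nlu_logits.shape, nlg_logits.shape)
```

The point of the layout is that both objectives share the lower layers, so understanding tasks and generation tasks reinforce a common representation instead of requiring two separate models.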
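For the credible-and-controllable point, the Titan objective can be pictured as two extra terms added on top of the usual language-modeling loss. The snippet below only illustrates that combination with dummy tensors; the control-code scheme, the binary credibility head, and the equal weighting are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

vocab_size = 1000

# Dummy tensors standing in for model outputs; in a real setup these would come
# from the generation head and from a binary "is this text original?" head.
lm_logits = torch.randn(2, 16, vocab_size)        # next-token logits for a control-prompted batch
target_ids = torch.randint(0, vocab_size, (2, 16))
cred_logits = torch.randn(4)                      # credibility scores for 4 passages
cred_labels = torch.tensor([1.0, 1.0, 0.0, 0.0])  # 1 = original corpus text, 0 = model-generated

# Controllable language modeling loss: ordinary token-level cross-entropy, but the
# inputs are assumed to be prefixed with control codes (genre / topic / keywords),
# so the model learns to condition its generations on those attributes.
controllable_lm = F.cross_entropy(lm_logits.reshape(-1, vocab_size), target_ids.reshape(-1))

# Self-supervised adversarial loss: the model is trained to tell original corpus
# text apart from its own generations, which is the signal behind "credible" output.
adversarial = F.binary_cross_entropy_with_logits(cred_logits, cred_labels)

# Combined objective; the equal weighting is an assumption for illustration only.
total_loss = controllable_lm + adversarial
print(float(total_loss))
```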
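For the online distillation point, a hedged sketch of one combined step is shown below: the teacher takes its own pre-training step, and in the same pass the student is fit to the teacher's soft targets. The model interfaces, the temperature T, and the weight alpha are illustrative assumptions rather than the paper's recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def online_distillation_step(teacher, student, batch, t_opt, s_opt, T=2.0, alpha=0.5):
    input_ids, labels = batch

    # 1) The teacher keeps pre-training on its own objective (it "trains itself").
    t_logits = teacher(input_ids)
    t_loss = F.cross_entropy(t_logits.reshape(-1, t_logits.size(-1)), labels.reshape(-1))
    t_opt.zero_grad(); t_loss.backward(); t_opt.step()

    # 2) In the same step, the student learns from the teacher's soft targets,
    #    so no separate distillation pass over the corpus is needed.
    s_logits = student(input_ids)
    hard = F.cross_entropy(s_logits.reshape(-1, s_logits.size(-1)), labels.reshape(-1))
    soft = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    s_loss = alpha * hard + (1 - alpha) * soft
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return t_loss.item(), s_loss.item()

# Toy usage: embedding + linear stand-ins for the large teacher and small student.
vocab = 1000
teacher = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
student = nn.Sequential(nn.Embedding(vocab, 16), nn.Linear(16, vocab))
t_opt = torch.optim.SGD(teacher.parameters(), lr=1e-2)
s_opt = torch.optim.SGD(student.parameters(), lr=1e-2)
batch = (torch.randint(0, vocab, (2, 8)), torch.randint(0, vocab, (2, 8)))
print(online_distillation_step(teacher, student, batch, t_opt, s_opt))
```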
Paper | ERNIE 3.0: https://arxiv.org/abs/2107.02137 |
Date | July 5, 2021 |
Authors | Baidu Research Team |
Abstract | Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%). |
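The abstract stresses that the 4TB pre-training corpus combines plain text with a large-scale knowledge graph. As a rough, hedged illustration of what a knowledge-text training example could look like, the snippet below pairs a triple with a related sentence and masks part of either side; the helper build_knowledge_text_example and its token conventions are assumptions for illustration, not the paper's actual preprocessing.

```python
import random

def build_knowledge_text_example(triple, sentence, mask_token="[MASK]"):
    head, relation, tail = triple
    if random.random() < 0.5:
        # Mask the relation in the triple; the model must recover it from the text.
        pair = f"{head} {mask_token} {tail} [SEP] {sentence}"
        target = relation
    else:
        # Mask a word in the sentence; the triple provides the supporting knowledge.
        words = sentence.split()
        idx = random.randrange(len(words))
        target = words[idx]
        words[idx] = mask_token
        pair = f"{head} {relation} {tail} [SEP] {' '.join(words)}"
    return pair, target

example, target = build_knowledge_text_example(
    ("Andersen", "wrote", "Nightingale"),
    "Nightingale is a literary fairy tale written by Andersen .",
)
print(example, "->", target)
```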
6 Conclusion | We proposed the ERNIE 3.0 framework to pre-train a knowledge enhanced 10-billion parameter model on a 4TB corpus including plain texts and a knowledge graph. In order to handle both language understanding and generation tasks with zero-shot learning, few-shot learning and fine-tuning, ERNIE 3.0 designs a unified pre-training framework that integrates both auto-encoder networks and auto-regressive networks. We conduct extensive experiments on various datasets from different task paradigms and fields, and the results demonstrate the effectiveness of ERNIE 3.0 as compared to the previous state-of-the-art pre-trained models. |
Paper | ERNIE 3.0 Titan: https://arxiv.org/abs/2112.12731 |
Date | December 23, 2021 |
Authors | Baidu Research Team |
Abstract | Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets. |
7 Conclusion | We pre-train a knowledge-enhanced language model with 260 billion parameters named ERNIE 3.0 Titan based on the ERNIE 3.0 framework. It is the largest Chinese dense pre-training model as far as we know. We have validated it on 68 datasets, and the results show that ERNIE 3.0 Titan achieves new state-of-the-art results. In addition, we propose a novel method for users to control the generation result and obtain results factually consistent with the real world. We also devise an online distillation framework and train several distilled models of different sizes, given the computation overhead of large-scale pre-training models. In the next stage, we will continually update ERNIE 3.0 Titan with more data to further explore the limit of the performance of large-scale pre-trained language models. We will also endeavor to explore the potential of knowledge-enhanced large-scale multi-modal models for more and various tasks. |