
Fine-Tuning BERT and RoBERTa for High-Accuracy Text Classification in PyTorch

As of the time of writing this piece, state-of-the-art results on NLP and NLU tasks are obtained with Transformer models. There is a trend of performance improvement as models become deeper and larger; GPT-3 comes to mind. Training small versions of such models from scratch takes a significant amount of time, even with a GPU. This problem can be solved via pre-training, when a model is trained on a large text corpus using a high-performance cluster. Later it can be fine-tuned for a specific task in a much shorter amount of time. During the fine-tuning stage, additional layers can be added to the model for specific tasks, which can be different from those for which the model was initially trained. This technique is related to transfer learning, a concept applied to areas of machine learning beyond NLP (see here and here for a quick intro).

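To make the pre-training / fine-tuning workflow concrete, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name, label count, sample input, and hyperparameters are illustrative assumptions rather than choices made in this article.

```python
# Minimal fine-tuning sketch: load a pre-trained encoder and attach a fresh
# classification head. Checkpoint, labels, and learning rate are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # assumed checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)

# from_pretrained loads the pre-trained weights and adds a randomly
# initialized classification head for the downstream task.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tuning typically updates all parameters with a small learning rate.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tokenizer(["an example document"], return_tensors="pt",
                  padding=True, truncation=True)
labels = torch.tensor([1])
outputs = model(**batch, labels=labels)  # returns loss and logits when labels are given
outputs.loss.backward()
optimizer.step()
```

In practice this single step would sit inside a training loop over mini-batches for a few epochs, but the structure stays the same: pre-trained body, new task-specific head, short training run.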

In this post, I would like to share my experience of fine-tuning BERT and RoBERTa, available from the transformers library by Hugging Face, for a document classification task. Both models share the Transformer architecture, which in its generic form consists of at least two distinct blocks: an encoder and a decoder. Both the encoder and the decoder consist of multiple layers built around the attention mechanism. The encoder processes the input token sequence into a vector of floating-point numbers, a hidden state, which is picked up by the decoder.
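As a concrete illustration of the hidden state mentioned above, the sketch below runs a sentence through the BERT encoder and inspects the output shape. The checkpoint and example sentence are assumptions for illustration, not taken from this article.

```python
# Sketch: the encoder maps a token sequence to one hidden-state vector per token.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")        # encoder-only model

inputs = tokenizer("Transformers encode text into vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# Hidden states have shape [batch_size, seq_len, hidden_size].
print(outputs.last_hidden_state.shape)            # e.g. torch.Size([1, 9, 768])
# The vector of the first ([CLS]) token is commonly used to represent the
# whole sequence in classification tasks.
print(outputs.last_hidden_state[:, 0, :].shape)   # torch.Size([1, 768])
```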
