
Development History of Large Language Models (LLMs) and a Summary of Model Information (updated 2023-07-12)


(Figure: LLM development timeline)

The table below summarizes 58 large language models, from BERT (released 2018-10-11) to Baichuan-7B (released 2023-06-15), covering six attributes for each model: model name, release date, parameter count, releasing organization, GitHub/official site, and paper.

| No. | Model | Release date | Parameters | Organization | GitHub / Website | Paper |
| --- | --- | --- | --- | --- | --- | --- |
| 57 | Baichuan-7B | 2023-06-15 | 7B | Baichuan Intelligence | github.com/baichuan-inc | — |
| 56 | Aquila-7B | 2023-06-10 | 7B | BAAI | github.com/FlagAI-Open/ | — |
| 55 | Falcon | 2023-05-24 | 40B | Technology Innovation Institute | falconllm.tii.ae/ | — |
| 54 | Guanaco | 2023-05-23 | 7B~65B | University of Washington | github.com/artidoro/qlo | QLoRA: Efficient Finetuning of Quantized LLMs |
| 53 | RWKV | 2023-05-22 | 7B | RWKV Foundation | github.com/BlinkDL/RWKV | RWKV: Reinventing RNNs for the Transformer Era |
| 52 | CodeT5+ | 2023-05-13 | 16B | Salesforce | github.com/salesforce/C | CodeT5+: Open Code Large Language Models for Code Understanding and Generation |
| 51 | PaLM 2 | 2023-05-10 | 1B~10B | Google | ai.google/static/docume | PaLM 2 Technical Report |
| 50 | RedPajama-INCITE | 2023-05-05 | 2.8B | Together | huggingface.co/together | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models |
| 49 | MPT | 2023-05-05 | 7B | MosaicML | github.com/mosaicml/llm | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs |
| 48 | StarCoder | 2023-05-05 | 7B | Hugging Face | github.com/bigcode-proj | StarCoder: May the Source Be With You! |
| 47 | OpenLLaMA | 2023-05-03 | 7B | Berkeley Artificial Intelligence Research | github.com/openlm-resea | OpenLLaMA: An Open Reproduction of LLaMA |
| 46 | StableLM | 2023-04-20 | 3B & 7B | Stability AI | stability.ai/blog/stabi | Stability AI Launches the First of its StableLM Suite of Language Models |
| 44 | Koala | 2023-04-03 | 13B | Berkeley Artificial Intelligence Research | github.com/young-geng/E | Koala: A Dialogue Model for Academic Research |
| 43 | Vicuna-13B | 2023-03-31 | 13B | LMSYS | github.com/lm-sys/FastC | Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality |
| 42 | BloombergGPT | 2023-03-30 | 50B | Bloomberg | bloomberg.com/company/p | BloombergGPT: A Large Language Model for Finance |
| 41 | GPT4All | 2023-03-29 | 7B | Nomic AI | github.com/nomic-ai/gpt | GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo |
| 40 | Dolly | 2023-03-24 | 6B | Databricks | huggingface.co/databric | Hello Dolly: Democratizing the magic of ChatGPT with open models |
| 39 | ChatGLM-6B | 2023-03-14 | 6.2B | Tsinghua University | github.com/THUDM/ChatGL | ChatGLM-6B: An Open Bilingual Dialogue Language Model |
| 38 | GPT-4 | 2023-03-14 | unknown | OpenAI | cdn.openai.com/papers/g | GPT-4 Technical Report |
| 37 | Stanford Alpaca | 2023-03-13 | 7B | Stanford | github.com/tatsu-lab/st | Alpaca: A Strong, Replicable Instruction-Following Model |
| 36 | LLaMA | 2023-02-24 | 7B~65B | Meta | github.com/facebookrese | LLaMA: Open and Efficient Foundation Language Models |
| 35 | GPT-3.5 | 2022-11-30 | 175B | OpenAI | platform.openai.com/doc | GPT-3.5 Model |
| 34 | BLOOM | 2022-11-09 | 176B | BigScience | huggingface.co/bigscien | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| 33 | BLOOMZ | 2022-11-03 | 176B | BigScience | github.com/bigscience-w | Crosslingual Generalization through Multitask Finetuning |
| 32 | mT0 | 2022-11-03 | 13B | BigScience | github.com/bigscience-w | Crosslingual Generalization through Multitask Finetuning |
| 31 | Flan-U-PaLM | 2022-10-20 | 540B | Google | github.com/google-resea | Scaling Instruction-Finetuned Language Models |
| 30 | Flan-T5 | 2022-10-20 | 11B | Google | github.com/google-resea | Scaling Instruction-Finetuned Language Models |
| 29 | WeLM | 2022-09-21 | 10B | WeChat (Tencent) | welm.weixin.qq.com/docs | WeLM: A Well-Read Pre-trained Language Model for Chinese |
| 28 | PLUG | 2022-09-01 | 27B | Alibaba DAMO Academy | github.com/alibaba/Alic | PLUG: Pre-training for Language Understanding and Generation |
| 27 | OPT | 2022-05-02 | 175B | Meta | github.com/facebookrese | OPT: Open Pre-trained Transformer Language Models |
| 26 | PaLM | 2022-04-05 | 540B | Google | github.com/lucidrains/P | PaLM: Scaling Language Modeling with Pathways |
| 25 | Chinchilla | 2022-03-29 | 70B | Google DeepMind | deepmind.com/blog/an-em | Training Compute-Optimal Large Language Models |
| 24 | CodeGen | 2022-03-25 | 16B | Salesforce | github.com/salesforce/c | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| 23 | GLM-130B | 2022-03-17 | 130B | Tsinghua University | github.com/THUDM/GLM-13 | GLM: General Language Model Pretraining with Autoregressive Blank Infilling |
| 22 | InstructGPT | 2022-03-04 | 175B | OpenAI | github.com/openai/follo | Training Language Models to Follow Instructions with Human Feedback |
| 21 | AlphaCode | 2022-02-08 | 41B | Google DeepMind | deepmind.com/blog/compe | Competition-Level Code Generation with AlphaCode |
| 20 | MT-NLG | 2022-01-28 | 530B | Microsoft | github.com/microsoft/De | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| 19 | LaMDA | 2022-01-20 | 137B | Google | github.com/conceptofmin | LaMDA: Language Models for Dialog Applications |
| 18 | WebGPT | 2021-12-17 | 175B | OpenAI | openai.com/research/web | WebGPT: Browser-assisted question-answering with human feedback |
| 17 | GLaM | 2021-12-13 | 1.2T | Google | ai.googleblog.com/2021/ | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| 16 | Gopher | 2021-12-08 | 280B | Google DeepMind | deepmind.com/blog/langu | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| 15 | T0 | 2021-10-15 | 11B | Hugging Face | github.com/bigscience-w | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| 14 | FLAN | 2021-09-03 | 137B | Google | github.com/google-resea | Finetuned Language Models Are Zero-Shot Learners |
| 13 | Codex | 2021-07-07 | 12B | OpenAI | github.com/openai/human | Evaluating Large Language Models Trained on Code |
| 12 | ERNIE 3.0 | 2021-07-05 | 10B | Baidu | github.com/PaddlePaddle | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| 11 | PanGu-Alpha | 2021-04-26 | 200B | Huawei | openi.pcl.ac.cn/PCL-Pla | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| 10 | Switch Transformer | 2021-01-11 | 1.6T | Google | huggingface.co/google/s | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
| 9 | mT5 | 2020-10-22 | 13B | Google | huggingface.co/google/m | mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer |
| 8 | GShard | 2020-06-30 | 600B | Google | arxiv.org/pdf/2006.1666 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| 7 | GPT-3 | 2020-05-28 | 175B | OpenAI | github.com/openai/gpt-3 | Language Models are Few-Shot Learners |
| 6 | Turing-NLG | 2020-02-13 | 17B | Microsoft | microsoft.com/en-us/res | Turing-NLG: A 17-billion-parameter language model by Microsoft |
| 5 | T5 | 2019-10-23 | 11B | Google | github.com/google-resea | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| 4 | XLNet | 2019-06-19 | 340M | Google Brain | github.com/zihangdai/xl | XLNet: Generalized Autoregressive Pretraining for Language Understanding |
| 3 | Baidu-ERNIE | 2019-04-19 | 340M | Baidu | github.com/PaddlePaddle | ERNIE: Enhanced Representation through Knowledge Integration |
| 2 | GPT-2 | 2019-02-14 | 1.5B | OpenAI | github.com/openai/gpt-2 | Language Models are Unsupervised Multitask Learners |
| 1 | BERT | 2018-10-11 | 340M | Google | github.com/google-resea | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 0 | GPT-1 | 2018-06-11 | 117M | OpenAI | github.com/openai/finet | Improving Language Understanding by Generative Pre-Training |

Representative milestone works among these:

Neural machine translation combining alignment and translation

Paper: Neural Machine Translation by Jointly Learning to Align and Translate (2014)

Commentary: Paper notes on "Neural Machine Translation by Jointly Learning to Align and Translate"

This paper introduced an attention mechanism to improve the long-sequence modeling ability of recurrent neural networks (RNNs). It allowed RNNs to translate longer sentences more accurately, which was also the motivation behind the later development of the original Transformer model.
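
To make the mechanism concrete, below is a minimal NumPy sketch of an additive (Bahdanau-style) attention step: the current decoder state is scored against every encoder state, the scores are softmax-normalized into weights, and the weighted sum of encoder states becomes the context vector. The function name, shapes, and random parameters are illustrative assumptions rather than code from the paper.

```python
import numpy as np

def additive_attention(decoder_state, encoder_states, W1, W2, v):
    """Additive (Bahdanau-style) attention sketch.
    decoder_state: (d,)    current decoder hidden state s
    encoder_states: (T, d) encoder hidden states h_1..h_T
    """
    # score e_t = v^T tanh(W1 s + W2 h_t) for every source position t
    scores = np.tanh(decoder_state @ W1.T + encoder_states @ W2.T) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over source positions
    context = weights @ encoder_states   # (d,) weighted sum of encoder states
    return context, weights

# toy usage with random parameters
rng = np.random.default_rng(0)
d, T = 8, 5
context, attn = additive_attention(
    rng.normal(size=d), rng.normal(size=(T, d)),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d))
```

The context vector is then fed to the decoder together with the previous output, which is what lets the RNN "look back" at any source position instead of squeezing the whole sentence into a single fixed vector.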

The Transformer attention mechanism

Paper: Attention Is All You Need (2017)

Commentary: A detailed explanation of the Transformer (Attention Is All You Need)

This paper introduced the architecture of the original Transformer model. The model consists of an encoder and a decoder, two parts that were later separated into independent modules in subsequent models. The paper also introduced scaled dot-product attention, multi-head attention, and positional input encoding, concepts that remain the foundation of modern Transformer-family models.
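
For reference, here is a minimal NumPy sketch of scaled dot-product attention and the multi-head wrapper around it. Masking, dropout, and learned parameter initialization are omitted, and the weight matrices are passed in as plain arrays, so this is a shape-level illustration of the formulas rather than a faithful implementation of the paper's full layer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)         # (..., T_q, T_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # (..., T_q, d_v)

def multi_head_attention(X, Wq, Wk, Wv, Wo, h):
    """Split d_model into h heads, attend in parallel, concatenate, project."""
    B, T, d_model = X.shape
    d_head = d_model // h
    def split(M):                        # (B, T, d_model) -> (B, h, T, d_head)
        return M.reshape(B, T, h, d_head).transpose(0, 2, 1, 3)
    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    heads = scaled_dot_product_attention(Q, K, V)          # (B, h, T, d_head)
    concat = heads.transpose(0, 2, 1, 3).reshape(B, T, d_model)
    return concat @ Wo
```

Since the attention operation itself is order-agnostic, the positional encoding is added to the token embeddings before the first attention layer.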

BERT: Pre-training deep bidirectional Transformers for language understanding

Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)

Commentary: A complete guide to understanding the BERT model

After the original Transformer model, large language model research split into two directions: encoder-based Transformer models for predictive modeling tasks such as text classification, and decoder-based Transformer models for generative modeling tasks such as translation, summarization, and other forms of text generation.
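
To illustrate the split in practice, here is a small sketch using the Hugging Face transformers library (an assumption of this note: transformers and PyTorch are installed and the bert-base-uncased / gpt2 checkpoints can be downloaded). The encoder-side classification head is freshly initialized here, so the point is the interface, not the predictions.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          AutoModelForCausalLM)

# Encoder branch (BERT-style): predictive tasks such as text classification.
enc_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                             num_labels=2)
inputs = enc_tok("A sentence to classify.", return_tensors="pt")
logits = encoder(**inputs).logits        # one score per class

# Decoder branch (GPT-style): generative tasks such as continuation or summarization.
dec_tok = AutoTokenizer.from_pretrained("gpt2")
decoder = AutoModelForCausalLM.from_pretrained("gpt2")
prompt = dec_tok("The history of language models", return_tensors="pt")
out = decoder.generate(**prompt, max_new_tokens=20)
print(dec_tok.decode(out[0], skip_special_tokens=True))
```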

GPT-1: Improving language understanding through generative pre-training

Paper: Improving Language Understanding by Generative Pre-Training (2018)

Commentary: A GPT-1 paper walkthrough of "Improving Language Understanding by Generative Pre-Training" (2018)

Adding more Transformer layers during pre-training significantly improves performance; the full model achieved better results on 9 of the 12 datasets, suggesting the architecture is well designed and worth further study; and the auxiliary objective improves the model's generalization more as the amount of training data grows.
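
For reference, the auxiliary objective mentioned above is the fine-tuning loss used in the GPT-1 paper: the supervised task loss $\mathcal{L}_2$ on the labelled dataset $\mathcal{C}$ is combined with the language-modeling loss $\mathcal{L}_1$ as an auxiliary term with weight $\lambda$:

$$\mathcal{L}_3(\mathcal{C}) = \mathcal{L}_2(\mathcal{C}) + \lambda \cdot \mathcal{L}_1(\mathcal{C})$$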

GPT-2

Paper: Language Models are Unsupervised Multitask Learners (2019)

GPT-2 still uses the Transformer decoder, but compared with GPT-1 both the training data and the model parameters are roughly 10x larger, and the focus is on zero-shot tasks.
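
"Zero-shot" here means the task is described only in the prompt, with no worked examples and no gradient updates. Below is a hedged sketch with the Hugging Face text-generation pipeline (an assumption of this note; the small public gpt2 checkpoint will not actually translate reliably, so the snippet only shows the prompting pattern, and the prompt wording is illustrative).

```python
from transformers import pipeline

# Zero-shot: the task ("translate") is stated in the prompt; the model sees
# no worked examples and its weights are never updated.
generator = pipeline("text-generation", model="gpt2")
prompt = "Translate English to French:\ncheese =>"
print(generator(prompt, max_new_tokens=10)[0]["generated_text"])
```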

GPT-3

Paper: Language Models are Few-Shot Learners (2020)

Commentary: GPT-3 reading notes: Language Models are Few-Shot Learners

GPT-3 no longer pursues extreme zero-shot learning, i.e. learning without being given any examples, but instead learns from a small number of examples; after all, humans do not learn without ever seeing examples either, but can generalize effectively from just a few of them.
Because of GPT-3's enormous size, fine-tuning it for downstream tasks would be very costly, so when GPT-3 is applied to downstream tasks it performs no gradient updates or fine-tuning at all.
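
What "a few examples, no gradient updates" looks like in practice is in-context learning: the labelled examples live only in the prompt. The helper below is an illustrative sketch; the prompt format and function name are assumptions, not the exact format used in the paper.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a GPT-3-style few-shot classification prompt: a handful of
    labelled examples followed by the unlabelled query. No fine-tuning is
    involved -- the 'learning' happens entirely inside the prompt."""
    lines = ["Decide whether the review is Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text} Sentiment: {label}")
    lines.append(f"Review: {query} Sentiment:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("The food was wonderful.", "Positive"),
     ("Terrible service, never again.", "Negative")],
    "I really enjoyed the film.")
# The prompt is sent to the model as-is; the first tokens generated after the
# final "Sentiment:" are read off as the prediction.
print(prompt)
```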

GPT-4: Generative pre-trained Transformer model

Paper: GPT-4 Technical Report (2023)

Commentary: A hardcore breakdown of the GPT-4 large model (read it and you're halfway to an expert)

Commentary: Reading notes on the GPT series of papers

The data compiled here comes from publicly available online sources. If anything is incorrect, corrections are welcome. Thank you.

References:

1. 10 must-read papers about ChatGPT

2. Understanding large language models: a concise list of 10 papers

3. GPT-4 paper deep-dive [Paper Reading No. 53]

4. The road to AGI: technical essentials of large language models (LLMs)

5. A long read: a brief history of the development of LLMs
