
LLM Theory Through the Ages (continuously updated...)


诸神缄默不语 - Personal CSDN Blog Post Index

To be honest, it's hard to really keep up with everything; anyway,
this serves as a collection of notes.

The years in this post do not strictly correspond to the content, especially for work that spans years.

Papers on numerical reasoning and sequence labeling / information extraction are not covered in this post.
Papers already covered in my post on prompting techniques (Prompt Engineering (continuously updated...)) are not repeated here either.

2024

  1. Evaluation
    1. (AAAI) Avoiding Data Contamination in Language Model Evaluation: Dynamic Test Construction with Latest Materials: the LatestEval method, which builds evaluations from the most recent texts to avoid data contamination
  2. Long context
    1. Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
  3. (IIT) Large Language Models aren’t all that you need: argues that stacking tricks onto traditional NER methods can actually outperform LLMs
  4. (Google) LLM Augmented LLMs: Expanding Capabilities through Composition: LLM helps LLM; composes LLMs via cross-attention (see the sketch below)
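
A minimal sketch of the cross-attention composition idea in item 4 above (LLM Augmented LLMs): a small trainable module lets a frozen "anchor" LLM attend to hidden states from a frozen "augmenting" LLM. The module name, dimensions, and the single fusion point are my own illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossModelFusion(nn.Module):
    """Illustrative cross-attention bridge between two frozen LLMs
    (dimensions and placement are assumptions, not the paper's setup)."""
    def __init__(self, anchor_dim=1024, aug_dim=768, n_heads=8):
        super().__init__()
        # project the augmenting model's states into the anchor model's width
        self.proj = nn.Linear(aug_dim, anchor_dim)
        self.cross_attn = nn.MultiheadAttention(anchor_dim, n_heads, batch_first=True)

    def forward(self, anchor_hidden, aug_hidden):
        # anchor_hidden: (batch, seq_a, anchor_dim) from some anchor layer
        # aug_hidden:    (batch, seq_b, aug_dim) from some augmenting layer
        kv = self.proj(aug_hidden)
        fused, _ = self.cross_attn(query=anchor_hidden, key=kv, value=kv)
        return anchor_hidden + fused  # residual back into the anchor model

fusion = CrossModelFusion()
print(fusion(torch.randn(2, 16, 1024), torch.randn(2, 32, 768)).shape)  # (2, 16, 1024)
```

Only the bridge parameters would be trained; both underlying models stay frozen, which is the appeal of composition over full fine-tuning.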

2023

  1. Surveys
    1. (Gaoling School of Artificial Intelligence, Renmin University of China) A Survey of Large Language Models and its Chinese version《大语言模型综述》
      A Survey of Large Language Models
      https://github.com/RUCAIBox/LLMSurvey/blob/main/assets/LLM_Survey_Chinese_0418.pdf
    2. Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
    3. On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
    4. (Huawei) Aligning Large Language Models with Human: A Survey
    5. A Survey on Evaluation of Large Language Models
    6. A Survey on Multimodal Large Language Models
    7. A Comprehensive Overview of Large Language Models
    8. Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents
    9. The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
    10. Data Management For Large Language Models: A Survey
    11. (ACM Computing Surveys) Survey of Hallucination in Natural Language Generation
  2. Collections
    1. Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4
  3. Tutorials
    1. (ACL) Retrieval-based Language Models and Applications
  4. The Curse of Recursion: Training on Generated Data Makes Models Forget: (the first version had a more sensational title) the gist is that training LLMs on LLM-generated data makes them progressively worse
  5. Intelligence Primer
  6. Long context
    1. Blockwise Parallel Transformer (BPT): Blockwise Parallel Transformer for Long Context Large Models
    2. The Impact of Positional Encoding on Length Generalization in Transformers: compares the length generalization of different positional encodings
    3. (Microsoft) LongNet: Scaling Transformers to 1,000,000,000 Tokens: linear computational complexity; dilated attention (attention allocation decays exponentially with distance)
    4. LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
    5. Survey: Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
    6. (2023 TACL) Lost in the Middle: How Language Models Use Long Contexts
  7. (2023 ACL) Sen2Pro: A Probabilistic Perspective to Sentence Embedding from Pre-trained Language Model: represents a sentence as a probability density distribution in vector space; requires no retraining and can be plugged directly into an LLM
  8. Why are today's LLMs all decoder-only architectures? - Zhihu
  9. Why is attention in the Transformer scaled? - Zhihu (see the numerical sketch after this year's list)
  10. Model distillation
    1. Knowledge Distillation of Large Language Models
  11. Model quantization
    1. OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  12. The ChatGPT family
    1. Hacker George Hotz claims GPT-4 is a mixture of 8 MoE models - is it true? - Zhihu: according to OpenAI, it's just brute-force scale working miracles!
    2. Let’s Verify Step by Step
  13. ICL
    1. A Survey on In-context Learning
    2. Pre-Training to Learn in Context
    3. (ACL Findings) Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers
      1. Explainer post: Tsinghua, PKU and Microsoft dig into GPT and figure out in-context learning - it's basically the same as fine-tuning, just without updating the parameters
    4. (EMNLP) Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
  14. Fine-tuning
    1. Symbol tuning improves in-context learning in language models
    2. Optimizers
      1. (Microsoft) DeepSpeed
      2. DeepSpeedExamples/applications/DeepSpeed-Chat at master · microsoft/DeepSpeedExamples
    3. MiniChain
    4. (ICLR) Self-Consistency Improves Chain of Thought Reasoning in Language Models
      Samples a set of diverse outputs from the language model (sampling multiple different reasoning paths) and returns the most consistent answer in the set → essentially an ensemble method that replaces CoT's greedy decoding (see the voting sketch after this year's list)
      ReadPaper AI paper explanation: https://readpaper.com/paper/666189129860624384
    5. RLHF
      1. (ICLR) Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
    6. CoT paper list: https://github.com/Timothyxxx/Chain-of-ThoughtsPapers
    7. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning: a prompt collection
    8. (ACL) Self-Instruct: Aligning Language Models with Self-Generated Instructions: bootstrapping to expand the instruction set
    9. Exploring Format Consistency for Instruction Tuning: unifying instruction formats across datasets
    10. Instruction Tuning for Large Language Models: A Survey
    11. Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying
    12. (EACL) DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
    13. Full Parameter Fine-tuning for Large Language Models with Limited Resources: the LOMO method
  15. Retrieval + generation
    1. (ICLR) Generate rather than Retrieve: Large Language Models are Strong Context Generators: argues that prior work follows a retrieve-then-read pattern, whereas this paper first generates context documents and then reads the generated documents to produce the final answer
      Clustering-based prompting: choose different prompts so the generated documents cover different perspectives, thereby improving recall
      ReadPaper AI paper explanation: https://readpaper.com/paper/4670624394054221825
    2. Retrieval meets Long Context Large Language Models
    3. Model-enhanced Vector Index
    4. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models: the idea is that retrieving irrelevant information can mislead the LLM, and the LLM may not know whether its own knowledge plus the knowledge base is sufficient to answer the question. The paper proposes CoN to improve RAG robustness: generate sequential reading notes for the retrieved documents, assess their relevance to the question, and integrate them to produce the answer
    5. Learning to Filter Context for Retrieval-Augmented Generation
    6. (Apple) Context Tuning for Retrieval Augmented Generation
    7. Large Language Models for Information Retrieval: A Survey
    8. RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models: a high-quality corpus containing automatic LLM+RAG responses, annotated with hallucination severity and more
    9. (ICLR) Quantifying Memorization Across Neural Language Models
  16. (ICLR) APE
    Large Language Models Are Human-Level Prompt Engineers: in short, have the LLM write its own prompts
  17. (Princeton) InstructEval: Systematic Evaluation of Instruction Selection Methods: evaluating prompts
    Related reading: Forget benchmarking large models - Princeton is now evaluating prompts, proposing a prompt evaluation framework
  18. TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks
    Related reading: GPT-4 not working well for you? Auburn University proposes a prompt taxonomy and an alternative route to prompt design guidelines
  19. Recommender systems
    1. Is ChatGPT a Good Recommender? A Preliminary Study
    2. News recommendation
      1. (SIGIR) Prompt4NR
        Prompt Learning for News Recommendation
  20. LLM+KG
    1. KGQA
      1. KAPING
        Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
  21. Reliability [1]
    1. Measuring Faithfulness in Chain-of-Thought Reasoning: measures the faithfulness of CoT reasoning
      Finds that models sometimes ignore the generated reasoning and just answer however they were going to anyway (the larger the model, the more this happens)
      The proposed remedy is to choose an appropriately sized model
    2. Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
      Offers a remedy: decompose the question into sub-questions and answer them separately
    3. (Meta) Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?: argues that LLMs cannot replace KGs, because they are unreliable and this unreliability is hard to optimize away (conventional ways of improving LLMs do little for the unreliability itself).
    4. Surveys on hallucination:
      1. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
      2. A Survey of Hallucination in Large Foundation Models
      3. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment
    5. Fine-tuning Language Models for Factuality: evaluates the factual accuracy of generated text and fine-tunes language models to make fewer factual errors
  22. Evaluation
    (Microsoft) A Survey on Evaluation of Large Language Models
    (Tianjin University) Evaluating Large Language Models: A Comprehensive Survey
    GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
    API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs
  23. Toolformer
    Toolformer: Language Models Can Teach Themselves to Use Tools: the LLM teaches itself to use external tools
    Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Dataset Augmented by ChatGPT
  24. Large Language Models as Optimizers: describe the optimization task in natural language (the prompt itself can also be optimized) and let the LLM optimize it on its own, iterating step by step
  25. Agents
    1. A Survey on Large Language Model based Autonomous Agents
    2. The Rise and Potential of Large Language Model Based Agents: A Survey
  26. Detecting Pretraining Data from Large Language Models: detects whether a given piece of text was used in an LLM's pretraining
  27. (Meta) The ART of LLM Refinement: Ask, Refine, and Trust
  28. Interpretability: (Anthropic) Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
  29. Model acceleration: Accelerating Generative AI with PyTorch II: GPT, Fast | PyTorch
  30. LM-Cocktail: Resilient Tuning of Language Models via Model Merging: the idea is that fine-tuning causes catastrophic forgetting, so the paper takes a weighted average of the base model's and the fine-tuned model's parameters and finds this works well (see the merging sketch after this year's list). Similar conclusions already existed in model-fusion work, though; Alibaba's SuperMario simply takes the plain average
  31. Contrastive Chain-of-Thought Prompting: so this is what it feels like to have an idea scooped (muttering)
  32. Ghostbuster: Detecting Text Ghostwritten by Large Language Models: detecting text ghostwritten by LLMs
  33. Scalable Extraction of Training Data from (Production) Language Models: coaxing ChatGPT into leaking its training data
  34. Prompt Engineering a Prompt Engineer
  35. (Microsoft) Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine: uses prompt engineering to solve tasks in a specialized domain
  36. SparQ Attention: Bandwidth-Efficient LLM Inference: inference acceleration
  37. Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
  38. Why “classic” Transformers are shallow and how to make them go deep
  39. MoDS: Model-oriented Data Selection for Instruction Tuning
  40. Are Emergent Abilities of Large Language Models a Mirage?
  41. Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
  42. Improving Text Embeddings with Large Language Models
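
A small numerical check for item 9 above (why attention scores are scaled): the dot product of two random d-dimensional vectors has standard deviation roughly √d, so without dividing by √d the softmax saturates and gradients vanish. The numbers below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (16, 256, 1024):
    q = rng.standard_normal((2000, d))
    k = rng.standard_normal((2000, d))
    scores = (q * k).sum(axis=1)                          # raw dot products, std ≈ sqrt(d)
    print(d, scores.std(), (scores / np.sqrt(d)).std())   # scaled std stays ≈ 1

# softmax over large-magnitude logits is nearly one-hot, so gradients vanish
logits = np.array([10.0, 0.0, -10.0])
print(np.exp(logits) / np.exp(logits).sum())              # ≈ [1.0, 4.5e-5, 2.1e-9]
```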
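
For item 14.4 (Self-Consistency), the procedure described above reduces to "sample several reasoning paths, then majority-vote the final answers". A minimal sketch; `sample_reasoning_path` and `extract_answer` are hypothetical stand-ins for a sampled LLM call and your answer parser.

```python
from collections import Counter

def self_consistent_answer(question, sample_reasoning_path, extract_answer, n_paths=10):
    """Sample n_paths chain-of-thought completions and return the most common answer."""
    answers = []
    for _ in range(n_paths):
        reasoning = sample_reasoning_path(question)  # temperature > 0 sampling
        answers.append(extract_answer(reasoning))    # e.g. parse the final number
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n_paths                     # answer plus agreement ratio
```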
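
For item 30 (LM-Cocktail), the merging step amounts to a weighted average of two checkpoints' parameters. A minimal sketch, assuming both models share the same architecture; the checkpoint paths and the weight 0.5 are placeholders.

```python
import torch

def merge_state_dicts(base_sd, ft_sd, alpha=0.5):
    """Element-wise weighted average of base and fine-tuned parameters."""
    return {name: alpha * base_sd[name] + (1 - alpha) * ft_sd[name] for name in base_sd}

# usage sketch (paths are placeholders):
# base_sd = torch.load("base_model.pt", map_location="cpu")
# ft_sd = torch.load("finetuned_model.pt", map_location="cpu")
# model.load_state_dict(merge_state_dicts(base_sd, ft_sd, alpha=0.5))
```

With alpha = 0.5 this reduces to the plain average mentioned for SuperMario; the weighted version trades off fine-tuned gains against forgetting of general ability.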

2022

  1. Prompting
    1. (ACM Computing Surveys) Re33: Paper Reading - Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
    2. (CHI LBW) PromptChainer: Chaining Large Language Model Prompts through Visual Programming: a visual programming tool for chaining multiple LLM calls (i.e., run one LLM, then feed its output into the next LLM, etc.)
    3. LangChain
    4. LlamaIndex
  2. Pretraining / fine-tuning strategies
    1. (ICLR) LoRA: Low-Rank Adaptation of Large Language Models
    2. RLHF / instruction following / instruction alignment
      (NeurIPS) Training language models to follow instructions with human feedback
      Official blog post: Aligning language models to follow instructions
    3. Scaling Instruction-Finetuned Language Models
    4. instruction-based finetuning / instruction tuning / FLAN: describe a collection of datasets with instructions (think of them as short task descriptions) and fine-tune the LM on them; the LM's performance then also improves on other datasets described with instructions, i.e., better zero-shot performance (the experiments use LaMDA). See the data-format sketch after this year's list
      Paper: (ICLR, Google) Finetuned Language Models Are Zero-Shot Learners
  3. Retrieval
    1. (DeepMind) RETRO
      Improving language models by retrieving from trillions of tokens: retrieves document chunks to condition an autoregressive language model.
      The token database is embedded at the chunk level with BERT
      chunked cross-attention mechanism
  4. Emergent abilities: (TMLR) Emergent Abilities of Large Language Models
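
For item 2.4 (FLAN-style instruction tuning), the key data step is rendering existing supervised datasets as natural-language instruction/response pairs before fine-tuning. A minimal formatting sketch; the templates and field names are my own, not FLAN's actual templates.

```python
# Hypothetical templates; FLAN uses many human-written templates per dataset.
TEMPLATES = {
    "nli": ("Premise: {premise}\nHypothesis: {hypothesis}\n"
            "Does the premise entail the hypothesis? Answer yes, no, or maybe."),
    "sentiment": "Review: {text}\nIs the sentiment of this review positive or negative?",
}

def to_instruction_example(task, fields, label):
    """Render one supervised example as an (instruction, target) pair."""
    return {"input": TEMPLATES[task].format(**fields), "target": label}

print(to_instruction_example("sentiment", {"text": "Great movie!"}, "positive"))
# The rendered pairs are used for ordinary supervised fine-tuning; zero-shot
# performance on other instruction-described tasks then improves.
```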

2021

  1. prefix-tuning
    1. (ACL) Prefix-Tuning: Optimizing Continuous Prompts for Generation
      1. Official talk video: Underline | Prefix-Tuning: Optimizing Continuous Prompts for Generation
  2. LoRA: Low-Rank Adaptation of Large Language Models
    1. GitHub: microsoft/LoRA: Code for loralib, an implementation of “LoRA: Low-Rank Adaptation of Large Language Models”
  3. Prompting
    1. thunlp/OpenPrompt: An Open-Source Framework for Prompt-Learning.: supports transformers models
    2. (ACL) Making Pre-trained Language Models Better Few-shot Learners
      Reference post: [Survey] Prompting: applying language models to NLP tasks more effectively
  4. (EACL) Language Models as Knowledge Bases: On Entity Representations, Storage Capacity, and Paraphrased Queries
    Focuses on how knowledge is represented in LMs and on their knowledge capacity (which grows roughly linearly with model size)
  5. (EMNLP Findings) Retrieval Augmentation Reduces Hallucination in Conversation: using retrieval to mitigate hallucination

2020

  1. (ACL) Re26: Paper Reading - Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks: roughly, after pretraining on a general domain, continuing pretraining on the target domain and task further improves the model
  2. Adapter
    1. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
    2. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters: addresses catastrophic forgetting when injecting knowledge, implemented with adapters
  3. RAG
    1. (NeurIPS) Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  4. Scaling Laws for Neural Language Models

2019

  1. Adapters: essentially, a new module is inserted into the LM and only that module is tuned, leaving the global parameters untouched.
    This makes fine-tuning much more efficient (see the adapter sketch after this year's list)
    1. Paper: (ICML) Parameter-Efficient Transfer Learning for NLP
    2. Intro post: What are Adapters in NLP? | Finisky Garden
    3. AdapterHub - 572 adapters for 76 text tasks and 50 languages
  2. LAnguage Model Analysis (LAMA)
    1. Paper: (2019 EMNLP) Language Models as Knowledge Bases?: probes relational knowledge in language models. Concretely, the query is posed as a cloze (fill-in-the-blank) statement and the model predicts the token at the masked position; LMs turn out to do fairly well (I suspect newer LLMs would do even better). See the fill-mask example after this year's list
  3. (NAACL) Linguistic Knowledge and Transferability of Contextual Representations: compares BERT / ELMo / GPT (called the OpenAI transformer in the paper), mainly investigating which linguistic features language models encode and the transferability of different layers and different pretraining tasks
    (This paper has an enormous number of experiments; even by today's standards one can only marvel at the money and time it must have taken!)
    Reference explainer: ELMo/GPT/BERT comparison - Zhihu
  4. (ACL, Shannon.AI) Is Word Segmentation Necessary for Deep Learning of Chinese Representations?: concludes that characters are almost always better than words. (Su Jianlin considers the experiments insufficient: both the character and word embeddings compared are randomly initialized embedding matrices, so the word-level model has a much larger embedding layer and overfits more easily. In practice people usually use pretrained word vectors, a setting this paper does not cover. Su's later WoBERT argues that word-level units actually work better.)
  5. (ACL) Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
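
For item 1 (Adapters), the module described above is typically a small bottleneck inserted after a Transformer sublayer and trained while the backbone stays frozen. A minimal sketch; the hidden and bottleneck sizes are arbitrary.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        # only these few parameters are trained; the pretrained weights stay frozen
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
print(adapter(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```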
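
For item 2 (LAMA), the cloze-style probing described above can be reproduced with any off-the-shelf masked language model; the model choice and the example query below are mine, not the paper's exact setup.

```python
from transformers import pipeline

# any masked LM works; bert-base-cased is just a convenient choice
unmasker = pipeline("fill-mask", model="bert-base-cased")
for pred in unmasker("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
# A fact counts as "known" if the correct entity ranks highly among the predictions.
```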

2018

  1. (EMNLP) Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures: evaluates whether self-attention is really what drives NMT improvements, comparing self-attentional, recurrent, and convolutional architectures on targeted tasks such as subject-verb agreement and word sense disambiguation

2016

  1. A Neural Knowledge Language Model: combines symbolic knowledge from a knowledge graph with an RNN language model. By predicting whether the word to be generated is grounded in a fact, the model can generate such knowledge-related words by copying from the description of the predicted fact
  2. Reference-Aware Language Models: language modeling that conditions on external reference information

References

  1. Collection-type resources I haven't read yet:
    1. thunlp/PromptPapers: Must-read papers on prompt-based tuning for pre-trained language models.
    2. KSESEU/LLMPapers: Papers & Works for large languange models (ChatGPT, GPT-3, Codex etc.).

  [1] Measuring and Improving the Faithfulness of Model-Generated Reasoning — LessWrong
