
LLMs / RAG: "Retrieval-Augmented Generation for Large Language Models: A Survey" — Translation and Interpretation

Overview: This paper surveys and analyzes Retrieval-Augmented Generation (RAG).

Background pain points:

>> LLMs struggle with knowledge-intensive tasks and questions whose answers lie outside their training data, producing incorrect or outdated information.

>> Adapting LLMs to different application scenarios often requires custom training, which is difficult for developers and researchers.

Core idea and solution of RAG: RAG retrieves information from external knowledge bases and integrates it into the LLM's input context, strengthening the model's ability to handle knowledge-intensive tasks and produce more accurate answers.

Development trends of RAG:

>> Evolution from Naive RAG to Advanced RAG to Modular RAG, continually refining the framework

>> Combination of retrieval, generation, and augmentation modules into a complete pipeline

>> Augmentation from diverse sources: structured data, unstructured data, and LLM-generated content

>> Exploration of iterative, recursive, and adaptive retrieval to optimize the retrieval process

>> Integration of RAG with custom training, combining multiple ways of optimizing LLMs

Advantages of RAG:

>> New external knowledge can be integrated without retraining the LLM, making it easier to respond to changing requirements.

>> Grounded in external knowledge bases, the LLM's answers become more accurate and relevant, better addressing knowledge-intensive questions.

>> RAG frameworks keep improving in performance and can extend to multimodal data such as images and speech.

In short, by effectively combining LLMs with external knowledge, RAG preserves the strengths of LLMs while compensating for their knowledge gaps, offering a practical path for deploying LLMs in production.

Table of Contents

"Retrieval-Augmented Generation for Large Language Models: A Survey" — Translation and Interpretation

Abstract

Background pain points motivating RAG: hallucination, outdated knowledge, and opaque, untraceable reasoning

Survey overview:

Three RAG paradigms: Naive RAG → Advanced RAG → Modular RAG

Three RAG foundations: retrieval, generation, and augmentation techniques

1 Introduction

LLMs are remarkably successful but clearly limited → RAG emerges as a promising solution

Domain-specific or highly specialized queries → prone to incorrect information or hallucination → unsuitable when queries exceed the training data or require up-to-date information → impractical to deploy in production

The birth of RAG → now a pivotal technology for improving chatbots

Four phases of RAG's evolution: 2017, Transformer (focus on optimizing pre-training) → RAG's dormant period → 2022, ChatGPT (focus on inference) → 2023, GPT-4 (focus on hybrid approaches)

This survey aims to outline the entire RAG process

Contributions: three paradigm evolutions, three core technologies, a RAG evaluation framework, future directions

Paper structure: definition and development, core components, evaluation system, comparison of optimization methods

Figure 1: Technology tree of RAG research development featuring representative works

2 Definition

Understanding the RAG workflow through an example

Innovative methods along three questions: what to retrieve → when to retrieve → how to use the retrieved data

RAG's working paradigm (synergy) and key advantages (no retraining + high practicality + low barrier to entry) → one of the most popular architectures, the foundation of conversational products

Three workflow steps: split the corpus into discrete chunks and build a vector index with an encoder model → retrieve chunks by similarity → integrate contextual information

3 RAG Framework

Figure 3: Comparison between the three paradigms of RAG

3.1 Naive RAG

(1) Indexing: normalize data to plain text → split into chunks → vectorize with an embedding model → create an index (stored as key-value pairs)

(2) Retrieval: receive the user query → encode it as a vector → compute similarity and take the top k → use as the contextual basis

Figure 2: A representative instance of the RAG process applied to question answering

(3) Generation: synthesize a coherent prompt → the LLM generates the response → supports multi-turn dialogue

(4) Drawbacks of Naive RAG

Retrieval quality (low precision + low recall), generation quality (hallucination/irrelevance + toxicity/bias), augmentation process (disjointed/incoherent + redundant/repetitive; needs effective integration)

Discerning the importance and relevance of multiple retrieved passages: importance, relevance, consistency

Over-reliance on augmented information: merely repeats retrieved content + adds no new value

3.2 Advanced RAG: targeted enhancements

(1) Pre-retrieval process: optimize data indexing to improve the quality of indexed content, via the following five strategies

Enhancing data granularity: e.g., removing irrelevant information, disambiguating entities and terms, confirming factual accuracy, maintaining context, and updating outdated documents

Optimizing index structures: e.g., adjusting chunk size, multiple index paths, graph data indexing

Adding metadata: for filtering, e.g., dates and purposes, chapters and subsections of references

Alignment optimization: e.g., introducing "hypothetical questions"

Hybrid retrieval

(2) Retrieval stage: compute the similarity between the query and chunks

Fine-tuning embeddings: handles evolving or rare terms, e.g., BGE-large-EN developed by BAAI

Dynamic embeddings: the same word gets different embeddings depending on surrounding words, e.g., BERT, OpenAI's text-embedding-ada-002, GPT-4

(3) Post-retrieval process: the LLM's context window is limited, so retrieved documents need additional processing

Re-ranking: e.g., the LlamaIndex, LangChain, and Haystack frameworks; DiversityRanker (by document diversity), LostInTheMiddleRanker (alternating placement), Cohere rerank / bge-rerank / LongLLMLingua (recompute semantic similarity)

Prompt compression: compress irrelevant content / highlight key information / reduce overall length, e.g., Selective Context and LLMLingua (use small language models to estimate prompt mutual information or perplexity), RECOMP (trains compressors at different granularities), Long Context, etc.

3.3 Modular RAG: greater versatility and flexibility

(1) New modules: a variety of new modules

Search module: customized for specific scenarios

Memory module: uses the LLM's memory to guide retrieval, e.g., Selfmem (uses its own outputs to improve itself)

Fusion: parallel vector search over the original query (explicit intent) and expanded queries (implicit intent), e.g., RAG-Fusion (multi-query approach)

Routing: diverse data sources → select the appropriate data store → routing decisions are predefined → executed via LLM calls

Predict: uses the LLM to generate the necessary context

Task adapter: adapts RAG to various downstream tasks, e.g., UPRISE (retrieves prompts for zero-shot task inputs), PROMPTAGATOR (uses the LLM as a few-shot query generator)

(2) New patterns: two organizational paradigms

Both Naive RAG and Advanced RAG can be viewed as compositions of fixed modules

1) Adding or replacing modules: keep the core structure while integrating additional modules, e.g., RRR (introduces the Rewrite-Retrieve-Read process), Generate-Read (the LLM's generation module replaces the retriever), Recite-Read (turns external retrieval into retrieval from model weights)

2) Adjusting the flow between modules: strengthens the interaction between language model and retrieval model, e.g., DSP (treats the in-context learning system as an explicit program), ITER-RETGEN (uses one module's output to improve another)

(3) Optimizing the RAG pipeline: improve retrieval efficiency and quality, five strategies

Hybrid search exploration: commonly keyword-based search, semantic search, and vector search

Recursive retrieval and query engines: a two-step retrieval method, small chunks first (capture semantics), then larger chunks (more context)

StepBack prompting: e.g., backward prompts

Sub-queries: e.g., LlamaIndex; tree queries, vector queries, sequential chunk queries

Hypothetical document embeddings: e.g., HyDE (generated answers may be closer in embedding space than the direct query); can perform poorly when the model is unfamiliar with the topic

4 Retrieval

4.1 Enhancing Semantic Representations

S1. Chunk optimization (determining the optimal chunk size)

Chunking strategy: there is no single "best" strategy for all cases, only the most appropriate one for a given context, e.g., sentence-transformer (suits single sentences), text-embedding-ada-002 (suits chunks of 256 or 512 tokens)

Sliding window technique (merges globally related information across multiple chunks + layered retrieval), small2big (small chunks first + larger chunks later)

Abstract embedding (based on document summaries), metadata filtering (uses document metadata to enhance filtering), graph indexing (suits multi-hop question scenarios)

S2. Fine-tuning embedding models (embedding chunks and queries into a semantic space): two main paradigms

Strong embedding models: e.g., AngIE, Voyage, BGE; but limited in specialized domains

T1. Domain knowledge fine-tuning: three challenges (dataset construction, model fine-tuning, evaluation), e.g., the LlamaIndex framework streamlines the embedding fine-tuning workflow

T2. Fine-tuning for downstream tasks, e.g., PROMPTAGATOR (LLM as a few-shot query generator, suits data-scarce settings), LLM-Embedder (LLM-generated reward signals + a dual-signal strategy)

4.2 Aligning Queries and Documents: two alignment techniques

Pain point: the user's original query may be imprecisely phrased and lack semantic information

T1. Query rewriting (aligns query and document semantics): e.g., Query2Doc / ITER-RETGEN (create pseudo-documents), HyDE (generates "hypothetical" documents), RRR (reverses the traditional retrieve-and-read order), STEP-BACKPROMPTING

T2. Embedding transformation: e.g., LlamaIndex (introduces an adapter module)

4.3 Aligning Retriever and LLM

Pain point: a higher retrieval hit rate in the RAG pipeline does not necessarily improve the final result; the goal is to align the retriever's output with the LLM's preferences

T1. Fine-tuning the retriever (improving synergy): e.g., AAR (introduces supervisory signals), REPLUG (uses the LM as a supervision signal), UPRISE (uses frozen LLMs to fine-tune a prompt retriever), Atlas (four methods of supervised fine-tuning for embedding models)

T2. Adapters (external assistance for alignment): e.g., PRCA (trains an adapter), RECOMP (generates summaries), PKG (instruction fine-tuning)

5 Generation

The RAG generator: its input includes not only the typical context but also relevant text fragments obtained by the retriever → aims at a deeper understanding of the question's context

5.1 Post-retrieval with Frozen LLM

Pain points: the LLM's context-length limit and sensitivity to redundant information

Two common post-retrieval operations

Information compression: motivated by large volumes of information and the LLM's context limits → reduces noise + works around context-length limits + improves generation, e.g., PRCA (trains an information extractor), RECOMP (contrastive learning), Filter-Reranker (reduces the number of documents)

Re-ranking: adding more context easily degrades model performance → re-rank → present the most relevant information in a more focused and accurate way

5.2 Fine-tuning LLM for RAG

Optimizing the generator: ensure the generated text is both natural and makes effective use of the retrieved documents

A1. General optimization process: e.g., Self-Mem, or Joint-Encoder and Dual-Encoder architectures

A2. Contrastive learning: pain point (training only on a single correct output sample) → contrastive learning mitigates exposure bias → less overfitting, better generalization, e.g., SURGE (graph-text contrastive learning), SANTA (three-part training scheme)

6 Augmentation in RAG

6.1 RAG in Augmentation Stages: techniques integrated across three stages

Pre-training stage: e.g., REALM (structured retrieval), RETRO (large-scale pre-training from scratch), COG (a new text-generation method), RETRO++, etc.

Advantages of augmented pre-training: better text generation quality, fewer parameters, suited to knowledge-intensive tasks, robust base models

Challenges of augmented pre-training: heavy resource requirements, lower update frequency

Fine-tuning stage: can align queries and documents, support stylistic and targeted adjustment, and align retriever and generator for better synergy

Stronger in combination: RAG and fine-tuning are both powerful tools for enhancing LLMs

Fine-tuning the retrieval model improves semantic representation quality

Fine-tuning the generation model yields targeted outputs

Jointly fine-tuning retriever and generator improves generalization: e.g., RA-DIT (dual instruction tuning)

Inference stage:

Naive RAG: retrieved content is injected at this stage to guide generation;

Advanced techniques introduce richer contextual information, e.g., interaction text or knowledge modules

For multi-step reasoning tasks, iterative retrieval or chained reasoning is adopted

Pros and cons: lightweight, cost-effective, no retraining needed, but requires careful data handling and optimization

Figure 4: Taxonomy of RAG’s core components

6.2 Augmentation Source

Augmenting with unstructured data: corpora such as text, with retrieval units ranging from tokens to phrases to paragraphs, e.g., FLARE, RETRO

Augmenting with structured data: e.g., RET-LLMs (knowledge graphs for high-quality context), SUGRE (multimodal contrastive learning), KnowledGPT (stores knowledge in a personalized base)

Augmenting with LLM-generated content: uses the LLM's internal knowledge to avoid the limits of external information, applies retrieval augmentation selectively, or replaces the retriever with LLM-generated context better matched to the pre-training objective

Figure 5: Technology tree of representative RAG research with different augmentation aspects

6.3 Augmentation Process

Iterative retrieval: repeatedly collects documents for richer knowledge support, but risks semantic collapse and accumulation of irrelevant information

Recursive retrieval: refines the query layer by layer for deeper information; common in complex multi-step tasks

Adaptive retrieval: the LLM actively decides when and what to retrieve, optimizing retrieval efficiency and relevance

6.4 RAG vs Fine-Tuning: the two can complement each other

RAG is better suited to specific queries, integrating new knowledge, and fast iteration on new use cases

Table 1: Comparison between RAG and Fine-Tuning

7 RAG Evaluation

7.1 Evaluation Targets

Retrieval quality: hit rate, MRR, etc.

Generation quality: generation quality for unlabeled content and answer accuracy for labeled content

Figure 6: RAG compared with other model optimization methods

7.2 Evaluation Aspects

Quality scores: context relevance, answer faithfulness, answer relevance

Required abilities: noise robustness, negative rejection, information integration, counterfactual robustness

7.3 Evaluation Benchmarks and Tools

Evaluation benchmarks and tools: benchmarks quantify what models can do; automated evaluation tools compute quality scores

Key benchmarks and tools: RGB, RECALL; RAGAS, ARES, TruLens

Table 2: Summary of metrics applicable for evaluation aspects of RAG

Table 3: Summary of evaluation frameworks

8 Future Prospects

8.1 Future Challenges of RAG

Technical difficulties still to be solved: context length (insufficient information / information dilution), robustness (noise / contradictory information), etc.

Hybrid approaches (RAG + FT): optimized mixtures of RAG and fine-tuning, an emerging strategy

Expanding the roles of LLMs

Scaling laws

Production-ready RAG: improving retrieval efficiency/recall, ensuring data security (e.g., preventing LLMs from inadvertently leaking document sources or metadata)

Multimodal RAG: modality expansion lets RAG handle broader content, integrating images (e.g., RA-CM3, BLIP-2), audio, video, code, etc.

8.2 Ecosystem of RAG: the ecosystem needs better evaluation and downstream applications

Downstream tasks: open-domain QA, fact verification, etc.

Evaluation frameworks: refined metrics, interpretability

Technology stacks: LangChain, LlamaIndex, etc.; RAG toolkits

Figure 7: Summary of RAG ecosystem

9 Conclusion

Three development stages; fine-tuning and augmentation broaden its application scope; multimodality; ecosystem growth


"Retrieval-Augmented Generation for Large Language Models: A Survey" — Translation and Interpretation

Link

Paper: https://arxiv.org/abs/2312.10997

Date

January 5, 2024

Authors

Yunfan Gao 1, Yun Xiong 2, Xinyu Gao 2, Kangxiang Jia 2, Jinliu Pan 2, Yuxi Bi 3, Yi Dai 1, Jiawei Sun 1, Qianyu Guo 4, Meng Wang 3 and Haofen Wang 1,3 ∗

Tongji University, Fudan University

Abstract

Background pain points motivating RAG: hallucination, outdated knowledge, and opaque, untraceable reasoning

Survey overview:

Three RAG paradigms: Naive RAG → Advanced RAG → Modular RAG

Three RAG foundations: retrieval, generation, and augmentation techniques

Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces the metrics and benchmarks for assessing RAG models, along with the most up-to-date evaluation framework. In conclusion, the paper delineates prospective avenues for research, including the identification of challenges, the expansion of multi-modalities, and the progression of the RAG infrastructure and its ecosystem.

1 Introduction

LLMs are remarkably successful but clearly limited → RAG emerges as a promising solution

Domain-specific or highly specialized queries → prone to incorrect information or hallucination → unsuitable when queries exceed the training data or require up-to-date information → impractical to deploy in production

Large language models (LLMs) such as the GPT series [Brown et al., 2020, OpenAI, 2023] and the LLama series [Touvron et al., 2023], along with other models like Gemini [Google, 2023], have achieved remarkable success in natural language processing, demonstrating superior performance on various benchmarks including SuperGLUE [Wang et al., 2019], MMLU [Hendrycks et al., 2020], and BIG-bench [Srivastava et al., 2022]. Despite these advancements, LLMs exhibit notable limitations, particularly in handling domain-specific or highly specialized queries [Kandpal et al., 2023]. A common issue is the generation of incorrect information, or "hallucinations" [Zhang et al., 2023b], especially when queries extend beyond the model's training data or necessitate up-to-date information. These shortcomings underscore the impracticality of deploying LLMs as black-box solutions in real-world production environments without additional safeguards. One promising approach to mitigate these limitations is Retrieval-Augmented Generation (RAG), which integrates external data retrieval into the generative process, thereby enhancing the model's ability to provide accurate and relevant responses.

The birth of RAG → now a pivotal technology for improving chatbots

RAG, introduced by Lewis et al. [Lewis et al., 2020] in mid-2020, stands as a paradigm within the realm of LLMs, enhancing generative tasks. Specifically, RAG involves an initial retrieval step where the LLMs query an external data source to obtain relevant information before proceeding to answer questions or generate text. This process not only informs the subsequent generation phase but also ensures that the responses are grounded in retrieved evidence, thereby significantly enhancing the accuracy and relevance of the output. The dynamic retrieval of information from knowledge bases during the inference phase allows RAG to address issues such as the generation of factually incorrect content, commonly referred to as "hallucinations." The integration of RAG into LLMs has seen rapid adoption and has become a pivotal technology in refining the capabilities of chatbots and rendering LLMs more viable for practical applications.

Four phases of RAG's evolution: 2017, Transformer (focus on optimizing pre-training) → RAG's dormant period → 2022, ChatGPT (focus on inference) → 2023, GPT-4 (focus on hybrid approaches)

The evolutionary trajectory of RAG unfolds across four distinctive phases, as illustrated in Figure 1. In its inception in 2017, aligned with the emergence of the Transformer architecture, the primary thrust was on assimilating additional knowledge through Pre-Training Models (PTM) to augment language models. This epoch witnessed RAG's foundational efforts predominantly directed at optimizing pre-training methodologies.

Following this initial phase, a period of relative dormancy ensued before the advent of ChatGPT, during which there was minimal advancement in related research for RAG. The subsequent arrival of ChatGPT marked a pivotal moment in the trajectory, propelling LLMs into the forefront. The community's focal point shifted towards harnessing the capabilities of LLMs to attain heightened controllability and address evolving requirements. Consequently, the lion's share of RAG endeavors concentrated on inference, with a minority dedicated to fine-tuning processes. As LLM capabilities continued to advance, especially with the introduction of GPT-4, the landscape of RAG technology underwent a significant transformation. The emphasis evolved into a hybrid approach, combining the strengths of RAG and fine-tuning, alongside a dedicated minority continuing the focus on optimizing pre-training methodologies.

This survey aims to outline the entire RAG process

Despite the rapid growth of RAG research, there has been a lack of systematic consolidation and abstraction in the field, which poses challenges in understanding the comprehensive landscape of RAG advancements. This survey aims to outline the entire RAG process and encompass the current and future directions of RAG research, by providing a thorough examination of retrieval augmentation in LLMs.

Therefore, this paper aims to comprehensively summarize and organize the technical principles, developmental history, content, and, in particular, the relevant methods and applications after the emergence of LLMs, as well as the evaluation methods and application scenarios of RAG. It seeks to provide a comprehensive overview and analysis of existing RAG technologies and offer conclusions and prospects for future development methods. This survey intends to furnish readers and practitioners with a thorough and systematic comprehension of large models and RAG, elucidate the progression and key technologies of retrieval augmentation, clarify the merits and limitations of various technologies along with their suitable contexts, and forecast potential future developments.

Contributions: three paradigm evolutions, three core technologies, a RAG evaluation framework, future directions

Our contributions are as follows:

>> We present a thorough and systematic review of the state-of-the-art RAG, delineating its evolution through paradigms including naive RAG, advanced RAG, and modular RAG. This review contextualizes the broader scope of RAG research within the landscape of LLMs.

>> We identify and discuss the central technologies integral to the RAG process, specifically focusing on the aspects of "Retrieval", "Generator" and "Augmentation", and delve into their synergies, elucidating how these components intricately collaborate to form a cohesive and effective RAG framework.

>> We construct a thorough evaluation framework for RAG, outlining the evaluation objectives and metrics. Our comparative analysis clarifies the strengths and weaknesses of RAG compared to fine-tuning from various perspectives. Additionally, we anticipate future directions for RAG, emphasizing potential enhancements to tackle current challenges, expansions into multi-modal settings, and the development of its ecosystem.

Paper structure: definition and development, core components, evaluation system, comparison of optimization methods

The paper unfolds as follows: Sections 2 and 3 define RAG and detail its developmental process. Sections 4 through 6 explore the core components (Retrieval, "Generation" and "Augmentation"), highlighting diverse embedded technologies. Section 7 focuses on RAG's evaluation system. Section 8 compares RAG with other LLM optimization methods and suggests potential directions for its evolution. The paper concludes in Section 9.

Figure 1: Technology tree of RAG research development featuring representative works

2 Definition

Understanding the RAG workflow through an example

The definition of RAG can be summarized from its workflow. Figure 2 depicts a typical RAG application workflow. In this scenario, a user inquires ChatGPT about a recent high-profile event (i.e., the abrupt dismissal and reinstatement of OpenAI's CEO) which generated considerable public discourse. ChatGPT, as the most renowned and widely utilized LLM, constrained by its pretraining data, lacks knowledge of recent events. RAG addresses this gap by retrieving up-to-date document excerpts from external knowledge bases. In this instance, it procures a selection of news articles pertinent to the inquiry. These articles, alongside the initial question, are then amalgamated into an enriched prompt that enables ChatGPT to synthesize an informed response. This example illustrates the RAG process, demonstrating its capability to enhance the model's responses with real-time information retrieval.

Innovative methods along three questions: what to retrieve → when to retrieve → how to use the retrieved data

Technologically, RAG has been enriched through various innovative approaches addressing pivotal questions such as "what to retrieve", "when to retrieve" and "how to use the retrieved information". For "what to retrieve", research has progressed from simple token [Khandelwal et al., 2019] and entity retrieval [Nishikawa et al., 2022] to more complex structures like chunks [Ram et al., 2023] and knowledge graphs [Kang et al., 2023], with studies focusing on the granularity of retrieval and the level of data structuring. Coarse granularity brings more information but with lower precision. Retrieving structured text provides more information while sacrificing efficiency. The question of "when to retrieve" has led to strategies ranging from single [Wang et al., 2023e, Shi et al., 2023] to adaptive [Jiang et al., 2023b, Huang et al., 2023] and multiple retrieval [Izacard et al., 2022] methods. High frequency of retrieval brings more information and lower efficiency. As for "how to use" the retrieved data, integration techniques have been developed across various levels of the model architecture, including the input [Khattab et al., 2022], intermediate [Borgeaud et al., 2022], and output layers [Liang et al., 2023]. Although the "intermediate" and "output layers" are more effective, there are problems with the need for training and low efficiency.

RAG's working paradigm (synergy) and key advantages (no retraining + high practicality + low barrier to entry) → one of the most popular architectures, the foundation of conversational products

RAG is a paradigm that enhances LLMs by integrating external knowledge bases. It employs a synergistic approach, combining information retrieval mechanisms and In-Context Learning (ICL) to bolster the LLM's performance. In this framework, a query initiated by a user prompts the retrieval of pertinent information via search algorithms. This information is then woven into the LLM's prompts, providing additional context for the generation process. RAG's key advantage lies in its obviation of the need for retraining of LLMs for task-specific applications. Developers can instead append an external knowledge repository, enriching the input and thereby refining the model's output precision. RAG has become one of the most popular architectures in LLMs' systems, due to its high practicality and low barrier to entry, with many conversational products being built almost entirely on RAG.

Three workflow steps: split the corpus into discrete chunks and build a vector index with an encoder model → retrieve chunks by similarity → integrate contextual information

The RAG workflow comprises three key steps. First, the corpus is partitioned into discrete chunks, upon which vector indices are constructed utilizing an encoder model. Second, RAG identifies and retrieves chunks based on their vector similarity to the query and indexed chunks. Finally, the model synthesizes a response conditioned on the contextual information gleaned from the retrieved chunks. These steps form the fundamental framework of the RAG process, underpinning its information retrieval and context-aware generation capabilities. Next, we will provide an introduction to the RAG research framework.

3 RAG Framework

The RAG research paradigm is continuously evolving, and this section primarily delineates its progression. We categorize it into three types: Naive RAG, Advanced RAG, and Modular RAG. While RAG is cost-effective and surpasses the performance of the native LLM, it also exhibits several limitations. The development of Advanced RAG and Modular RAG was a response to these specific shortcomings in Naive RAG.

Figure 3: Comparison between the three paradigms of RAG

3.1 Naive RAG

The Naive RAG research paradigm represents the earliest methodology, which gained prominence shortly after the widespread adoption of ChatGPT. The Naive RAG follows a traditional process that includes indexing, retrieval, and generation. It is also characterized as a "Retrieve-Read" framework [Ma et al., 2023a].

(1) Indexing: normalize data to plain text → split into chunks → vectorize with an embedding model → create an index (stored as key-value pairs)

Indexing

The indexing process is a crucial initial step in data preparation that occurs offline and involves several stages. It begins with data indexing, where original data is cleansed and extracted, and various file formats such as PDF, HTML, Word, and Markdown are converted into standardized plain text. In order to fit within the context limitations of language models, this text is then segmented into smaller, more manageable chunks in a process known as chunking. These chunks are subsequently transformed into vector representations through an embedding model, chosen for its balance between inference efficiency and model size. This facilitates similarity comparisons during the retrieval phase. Finally, an index is created to store these text chunks and their vector embeddings as key-value pairs, which allows for efficient and scalable search capabilities.
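To make this offline stage concrete, here is a minimal sketch in Python. It is illustrative only: `embed` is a stand-in for a real embedding model (a hash-based pseudo-vector so the snippet runs anywhere), and the "index" is simply a list of (vector, chunk) key-value pairs.

```python
import hashlib
import math

def embed(text: str) -> list[float]:
    # Placeholder for a real embedding model: derives a normalized
    # 32-dim pseudo-vector from a hash so the sketch is runnable.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [(b / 255.0) - 0.5 for b in digest]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk_text(text: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunking; production systems tune size and
    # respect sentence/section boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents: list[str]) -> list[tuple[list[float], str]]:
    # The index stores (embedding, chunk) pairs, i.e. key-value storage
    # in its simplest form, searchable by vector similarity.
    return [(embed(c), c) for doc in documents for c in chunk_text(doc)]
```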

(2) Retrieval: receive the user query → encode it as a vector → compute similarity and take the top k → use as the contextual basis

Retrieval

Upon receipt of a user query, the system employs the same encoding model utilized during the indexing phase to transcode the input into a vector representation. It then proceeds to compute the similarity scores between the query vector and the vectorized chunks within the indexed corpus. The system prioritizes and retrieves the top K chunks that demonstrate the greatest similarity to the query. These chunks are subsequently used as the expanded contextual basis for addressing the user's request.
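Continuing the sketch above, retrieval encodes the query with the same `embed` function and ranks chunks by cosine similarity (a plain dot product here, since the placeholder vectors are normalized):

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, index: list[tuple[list[float], str]], k: int = 3) -> list[str]:
    # Encode the query with the SAME model used at indexing time, then
    # keep the top-k most similar chunks as the expanded context.
    q_vec = embed(query)
    ranked = sorted(index, key=lambda pair: dot(q_vec, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```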

Figure 2: A representative instance of the RAG process applied to question answering

(3) Generation: synthesize a coherent prompt → the LLM generates the response → supports multi-turn dialogue

Generation

The posed query and selected documents are synthesized into a coherent prompt to which a large language model is tasked with formulating a response. The model's approach to answering may vary depending on task-specific criteria, allowing it to either draw upon its inherent parametric knowledge or restrict its responses to the information contained within the provided documents. In cases of ongoing dialogues, any existing conversational history can be integrated into the prompt, enabling the model to engage in multi-turn dialogue interactions effectively.
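A minimal sketch of this last step, assuming `llm` is any text-completion callable (a placeholder, not a specific API):

```python
def build_prompt(query: str, contexts: list[str], history: list[str] | None = None) -> str:
    # Fold the retrieved chunks (and any prior dialogue turns) into one
    # coherent prompt that constrains the model to the provided context.
    parts = ["Answer the question using only the context below.", "", "Context:"]
    parts += [f"- {c}" for c in contexts]
    if history:
        parts += ["", "Conversation so far:"] + history
    parts += ["", f"Question: {query}"]
    return "\n".join(parts)

# answer = llm(build_prompt(query, retrieve(query, index)))
```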

(4) Drawbacks of Naive RAG

Retrieval quality (low precision + low recall), generation quality (hallucination/irrelevance + toxicity/bias), augmentation process (disjointed/incoherent + redundant/repetitive; needs effective integration)

Drawbacks in Naive RAG

Naive RAG faces significant challenges in three key areas: “Retrieval,” “Generation,” and “Augmentation”.


Retrieval quality poses diverse challenges, including low precision, leading to misaligned retrieved chunks and potential issues like hallucination or mid-air drop. Low recall also occurs, resulting in the failure to retrieve all relevant chunks, thereby hindering the LLMs' ability to craft comprehensive responses. Outdated information further compounds the problem, potentially yielding inaccurate retrieval results.

Response generation quality presents the hallucination challenge, where the model generates answers not grounded in the provided context, as well as issues of irrelevant context and potential toxicity or bias in the model's output.

The augmentation process presents its own challenges in effectively integrating context from retrieved passages with the current generation task, potentially leading to disjointed or incoherent output. Redundancy and repetition are also concerns, especially when multiple retrieved passages contain similar information, resulting in repetitive content in the generated response.

Discerning the importance and relevance of multiple retrieved passages: importance, relevance, consistency

Discerning the importance and relevance of multiple retrieved passages to the generation task is another challenge, requiring the proper balance of each passage's value. Additionally, reconciling differences in writing styles and tones to ensure consistency in the output is crucial.

Over-reliance on augmented information: merely repeating retrieved content + adding no new value

Lastly, there's a risk of generation models overly depending on augmented information, potentially resulting in outputs that merely reiterate the retrieved content without providing new value or synthesized information.

3.2 Advanced RAG: targeted enhancements

Advanced RAG has been developed with targeted enhancements to address the shortcomings of Naive RAG. In terms of retrieval quality, Advanced RAG implements pre-retrieval and post-retrieval strategies. To address the indexing challenges experienced by Naive RAG, Advanced RAG has refined its indexing approach using techniques such as sliding window, fine-grained segmentation, and metadata. It has also introduced various methods to optimize the retrieval process [ILIN, 2023].

(1) Pre-retrieval process: optimize data indexing to improve the quality of indexed content, via the following five strategies

Pre-Retrieval Process

Optimizing Data Indexing. The goal of optimizing data indexing is to enhance the quality of the content being indexed. This involves five primary strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.

Enhancing data granularity: e.g., removing irrelevant information, disambiguating entities and terms, confirming factual accuracy, maintaining context, and updating outdated documents

Enhancing data granularity aims to elevate text standardization, consistency, factual accuracy, and rich context to improve the RAG system's performance. This includes removing irrelevant information, dispelling ambiguity in entities and terms, confirming factual accuracy, maintaining context, and updating outdated documents.

Optimizing index structures: e.g., adjusting chunk size, multiple index paths, graph data indexing

Optimizing index structures involves adjusting the size of chunks to capture relevant context, querying across multiple index paths, and incorporating information from the graph structure to capture relevant context by leveraging relationships between nodes in a graph data index.

Adding metadata: for filtering, e.g., dates and purposes, chapters and subsections of references

Adding metadata information involves integrating referenced metadata, such as dates and purposes, into chunks for filtering purposes, and incorporating metadata like chapters and subsections of references to improve retrieval efficiency.

Alignment optimization: e.g., introducing "hypothetical questions"

Alignment optimization addresses alignment issues and disparities between documents by introducing "hypothetical questions" [Li et al., 2023d] into documents to rectify alignment issues and differences.

Hybrid retrieval

(2) Retrieval stage: compute the similarity between the query and chunks

Retrieval

During the retrieval stage, the primary focus is on identifying the appropriate context by calculating the similarity between the query and chunks. The embedding model is central to this process. In the advanced RAG, there is potential for optimization of the embedding models.

Fine-tuning embeddings: handles evolving or rare terms, e.g., BGE-large-EN developed by BAAI

Fine-tuning Embedding. Fine-tuning embedding models significantly impacts the relevance of retrieved content in RAG systems. This process involves customizing embedding models to enhance retrieval relevance in domain-specific contexts, especially for professional domains dealing with evolving or rare terms. The BGE embedding model [BAAI, 2023], such as BGE-large-EN developed by BAAI, is an example of a high-performance embedding model that can be fine-tuned to optimize retrieval relevance. Training data for fine-tuning can be generated using language models like GPT-3.5-turbo to formulate questions grounded on document chunks, which are then used as fine-tuning pairs.

Dynamic embeddings: the same word gets different embeddings depending on surrounding words, e.g., BERT, OpenAI's text-embedding-ada-002, GPT-4

Dynamic Embedding adapts to the context in which words are used, unlike static embedding, which uses a single vector for each word [Karpukhin et al., 2020]. For example, in transformer models like BERT, the same word can have varied embeddings depending on surrounding words. OpenAI's text-embedding-ada-002 model, built upon the principles of LLMs like GPT, is a sophisticated dynamic embedding model that captures contextual understanding. However, it may not exhibit the same sensitivity to context as the latest full-size language models like GPT-4.

(3) Post-retrieval process: the LLM's context window is limited, so retrieved documents need additional processing

Post-Retrieval Process

After retrieving valuable context from the database, it is essential to merge it with the query as an input into LLMs while addressing challenges posed by context window limits. Simply presenting all relevant documents to the LLM at once may exceed the context window limit, introduce noise, and hinder the focus on crucial information. Additional processing of the retrieved content is necessary to address these issues.

Re-ranking: e.g., the LlamaIndex, LangChain, and Haystack frameworks; DiversityRanker (by document diversity), LostInTheMiddleRanker (alternating placement), Cohere rerank / bge-rerank / LongLLMLingua (recompute semantic similarity)

Re-Ranking. Re-ranking the retrieved information to relocate the most relevant content to the edges of the prompt is a key strategy. This concept has been implemented in frameworks such as LlamaIndex, LangChain, and HayStack [Blagojevi, 2023]. For example, Diversity Ranker prioritizes reordering based on document diversity, while LostInTheMiddleRanker alternates placing the best document at the beginning and end of the context window. Additionally, approaches like cohereAI rerank [Cohere, 2023], bge-rerank, and LongLLMLingua [Jiang et al., 2023a] recalculate the semantic similarity between relevant text and the query, addressing the challenge of interpreting vector-based simulated searches for semantic similarity.
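The general shape of these strategies can be sketched as follows; `score(query, doc)` is a placeholder for a stronger scorer such as a cross-encoder or a rerank API, and the second function mirrors the alternating-placement idea attributed to LostInTheMiddleRanker above:

```python
def rerank(query: str, candidates: list[str], score) -> list[str]:
    # Over-fetch with cheap first-stage retrieval, then reorder with a
    # slower but more accurate relevance scorer.
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)

def lost_in_the_middle(ranked: list[str]) -> list[str]:
    # Alternate the best documents between the start and the end of the
    # context window, leaving the weakest ones in the middle.
    front, back = [], []
    for i, doc in enumerate(ranked):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]
```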

Prompt compression: compress irrelevant content / highlight key information / reduce overall length, e.g., Selective Context and LLMLingua (use small language models to estimate prompt mutual information or perplexity), RECOMP (trains compressors at different granularities), Long Context, etc.

Prompt Compression. Research indicates that noise in retrieved documents adversely affects RAG performance. In post-processing, the emphasis lies in compressing irrelevant context, highlighting pivotal paragraphs, and reducing the overall context length. Approaches such as Selective Context and LLMLingua [Litman et al., 2020, Anderson et al., 2022] utilize small language models to calculate prompt mutual information or perplexity, estimating element importance. Recomp [Xu et al., 2023a] addresses this by training compressors at different granularities, while Long Context [Xu et al., 2023b] and "Walking in the Memory Maze" [Chen et al., 2023a] design summarization techniques to enhance LLM's key information perception, particularly in dealing with extensive contexts.
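In the spirit of these methods (a sketch, not any specific library's algorithm), one can prune a prompt by keeping only the sentences a small judge model scores as informative; `informativeness` is a placeholder for a small-LM self-information or perplexity scorer:

```python
def compress_context(sentences: list[str], informativeness, budget: int = 5) -> list[str]:
    # Rank sentences by how informative the small model finds them and
    # keep only the top `budget`, preserving their original order.
    ranked = sorted(sentences, key=informativeness, reverse=True)
    kept = set(ranked[:budget])
    return [s for s in sentences if s in kept]
```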

3.3 Modular RAG: greater versatility and flexibility

The modular RAG structure diverges from the traditional Naive RAG framework, providing greater versatility and flexibility. It integrates various methods to enhance functional modules, such as incorporating a search module for similarity retrieval and applying a fine-tuning approach in the retriever [Lin et al., 2023]. Restructured RAG modules [Yu et al., 2022] and iterative methodologies like [Shao et al., 2023] have been developed to address specific issues. The modular RAG paradigm is increasingly becoming the norm in the RAG domain, allowing for either a serialized pipeline or an end-to-end training approach across multiple modules. The comparison of three RAG paradigms is depicted in Figure 3. However, Modular RAG is not standalone. Advanced RAG is a specialized form of modular RAG, and further, Naive RAG itself is a special case of Advanced RAG. The relationship among the three paradigms is one of inheritance and development.

(1) New modules: a variety of new modules

Search module: customized for specific scenarios

New Modules

Search Module. In contrast to the similarity retrieval in Naive/Advanced RAG, the Search Module is tailored to specific scenarios and incorporates direct searches on additional corpora. This integration is achieved using code generated by the LLM, query languages such as SQL or Cypher, and other custom tools. The data sources for these searches can include search engines, text data, tabular data, and knowledge graphs [Wang et al., 2023d].

Memory module: uses the LLM's memory to guide retrieval, e.g., Selfmem (uses its own outputs to improve itself)

Memory Module. This module harnesses the memory capabilities of the LLM to guide retrieval. The approach involves identifying memories most similar to the current input. Selfmem [Cheng et al., 2023b] utilizes a retrieval-enhanced generator to create an unbounded memory pool iteratively, combining the "original question" and "dual question". By employing a retrieval-enhanced generative model that uses its own outputs to improve itself, the text becomes more aligned with the data distribution during the reasoning process. Consequently, the model's own outputs are utilized instead of the training data [Wang et al., 2022a].

融合原始查询(显式意图)扩展查询(隐意图)进行并行向量搜索比如RAG-Fusion(多查询方法)

Fusion. RAG-Fusion [Raudaschl, 2023]enhances tradi-tional search systems by addressing their limitations through a multi-query approach that expands user queries into multiple, diverse perspectives using an LLM. This approach not only captures the explicit information users seek but also un-covers deeper, transformative knowledge. The fusion pro-cess involves parallel vector searches of both original and expanded queries, intelligent re-ranking to optimize results, and pairing the best outcomes with new queries. This sophis-ticated method ensures search results that align closely with both the explicit and implicit intentions of the user, leading to more insightful and relevant information discovery.

融合。RAG-Fusion [Raudaschl, 2023]通过多查询方法增强传统搜索系统,使用LLM将用户查询扩展到多个不同的视角,来解决传统搜索系统的局限性。这种方法不仅捕捉用户寻找的显式信息,还揭示更深层次、变革性的知识。

融合过程包括对原始查询扩展查询进行并行向量搜索智能重新排序以优化结果,并将最佳结果与新查询配对。这种复杂的方法确保搜索结果与用户的显式和隐式意图紧密一致,从而带来更有洞察力和相关的信息发现。
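The fusion step itself is commonly implemented with reciprocal rank fusion (RRF); a sketch under that assumption, where `expand` stands for the LLM multi-query step and each ranking is the retrieval result for one query variant:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # A document's fused score is the sum of 1 / (k + rank) over every
    # ranking in which it appears; k = 60 is the conventional constant.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# queries = [original] + expand(original)          # LLM-generated variants
# fused = reciprocal_rank_fusion([search(q) for q in queries])
```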

Routing: diverse data sources → select the appropriate data store → routing decisions are predefined → executed via LLM calls

Routing. The RAG system's retrieval process utilizes diverse sources, differing in domain, language, and format, which can be either alternated or merged based on the situation [Li et al., 2023b]. Query routing decides the subsequent action to a user's query, with options ranging from summarization, searching specific databases, or merging different pathways into a single response. The query router also chooses the appropriate data store for the query, which may include various sources like vector stores, graph databases, or relational databases, or a hierarchy of indices, for instance, a summary index and a document block vector index for multi-document storage. The query router's decision-making is predefined and executed via LLMs calls, which direct the query to the chosen index.
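A minimal router sketch; the route names and descriptions are hypothetical, and `llm` is a placeholder for any completion call. Note that the decision menu is fixed in advance, and the LLM call merely executes the choice:

```python
ROUTES = {
    "summary_index": "questions about a document as a whole",
    "chunk_index": "questions about specific facts or passages",
    "graph_db": "questions about relationships between entities",
}

def route_query(query: str, llm) -> str:
    menu = "\n".join(f"- {name}: {desc}" for name, desc in ROUTES.items())
    prompt = (
        "Pick the best data store for this query.\n"
        f"{menu}\nQuery: {query}\nAnswer with the store name only."
    )
    choice = llm(prompt).strip()
    return choice if choice in ROUTES else "chunk_index"  # safe fallback
```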

预测利用LLM生成必要的上下文

Predict . It addresses the common issues of redundancy and noise in retrieved content. Instead of directly retrieving from a data source, this module utilizes the LLM to generate the necessary context [Yu et al., 2022]. The content produced by the LLM is more likely to contain pertinent information compared to that obtained through direct retrieval.

预测。它解决了检索内容中的冗余和噪声等常见问题。该模块不是直接从数据源中检索,而是利用LLM生成必要的上下文[Yu et al., 2022]。与通过直接检索获得的内容相比,LLMs产生的内容更有可能包含相关信息。

Task adapter: adapts RAG to various downstream tasks, e.g., UPRISE (retrieves prompts for zero-shot task inputs), PROMPTAGATOR (uses the LLM as a few-shot query generator)

Task Adapter. This module focuses on adapting RAG to a variety of downstream tasks. UPRISE automates the retrieval of prompts for zero-shot task inputs from a pre-constructed data pool, thereby enhancing universality across tasks and models [Cheng et al., 2023a]. Meanwhile, PROMPTAGATOR [Dai et al., 2022] utilizes LLM as a few-shot query generator and, based on the generated data, creates task-specific retrievers. By leveraging the generalization capability of LLMs, it enables the development of task-specific end-to-end retrievers with minimal examples.

(2) New patterns: two organizational paradigms

New Patterns

The organizational structure of Modular RAG is highly adaptable, allowing for the substitution or rearrangement of modules within the RAG process to suit specific problem contexts.

Both Naive RAG and Advanced RAG can be viewed as compositions of fixed modules

Naive RAG and Advanced RAG can both be considered as being composed of some fixed modules. As illustrated in Figure 3, Naive RAG primarily consists of the "Retrieve" and "Read" modules. A typical pattern of Advanced RAG builds upon the foundation of Naive RAG by adding "Rewrite" and "Rerank" modules. However, on the whole, modular RAG enjoys greater diversity and flexibility.

Current research primarily explores two organizational paradigms. The first involves adding or replacing modules, while the second focuses on adjusting the organizational flow between modules. This flexibility enables tailoring the RAG process to effectively address a wide array of tasks.

1) Adding or replacing modules: keep the core structure while integrating additional modules, e.g., RRR (introduces the Rewrite-Retrieve-Read process), Generate-Read (the LLM's generation module replaces the retriever), Recite-Read (turns external retrieval into retrieval from model weights)

Adding or Replacing Modules. The strategy of introducing or substituting modules involves maintaining the core structure of the Retrieval-Read process while integrating additional modules to enhance specific functionalities. The RRR model [Ma et al., 2023a] introduces the Rewrite-Retrieve-Read process, utilizing the LLM performance as a reinforcement learning incentive for a rewriting module. This enables the rewriter to fine-tune retrieval queries, thereby improving the downstream task performance of the reader.

Similarly, modules can be selectively swapped in methodologies like Generate-Read [Yu et al., 2022], where the LLM's generation module takes the place of the retrieval module. The Recite-Read approach [Sun et al., 2022] transforms external retrieval into retrieval from model weights, requiring the LLM to initially memorize task-specific information and subsequently produce output capable of handling knowledge-intensive natural language processing tasks.

2) Adjusting the flow between modules: strengthens the interaction between language model and retrieval model, e.g., DSP (treats the in-context learning system as an explicit program), ITER-RETGEN (uses one module's output to improve another)

Adjusting the Flow between Modules. In the realm of module flow adjustment, there is a focus on enhancing the interaction between language models and retrieval models. DSP [Khattab et al., 2022] introduces the Demonstrate-Search-Predict framework, treating the context learning system as an explicit program rather than a final task prompt, leading to more effective handling of knowledge-intensive tasks. The ITER-RETGEN [Shao et al., 2023] approach utilizes generated content to guide retrieval, iteratively implementing "retrieval-enhanced generation" and "generation-enhanced retrieval" within a Retrieve-Read-Retrieve-Read flow. This method demonstrates an innovative way of using one module's output to improve the functionality of another.

(3) Optimizing the RAG pipeline: improve retrieval efficiency and quality, five strategies

Optimizing the RAG Pipeline

The optimization of the retrieval process aims to enhance the efficiency and quality of information in RAG systems. Current research focuses on integrating diverse search technologies, refining retrieval steps, incorporating cognitive backtracking, implementing versatile query strategies, and leveraging embedding similarity. These efforts collectively strive to achieve a balance between retrieval efficiency and the depth of contextual information in RAG systems.

Hybrid search exploration: commonly keyword-based search, semantic search, and vector search

Hybrid Search Exploration. The RAG system optimizes its performance by intelligently integrating various techniques, including keyword-based search, semantic search, and vector search. This approach leverages the unique strengths of each method to accommodate diverse query types and information needs, ensuring consistent retrieval of highly relevant and context-rich information. The use of hybrid search serves as a robust supplement to retrieval strategies, thereby enhancing the overall efficacy of the RAG pipeline.
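A sketch of the core idea: blend a lexical score (plain term overlap below, standing in for BM25) with a semantic score, where `vector_score(query, doc)` is a placeholder for embedding similarity and `alpha` balances the two signals:

```python
def hybrid_search(query: str, docs: list[str], vector_score, alpha: float = 0.5) -> list[str]:
    q_terms = set(query.lower().split())

    def keyword_score(doc: str) -> float:
        # Fraction of query terms present in the document.
        return len(q_terms & set(doc.lower().split())) / (len(q_terms) or 1)

    scored = [
        (alpha * keyword_score(d) + (1 - alpha) * vector_score(query, d), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, key=lambda t: t[0], reverse=True)]
```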

Recursive retrieval and query engines: a two-step retrieval method, small chunks first (capture semantics), then larger chunks (more context)

Recursive Retrieval and Query Engine. Recursive retrieval involves acquiring smaller chunks during the initial retrieval phase to capture key semantic meanings. Subsequently, larger chunks containing more contextual information are provided to the LLM in later stages of the process. This two-step retrieval method helps to strike a balance between efficiency and the delivery of contextually rich responses.
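A sketch of the two-step idea, assuming the indexing stage recorded a `parent` map from each small unit (e.g., a sentence) to its enclosing larger chunk; `retrieve_small` is a placeholder for first-stage retrieval:

```python
def small_to_big(query: str, retrieve_small, parent: dict[str, str], k: int = 3) -> list[str]:
    # Step 1: match small units for precise semantics.
    small_hits = retrieve_small(query, k)
    # Step 2: expand each hit to its larger parent chunk, de-duplicated,
    # so the LLM receives fuller context.
    expanded, seen = [], set()
    for unit in small_hits:
        big = parent[unit]
        if big not in seen:
            seen.add(big)
            expanded.append(big)
    return expanded
```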

StepBack prompting: e.g., backward prompts

The StepBack-prompt approach encourages the LLM to move away from specific instances and engage in reasoning around broader concepts and principles [Zheng et al., 2023]. Experimental results demonstrate a significant performance increase in various challenging, inference-based tasks when backward prompts are used, highlighting their natural adaptability to the RAG process. These retrieval-enhancing steps can be applied both in generating responses to backward prompts and in the final question-answering process.

Sub-queries: e.g., LlamaIndex; tree queries, vector queries, sequential chunk queries

Sub-Queries. Depending on the scenario, various query strategies can be employed, such as using query engines provided by frameworks like LlamaIndex, leveraging tree queries, utilizing vector queries, or executing simple sequential querying of chunks.

Hypothetical document embeddings: e.g., HyDE (generated answers may be closer in embedding space than the direct query); can perform poorly when the model is unfamiliar with the topic

Hypothetical Document Embeddings. HyDE operates on the belief that the answers generated might be closer in the embedding space than a direct query. Using the LLM, HyDE creates a hypothetical document (answer) in response to a query, embeds this document, and uses the resulting embedding to retrieve real documents similar to the hypothetical one. Instead of seeking embedding similarity based on the query, this approach focuses on the embedding similarity from one answer to another [Gao et al., 2022]. However, it might not consistently produce desirable outcomes, especially when the language model is unfamiliar with the subject matter, potentially leading to more instances with errors.
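A HyDE-style sketch under the same placeholder conventions as earlier snippets (`llm`, `embed`, and `index` stand in for real components):

```python
def hyde_retrieve(query: str, llm, embed, index, k: int = 3) -> list[str]:
    # 1) Draft a hypothetical answer document with the LLM.
    hypothetical = llm(f"Write a short passage that answers: {query}")
    # 2) Embed the draft instead of the raw query.
    h_vec = embed(hypothetical)
    # 3) Search answer-to-answer: rank indexed chunks by similarity
    #    to the hypothetical document's embedding.
    ranked = sorted(
        index,
        key=lambda pair: sum(a * b for a, b in zip(h_vec, pair[0])),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]
```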

4 Retrieval

In the context of RAG, it is crucial to efficiently retrieve relevant documents from the data source. However, creating a proficient retriever presents significant challenges. This section delves into three fundamental questions: 1) How can we achieve accurate semantic representations? 2) What methods can align the semantic spaces of queries and documents? 3) How can the retriever's output be aligned with the preferences of the Large Language Model?

4.1 Enhancing Semantic Representations

In RAG, the semantic space is essential as it involves the multidimensional mapping of queries and documents. Retrieval accuracy in this semantic space significantly impacts RAG outcomes. This section will present two methods for building accurate semantic spaces.

S1. Chunk optimization (determining the optimal chunk size)

Chunk optimization

When managing external documents, the initial step involves breaking them down into smaller chunks to extract fine-grained features, which are then embedded to represent their semantics. However, embedding overly large or excessively small text chunks may lead to sub-optimal outcomes. Therefore, identifying the optimal chunk size for documents within the corpus is crucial to ensuring the accuracy and relevance of the retrieved results.

Chunking strategy: there is no single "best" strategy for all cases, only the most appropriate one for a given context, e.g., sentence-transformer (suits single sentences), text-embedding-ada-002 (suits chunks of 256 or 512 tokens)

Choosing an appropriate chunking strategy requires careful consideration of several vital factors, such as the nature of the indexed content, the embedding model and its optimal block size, the expected length and complexity of user queries, and the specific application's utilization of the retrieved results. For instance, the selection of a chunking model should be based on the content's length, whether it is longer or shorter. Additionally, different embedding models demonstrate distinct performance characteristics at varying block sizes. For example, sentence-transformer performs better with single sentences, while text-embedding-ada-002 excels with blocks containing 256 or 512 tokens.

Additionally, factors like the length and complexity of user input questions, and the specific needs of the application (e.g., semantic search or question answering), have an effect on the choice of a chunking strategy. This choice can be directly influenced by the token limits of the selected LLMs, requiring adjustments to the block size. In reality, getting precise query results involves flexibly applying different chunking strategies. There is no one-size-fits-all "best" strategy, only the most appropriate one for a particular context.

Sliding window technique (merges globally related information across multiple chunks + layered retrieval), small2big (small chunks first + larger chunks later)

Current research in RAG explores various block optimization techniques aimed at improving both retrieval efficiency and accuracy. One such approach involves the use of sliding window technology, enabling layered retrieval by merging globally related information across multiple retrieval processes. Another strategy, known as the "small2big" method, utilizes small text blocks during the initial search phase and subsequently provides larger related text blocks to the language model for processing.
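A minimal sliding-window chunker, sketched over token lists (tokenization itself is out of scope here); overlapping windows let information near a boundary appear in two neighboring chunks:

```python
def sliding_window_chunks(tokens: list[str], size: int = 256, overlap: int = 64) -> list[list[str]]:
    # Consecutive windows advance by (size - overlap) tokens, so each
    # chunk shares `overlap` tokens with its neighbor.
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```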

Abstract embedding (based on document summaries), metadata filtering (uses document metadata to enhance filtering), graph indexing (suits multi-hop question scenarios)

The abstract embedding technique prioritizes top K retrieval based on document abstracts (or summaries), offering a comprehensive understanding of the entire document context. Additionally, the metadata filtering technique leverages document metadata to enhance the filtering process. An innovative approach, the graph indexing technique, transforms entities and relationships into nodes and connections, significantly improving relevance, particularly in the context of multi-hop problems.

The combination of these diverse methods has led to notable advancements, resulting in enhanced retrieval outcomes and improved performance for RAG.

S2. Fine-tuning embedding models (embedding chunks and queries into a semantic space): two main paradigms

Strong embedding models: e.g., AngIE, Voyage, BGE; but limited in specialized domains

Fine-tuning Embedding Models

Once the appropriate size of chunks is determined, the next crucial step involves embedding these chunks and the query into the semantic space using an embedding model. The effectiveness of the embedding is critical as it impacts the model's ability to represent the corpus. Recent research has introduced prominent embedding models such as AngIE, Voyage, BGE, etc. [Li and Li, 2023, VoyageAI, 2023, BAAI, 2023]. These models have undergone pre-training on extensive corpora. However, their capability to accurately capture domain-specific information may be limited when applied to specialized domains.

Moreover, task-specific fine-tuning of embedding models is essential to ensure that the model comprehends the user query in terms of content relevance. A model without fine-tuning may not adequately address the requirements of a specific task. Consequently, fine-tuning an embedding model becomes crucial for downstream applications. There are two primary paradigms in embedding fine-tuning methods.

T1、领域知识微调三大挑战(数据集构建、模型微调、评估阶段),比如LlamaIndex框架可以增强嵌入模型微调工作流程

Domain Knowledge Fine-tuning. To ensure that an embed-ding model accurately captures domain-specific information, it is imperative to utilize domain-specific datasets for fine-tuning. This process diverges from standard language model fine-tuning, chiefly in the nature of the datasets involved. Typically, the dataset for embedding model fine-tuning en-compasses three principal elements: queries, a corpus, and relevant documents. The model employs these queries to identify pertinent documents within the corpus. The effi-cacy of the model is then gauged based on its ability to re-trieve these relevant documents in response to the queries. The dataset construction, model fine-tuning, and evalua-tion phases each present distinct challenges. The LlamaIn-dex [Liu, 2023] introduces a suite of pivotal classes and func-tions designed to enhance the embedding model fine-tuning workflow, thereby simplifying these intricate processes. By curating a corpus infused with domain knowledge and lever-aging the methodologies offered, one can adeptly fine-tune an embedding model to align closely with the specific require-ments of the target domain.

领域知识微调。为了确保嵌入模型准确地捕获特定领域的信息,必须使用领域特定的数据集进行微调。这个过程与标准语言模型微调有所不同,主要区别在于所涉及数据集的性质。通常,用于嵌入模型微调的数据集包括三个主要元素:查询、语料库和相关文档。模型利用这些查询在语料库中识别相关文档,然后根据模型针对查询检索这些相关文档的能力来评估其有效性。数据集构建、模型微调和评估阶段各自提出了不同的挑战。LlamaIndex [Liu, 2023]引入了一套关键的类和函数,旨在增强嵌入模型微调工作流程,从而简化这些复杂的过程。通过构建一个富含领域知识的语料库并利用其提供的方法,可以熟练地微调嵌入模型,使其与目标领域的具体要求紧密匹配。
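下面是一个基于 sentence-transformers 库的领域嵌入模型微调最小示意,思路与上文一致:以(查询,相关文档)对作为训练数据,用批内其他样本作为负样本做对比学习。示例中的模型名与数据仅为演示假设,并非LlamaIndex的具体实现:

```python
# 领域嵌入模型微调的最小示意:用(查询, 相关文档)对训练。
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# 假设已按正文所述构建好 (query, relevant_doc) 配对数据(此处仅为演示)
pairs = [
    ("What is RAG?", "Retrieval-Augmented Generation injects retrieved external knowledge into LLM inputs ..."),
    # ... 更多 (query, relevant_doc) 对
]
train_examples = [InputExample(texts=[q, d]) for q, d in pairs]
loader = DataLoader(train_examples, shuffle=True, batch_size=16)

model = SentenceTransformer("BAAI/bge-large-en")  # 预训练嵌入模型,模型名仅为示例
# MultipleNegativesRankingLoss:同一batch内其他文档自动作为负样本
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("bge-domain-finetuned")
```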

T2、对下游任务进行微调,比如PROMPTAGATOR(利用LLM作为少样本查询生成器适合数据稀缺)、LLM-Embedder(利用LLM生成奖励信号+采用双信号策略)

Fine-tuning for Downstream Tasks. Fine-tuning embedding models for downstream tasks is a critical step in enhancing model performance. In the realm of utilizing RAG for these tasks, innovative methods have emerged to fine-tune embedding models by harnessing the capabilities of LLMs. For example, PROMPTAGATOR [Dai et al., 2022] utilizes the LLM as a few-shot query generator to create task-specific retrievers, addressing challenges in supervised fine-tuning, particularly in data-scarce domains. Another approach, LLM-Embedder [Zhang et al., 2023a], exploits LLMs to generate reward signals for data across multiple downstream tasks. The retriever is fine-tuned with two types of supervised signals: hard labels for the dataset and soft rewards from the LLMs. This dual-signal approach fosters a more effective fine-tuning process, tailoring the embedding model to diverse downstream applications.

对下游任务进行微调。对下游任务的嵌入模型进行微调是提高模型性能的关键步骤。在利用RAG进行这些任务的领域中,已经出现了创新的方法,通过利用LLM的能力微调嵌入模型。

例如,PROMPTAGATOR [Dai等人,2022]利用LLM作为少样本查询生成器来创建特定于任务的检索器,解决了监督微调中的挑战,特别是在数据稀缺领域。

另一种方法是LLM-Embedder [Zhang等,2023a],利用LLM为跨多个下游任务的数据生成奖励信号。检索器使用两种类型的监督信号进行微调:数据集的硬标签和来自LLM的软奖励。这种双信号方法实现了更有效的微调过程,使嵌入模型适应不同的下游应用。

While these methods improve semantic representation by incorporating domain knowledge and task-specific fine-tuning, retrievers may not always exhibit optimal compatibility with certain LLMs. To address this, some researchers have explored direct supervision of the fine-tuning process using feedback from LLMs. This direct supervision seeks to align the retriever more closely with the LLM, thereby improving performance on downstream tasks. A more comprehensive discussion on this topic is presented in Section 4.3.

尽管这些方法通过整合领域知识和任务特定微调来改善语义表示,但检索器可能并不总是与某些LLM表现出最佳的兼容性。为解决这个问题,一些研究人员探索了利用LLM的反馈直接监督微调过程。这种直接监督旨在使检索器更紧密地与LLM对齐,从而提高在下游任务中的性能。关于这个主题的更全面的讨论见第4.3节。

4.2 Aligning Queries and Documents对齐查询和文档两种对齐技术

痛点:用户的原始查询可能存在措辞不准确缺乏语义信息的问题

In the context of RAG applications, retrievers may utilize a single embedding model for encoding both the query and the documents, or employ separate models for each. Additionally, the user’s original query may suffer from imprecise phrasing and lack of semantic information. Therefore, it is crucial to align the semantic space of the user’s query with those of the documents. This section introduces two fundamental techniques aimed at achieving this alignment.

在RAG应用的背景下,检索器可以利用单个嵌入模型来对查询和文档进行编码,也可以为每个查询和文档使用单独的模型。此外,用户的原始查询可能存在措辞不准确缺乏语义信息的问题。因此,将用户查询的语义空间与文档的语义空间对齐是至关重要的。本节介绍了两种旨在实现此对齐的基本技术。

T1、查询重写(对齐查询和文档间的语义):比如Query2Doc/ITER-RETGEN(创建伪文档)、HyDE(生成“假设”文档)、RRR(颠倒了传统的检索和阅读顺序)、STEP-BACKPROMPTING(基于高级概念的抽象推理和检索)

Query Rewriting

Query rewriting is a fundamental approach for aligning the semantics of a query and a document. Methods such as Query2Doc and ITER-RETGEN leverage LLMs to create a pseudo-document by combining the original query with additional guidance [Wang et al., 2023c, Shao et al., 2023]. HyDE constructs query vectors using textual cues to generate a “hypothetical” document capturing essential patterns [Gao et al., 2022]. RRR introduces a framework that reverses the traditional retrieval and reading order, focusing on query rewriting [Ma et al., 2023a]. STEP-BACKPROMPTING enables LLMs to perform abstract reasoning and retrieval based on high-level concepts [Zheng et al., 2023]. Additionally, the multi-query retrieval method utilizes LLMs to generate and execute multiple search queries simultaneously, advantageous for addressing complex problems with multiple sub-problems.

查询重写

查询重写对齐查询和文档语义的基本方法。

>> Query2DocITER-RETGEN等方法利用LLM通过将原始查询额外引导相结合来创建伪文档[Wang et al., 2023c, Shao et al., 2023]。

>> HyDE使用文本线索构建查询向量,以生成捕捉关键模式的“假设”文档[Gao等人,2022]。

>> RRR引入了一个框架,颠倒了传统的检索和阅读顺序,专注于查询重写[Ma et al., 2023a]。

>> STEP-BACKPROMPTING使LLM能够基于高级概念执行抽象推理和检索[Zheng等,2023]。

此外,多查询检索方法利用LLM同时生成和执行多个搜索查询,有利于解决包含多个子问题的复杂问题。
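下面用一个最小的Python示意说明HyDE式查询重写的流程(llm_generate、embed、vector_index 均为假设性的占位接口,需替换为实际的LLM与向量库调用):

```python
# HyDE 查询重写的最小示意:先让LLM生成"假设文档",再用其嵌入去检索真实文档。
def hyde_retrieve(query, llm_generate, embed, vector_index, top_k=5):
    # 1) 生成假设性回答文档:内容可能含错误事实,但其语义模式更接近目标文档
    hypothetical_doc = llm_generate(
        f"Write a short passage that directly answers the question:\n{query}"
    )
    # 2) 用假设文档(而非原始查询)的嵌入进行相似度检索
    vec = embed(hypothetical_doc)
    return vector_index.search(vec, top_k=top_k)
```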

T2、嵌入转换:比如LlamaIndex(引入一个适配器模块)

Embedding Transformation

Beyond broad strategies such as query rewriting, there exist more granular techniques specifically designed for embedding transformations. LlamaIndex [Liu, 2023] exemplifies this by introducing an adapter module that can be integrated following the query encoder. This adapter facilitates fine-tuning, thereby optimizing the representation of query embeddings to map them into a latent space that is more closely aligned with the intended tasks.

嵌入转换

除了广泛的查询重写等策略之外,还存在专门设计用于嵌入变换的更细粒度技术LlamaIndex [Liu,2023]通过引入一个适配器模块示范了这一点,该模块可以在查询编码器之后进行集成。该适配器有助于微调,从而优化查询嵌入的表示,将其映射到与目标任务更紧密对齐的潜在空间。

The challenge of aligning queries with structured external documents, particularly when addressing the incongruity between structured and unstructured data, is addressed by SANTA [Li et al., 2023d]. It enhances the retriever’s sensitivity to structured information through two pre-training strategies: first, by leveraging the intrinsic alignment between structured and unstructured data to inform contrastive learning in a structured-aware pre-training scheme; and second, by implementing Masked Entity Prediction. The latter utilizes an entity-centric masking strategy that encourages language models to predict and fill in the masked entities, thereby fostering a deeper understanding of structured data.

解决结构化外部文档与查询对齐的挑战,尤其是在处理结构化和非结构化数据之间的不一致性时,SANTA [Li等,2023d]提出了一个解决方案。它通过两种预训练策略增强了检索器对结构化信息的敏感性:

>> 首先,通过利用结构化和非结构化数据之间的内在对齐来指导对比学习,形成结构感知的预训练方案;

>> 其次,通过实施屏蔽实体预测。后者利用以实体为中心的屏蔽策略,鼓励语言模型预测和填充屏蔽实体,从而促进对结构化数据的更深入理解。

4.3 Aligning Retriever and LLM对齐检索器和LLM

痛点:RAG管道提高检索命中率不一定会改善最终结果,目的旨在将检索器的输出与LLMs的偏好对齐

In the RAG pipeline, enhancing retrieval hit rate through various techniques may not necessarily improve the final outcome, as the retrieved documents may not align with the specific requirements of the LLMs. Therefore, this section introduces two methods aimed at aligning the retriever outputs with the preferences of the LLMs.

在RAG管道中,通过各种技术提高检索命中率不一定会改善最终结果,因为检索的文档可能与LLM的特定需求不一致。因此,本节将介绍两种旨在将检索器输出与LLM偏好对齐的方法。

T1、微调增强/微调检索器(提高协同作用):比如AAR(引入监督信号)、REPLUG(使用LM作为监督信号)、UPRISE(使用冻结的LLMs微调提示检索器)、Atlas(四种监督微调嵌入模型的方法)

Fine-tuning Retrievers

Several studies utilize feedback signals from LLMs to refine retrieval models. For instance, AAR [Yu et al., 2023b] introduces supervisory signals for a pre-trained retriever using an encoder-decoder architecture. This is achieved by identifying the LM’s preferred documents through FiD cross-attention scores. Subsequently, the retriever undergoes fine-tuning with hard negative sampling and standard cross-entropy loss. Ultimately, the refined retriever can be directly applied to enhance unseen target LMs, resulting in improved performance in the target task. Additionally, it is suggested that LLMs may have a preference for focusing on readable rather than information-rich documents.

微调增强

一些研究利用LLM的反馈信号改进检索模型。例如,

>> AAR [Yu等人,2023b]使用编码器-解码器架构为预训练的检索器引入监督信号。这是通过FiD交叉注意分数来识别LM的首选文档实现的。随后,检索器采用硬负采样和标准交叉熵损失进行微调。最终,改进后的检索器可以直接用于增强未见过的目标LM,从而提高目标任务的性能。此外,有研究表明LLM可能更倾向于关注可读性强而非信息丰富的文档。

REPLUG [Shi et al., 2023] utilizes a retriever and an LLM to calculate the probability distributions of the retrieved documents and then performs supervised training by computing the KL divergence. This straightforward and effective training method enhances the performance of the retrieval model by using an LM as the supervisory signal, eliminating the need for specific cross-attention mechanisms.

>> REPLUG [Shi et al., 2023]利用检索器和LLM计算检索文档的概率分布,然后通过计算KL散度进行监督训练。这种简单有效的训练方法通过使用LM作为监督信号来提高检索模型的性能,消除了对特定交叉注意机制的需要。
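下面给出一个REPLUG式监督信号的最小PyTorch示意:将LM对“在每篇检索文档条件下生成正确答案”的似然归一化为目标分布Q,再最小化检索器文档分布P_R与Q之间的KL散度以更新检索器。张量形状、温度系数与损失细节均为说明用途的假设,具体以原论文为准:

```python
# REPLUG式KL监督的最小示意:用LM似然作为监督信号训练检索器。
import torch
import torch.nn.functional as F

def replug_lsr_loss(retriever_scores, lm_log_likelihoods, tau=0.1):
    """retriever_scores / lm_log_likelihoods: (batch, k) 张量(示意用的假设形状)。"""
    # P_R(d|x):检索相似度经温度softmax得到的文档分布(梯度回传到检索器)
    p_r = F.softmax(retriever_scores / tau, dim=-1)
    # Q(d|x,y):LM似然经softmax得到的目标分布(视为常量,不回传梯度)
    log_q = F.log_softmax(lm_log_likelihoods, dim=-1).detach()
    # KL(P_R ‖ Q) = Σ P_R · (log P_R − log Q),对batch取均值
    kl = (p_r * (p_r.clamp_min(1e-10).log() - log_q)).sum(dim=-1)
    return kl.mean()
```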

UPRISE [Cheng et al., 2023a] also employs frozen LLMs to fine-tune the prompt retriever. Both the LLM and the retriever take prompt-input pairs as inputs and utilize the scores provided by the LLM to supervise the retriever’s training, effectively treating the LLM as a dataset labeler.

>> UPRISE [Cheng et al., 2023a]还使用冻结的LLMs微调提示检索器。LLM和检索器都将提示-输入对作为输入,并利用LLM提供的分数监督检索器的训练,实际上将LLM视为数据集标注器。

In addition, Atlas [Izacard et al., 2022] proposes four methods of supervised fine-tuning embedding models:

>> Attention Distillation. This approach employs cross-attention scores generated by the LLM during output to distill the model’s knowledge.

>> EMDR2. By using the Expectation-Maximization algorithm, this method trains the model with retrieved documents as latent variables.

>> Perplexity Distillation directly trains the model using the perplexity of generated tokens as an indicator.

>> LOOP. This method presents a novel loss function based on the impact of document deletion on LLM prediction, offering an efficient training strategy to better adapt the model to specific tasks.

此外,Atlas [Izacard et al., 2022]提出了四种监督微调嵌入模型的方法:

>> 注意蒸馏:该方法利用LLM在输出过程中生成的交叉注意分数来提取模型的知识。

>> EMDR2:该方法采用期望最大化算法,以检索到的文档作为潜在变量对模型进行训练。

>> Perplexity蒸馏(困惑度蒸馏):直接使用生成token的困惑度作为指标来训练模型。

>> LOOP:该方法提出了一种新的基于文档删除对LLM预测影响的损失函数,提供了一种有效的训练策略,使模型更好地适应特定的任务。

These approaches aim to improve the synergy between the retriever and the LLM, leading to enhanced retrieval performance and more accurate responses to user inquiries.

这些方法旨在提高检索器和LLM之间的协同作用,从而提高检索性能并更准确地响应用户查询。

T2、适配器(引入外部适配器辅助对齐):比如PRCA(训练适配器)、RECOMP(生成摘要)、PKG(指令微调)

Adapters

Fine-tuning models may present challenges, such as integrating functionality through an API or addressing constraints arising from limited local computational resources. Consequently, some approaches opt to incorporate an external adapter to aid in alignment.

适配器

微调模型可能会带来挑战,例如只能通过API集成功能,或受限于有限的本地计算资源。因此,一些方法选择引入外部适配器来辅助对齐。

PRCA trains the adapter through a context extraction phase and a reward-driven phase. The retriever’s output is then optimized using a token-based autoregressive strategy [Yang et al., 2023b]. The token filtering approach employs cross-attention scores to efficiently filter tokens, selecting only the highest-scoring input tokens [Berchansky et al., 2023]. RECOMP introduces both extractive and generative compressors for summary generation. These compressors either select relevant sentences or synthesize document information, creating summaries tailored to multi-document queries [Xu et al., 2023a].

>> PRCA通过上下文提取阶段奖励驱动阶段训练适配器。然后使用基于令牌的自回归策略对检索器的输出进行优化[Yang等人,2023b]。令牌过滤方法采用交叉注意分数来有效地过滤令牌,只选择得分最高的输入令牌[Berchansky等人,2023]。

>> RECOMP引入了提取压缩器和生成压缩器来生成摘要。这些压缩器要么选择相关句子,要么合成文档信息,创建适合多文档查询的摘要[Xu等人,2023a]。

Furthermore, PKG introduces an innovative method for integrating knowledge into white-box models via directive fine-tuning [Luo et al., 2023]. In this approach, the retriever module is directly substituted to generate relevant documents according to a query. This method assists in addressing the difficulties encountered during the fine-tuning process and enhances model performance.

>> 此外,PKG引入了一种通过指令微调将知识集成到白盒模型中的创新方法[Luo等人,2023]。在这种方法中,直接替换检索器模块,根据查询生成相关文档。该方法有助于解决在微调过程中遇到的困难,并提高模型性能。

5 Generation生成

RAG的生成器输入不仅包括典型的上下文信息,还包括通过检索器获得的相关文本片段→目的是深入了解问题的上下文

A crucial component of RAG is its generator, which is responsible for converting retrieved information into coherent and fluent text. Unlike traditional language models, RAG’s generator sets itself apart by improving accuracy and relevance via the incorporation of retrieved data. In RAG, the generator’s input encompasses not only typical contextual information but also relevant text segments obtained through the retriever. This comprehensive input enables the generator to gain a deep understanding of the question’s context, resulting in more informative and contextually relevant responses.

RAG的一个关键组件是它的生成器,它负责将检索到的信息转换成连贯流畅的文本。与传统的语言模型不同,RAG的生成器通过整合检索到的数据来提高准确性和相关性,从而使自己与众不同。在RAG中,生成器的输入不仅包括典型的上下文信息,还包括通过检索器获得的相关文本片段。这种全面的输入使生成器能够深入了解问题的上下文,从而产生更多信息和上下文相关的响应。

Furthermore, the generator is guided by the retrieved text to ensure coherence between the generated content and the obtained information. The diverse input data has led to targeted efforts during the generation phase, all aimed at refining the adaptation of the large model to the input data derived from queries and documents. In the following subsections, we will explore the introduction of the generator by delving into aspects of post-retrieval processing and fine-tuning.

此外,生成器由检索文本引导,以确保生成的内容与获取的信息之间的一致性。多样化的输入数据促使在生成阶段开展有针对性的工作,其目标都是优化大模型对来自查询和文档的输入数据的适应能力。

在接下来的小节中,我们将通过深入研究后检索处理和微调的各个方面来探讨生成器的介绍。

5.1 Post-retrieval with Frozen LLM使用冻结LLM进行后检索

痛点:LLM的上下文长度的限制和对冗余信息的敏感性

In the realm of untunable LLMs, many studies rely on well-established models like GPT-4 [OpenAI, 2023] to harness their comprehensive internal knowledge for systematically synthesizing retrieved information from various documents.

However, challenges persist with these large models, including limitations on context length and susceptibility to redundant information. To tackle these issues, certain research endeavors have turned their focus to post-retrieval processing.

在无法调整的LLM领域,许多研究依赖于已建立的模型,如GPT-4 [OpenAI, 2023],以利用其全面的内部知识,系统地合成来自各种文档的检索信息。

然而,这些大型模型仍然存在挑战,包括上下文长度的限制和对冗余信息的敏感性。为了解决这些问题,一些研究人员将重点转向了后检索处理

后检索处理中常见的两大操作

Post-retrieval processing involves treating, filtering, or optimizing the relevant information retrieved by the retriever from a large document database. Its main goal is to enhance the quality of retrieval results, aligning them more closely with user needs or subsequent tasks. It can be viewed as a reprocessing of the documents obtained during the retrieval phase. Common operations in post-retrieval processing typically include information compression and result reranking.

后检索处理包括处理、过滤或优化检索器从大型文档数据库检索到的相关信息。它的主要目标是提高检索结果的质量,使它们更贴近用户需求或后续任务。它可以看作是对检索阶段获得的文档的再处理后检索处理中的常见操作通常包括信息压缩结果重排序

信息压缩:因需管理大量信息、LLM存在上下文限制→提出信息压缩,降低噪声+解决上下文长度限制+增强生成效果,比如PRCA(训练一个信息提取器)、RECOMP(采用对比学习)、Filter-Reranker(减少文档的数量)等

Information Compression

The retriever excels at retrieving relevant information from a vast knowledge base, but managing the substantial amount of information within retrieval documents is a challenge. Ongoing research aims to extend the context length of large language models to tackle this issue. However, current large models still struggle with context limitations. Therefore, there are scenarios where condensing information becomes necessary. Information condensation is significant for reducing noise, addressing context length restrictions, and enhancing generation effects.

信息压缩

检索器擅长从庞大的知识库中检索相关信息,但是管理检索文档中的大量信息是一个挑战。正在进行的研究旨在通过扩展大型语言模型的上下文长度来解决这个问题。然而,当前的大型模型在处理上下文限制方面仍然存在困难。因此,在某些情况下,压缩信息是必要的。信息压缩对于降低噪声、解决上下文长度限制和增强生成效果具有重要意义。

PRCA tackled this issue by training an information extractor [Yang et al., 2023b]. In the context extraction phase, when provided with an input text S_input, it is capable of producing an output sequence C_extracted that represents the condensed context from the input document. The training process is designed to minimize the difference between C_extracted and the actual context C_truth.

Similarly, RECOMP adopts a comparable approach by training an information condenser using contrastive learning [Xu et al., 2023a]. Each training data point consists of one positive sample and five negative samples, and the encoder undergoes training using contrastive loss throughout this process [Karpukhin et al., 2020].

>> PRCA通过训练一个信息提取器[Yang et al., 2023b]来解决这个问题。在上下文提取阶段,当提供输入文本S_input时,它能够产生一个输出序列C_extracted,表示从输入文档中提取的压缩上下文。训练过程旨在最小化C_extracted和实际上下文C_truth之间的差异。

>> 类似地,RECOMP采用了一种类似的方法,即使用对比学习来训练信息压缩器[Xu et al., 2023a]。每个训练数据点由一个正样本和五个负样本组成,编码器在整个过程中使用对比损失进行训练[Karpukhin et al., 2020]。
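按上文“1个正样本+5个负样本”的设置,下面是一个对比损失(InfoNCE形式)的最小PyTorch示意;具体损失形式以原论文为准,此处仅为常见实现的一种假设性写法:

```python
# 对比学习目标的最小示意(InfoNCE):拉近(查询,正样本),推远5个负样本。
import torch
import torch.nn.functional as F

def contrastive_loss(q_vec, pos_vec, neg_vecs, tau=0.05):
    """q_vec: (d,), pos_vec: (d,), neg_vecs: (5, d),均已L2归一化(示意形状)。"""
    candidates = torch.cat([pos_vec.unsqueeze(0), neg_vecs], dim=0)  # (6, d)
    logits = candidates @ q_vec / tau                                # (6,) 相似度/温度
    label = torch.tensor([0])        # 下标0为正样本
    return F.cross_entropy(logits.unsqueeze(0), label)
```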

Another study has taken a different approach by aiming to reduce the number of documents in order to improve the accuracy of the model’s answers. In the study by [Ma et al., 2023b], they propose the “Filter-Reranker” paradigm, which combines the strengths of LLMs and Small Language Models (SLMs). In this paradigm, SLMs serve as filters, while LLMs function as reordering agents. The research shows that instructing LLMs to rearrange challenging samples identified by SLMs leads to significant improvements in various Information Extraction (IE) tasks.

>> 另一项研究采取了不同的方法,旨在减少文档的数量,以提高模型答案的准确性。在[Ma et al., 2023b]的研究中,他们提出了“Filter-Reranker”范式,该范式结合了LLM和小语言模型(Small Language Models, SLMs)的优势。在这个范式中,SLMs充当过滤器,而LLMs则充当重新排序代理。研究表明,指导LLMs重新排列SLMs识别出的具有挑战性的样本,可以显著提高各种信息提取(IE)任务的性能。

重排序:引入更多的上下文易导致模型性能下降→采用重排序,目标是以更集中、准确的方式呈现最相关的信息

Reranking

The re-ranking model is pivotal in optimizing the document set retrieved from the retriever. Language models often face performance declines when additional context is introduced, and re-ranking effectively addresses this issue. The core concept involves rearranging document records to prioritize the most relevant items at the top, thereby limiting the total number of documents. This not only resolves the challenge of context window expansion during retrieval but also enhances retrieval efficiency and responsiveness.

The re-ranking model assumes a dual role throughout the information retrieval process, functioning as both an optimizer and a refiner. It provides more effective and accurate input for subsequent language model processing [Zhuang et al., 2023].

Reranking

重新排序模型是优化从检索器检索到的文档集的关键。当引入更多的上下文时,语言模型经常面临性能下降的问题,重新排序可以有效地解决这个问题。核心概念是重新排列文档记录,将最相关的项放在最上面,从而限制文档的总数。这不仅解决了检索期间上下文窗口扩展的挑战,还增强了检索效率和响应能力。

重新排序模型在信息检索过程中扮演双重角色,既是优化器又是精炼器。它为后续语言模型处理提供更有效和准确的输入[Zhuang et al., 2023]。

Contextual compression is incorporated into the reordering process to offer more precise retrieval information. This method entails reducing the content of individual documents and filtering the entire document, with the ultimate goal of presenting the most relevant information in the search results for a more focused and accurate display of pertinent content.

上下文压缩被整合到重新排序过程中,以提供更精确的检索信息。这种方法涉及减少单个文档的内容并过滤整个文档,其最终目标是在搜索结果中以更聚焦、准确的方式呈现最相关的内容。
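下面用 sentence-transformers 的 CrossEncoder 给出一个重排序的最小示意(模型名为公开的MS MARCO重排模型,仅作示例,并非上文任一论文的具体实现):先由检索器取得候选文档,再用交叉编码器对(查询,文档)对重打分,只保留最相关的若干条。

```python
# 重排序的最小示意:交叉编码器对候选文档重打分,最相关的排在最前。
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidate_docs, top_k=3):
    # 对每个 (查询, 文档) 对计算相关性分数
    scores = reranker.predict([(query, doc) for doc in candidate_docs])
    ranked = sorted(zip(candidate_docs, scores), key=lambda x: -x[1])
    return [doc for doc, _ in ranked[:top_k]]   # 限制文档总数,缓解上下文膨胀
```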

5.2 Fine-tuning LLM for RAG为RAG进行LLM微调

优化生成器确保生成的文本既自然有效地利用检索到的文档

Optimizing the generator within the RAG model is a critical aspect of its architecture. The generator’s role is to take the retrieved information and produce relevant text, forming the final output of the model. The optimization of the generator aims to ensure that the generated text is both natural and effectively leverages the retrieved documents to better meet the user’s query needs.

在RAG模型中优化生成器是其架构的一个关键方面。生成器的作用是获取检索到的信息生成相关文本,形成模型的最终输出。

生成器的优化旨在确保生成的文本既自然又有效地利用检索到的文档,以更好地满足用户的查询需求。

In standard LLMs generation tasks, the input typically consists of a query. RAG stands out by incorporating not only a query but also various retrieved documents (structured/unstructured) by the retriever into the input. This additional information can significantly influence the model’s understanding, particularly for smaller models. In such cases, fine-tuning the model to adapt to the input of both query and retrieved documents becomes crucial. Before presenting the input to the fine-tuned model, post-retrieval processing usually occurs for the documents retrieved by the retriever. It is essential to note that the fine-tuning method for the generator in RAG aligns with the general fine-tuning approach for LLMs. In the following, we will briefly describe some representative works involving data (formatted/unformatted) and optimization functions.

在标准LLMs生成任务中,输入通常包括一个查询。RAG的突出之处在于,它不仅将查询,还将检索器检索到的各种文档(结构化/非结构化)合并到输入中。这种附加信息可以显著影响模型的理解,特别是对于较小的模型。在这种情况下,对模型进行微调以适应查询和检索文档的输入变得至关重要。在将输入呈现给微调模型之前,通常会对检索器检索到的文档进行后检索处理。必须注意的是,RAG模型中生成器的微调方法与LLMs的一般微调方法一致。下面,我们将简要介绍一些涉及数据(格式化/未格式化)和优化函数的代表性工作。

A1、一般优化过程:比如Self-Mem;两种常见微调范式:Joint-Encoder(联合编码器)和Dual-Encoder(双编码器)

General Optimization Process

As part of the general optimization process, the training data typically consists of input-output pairs, aiming to train the model to produce the output y given the input x. In the work of Self-Mem [Cheng et al., 2023b], a traditional training process is employed, where given the input x, relevant documents z are retrieved (selecting Top-1 in the paper), and after integrating (x, z), the model generates the output y. The paper utilizes two common paradigms for fine-tuning, namely Joint-Encoder and Dual-Encoder [Arora et al., 2023, Wang et al., 2022b, Lewis et al., 2020, Xia et al., 2019, Cai et al., 2021, Cheng et al., 2022].

一般优化过程

作为一般优化过程的一部分,训练数据通常由输入-输出对组成,目的是训练模型在给定输入x的情况下产生输出y。在Self-Mem [Cheng et al., 2023b]的工作中,采用了传统的训练过程:给定输入x,检索相关文档z(在本文中选择Top-1),整合(x, z)后,模型生成输出y。本文采用了两种常见的微调范式,即Joint-Encoder(联合编码器)和Dual-Encoder(双编码器)[Arora等,2023,Wang等,2022b, Lewis等,2020,Xia等,2019,Cai等,2021,Cheng等,2022]。

In the Joint-Encoder paradigm, a standard model based on an encoder-decoder is used. Here, the encoder initially encodes the input, and the decoder, through attention mechanisms, combines the encoded results to generate tokens in an autoregressive manner. On the other hand, in the Dual-Encoder paradigm, the system sets up two independent encoders, with each encoder encoding the input (query, context) and the document, respectively. The resulting outputs undergo bidirectional cross-attention processing by the decoder in sequence. Both architectures utilize the Transformer [Vaswani et al., 2017] as the foundational block and optimize with Negative Log-Likelihood loss.

>> Joint-Encoder联合编码器在联合编码器范例中,使用了基于编码器-解码器的标准模型。在这里,编码器最初对输入进行编码,而解码器通过注意机制组合编码结果,以自回归的方式生成标记。

>> Dual-Encoder双编码器:另一方面,在双编码器范例中,系统设置了两个独立的编码器,分别对输入(查询、上下文)和文档进行编码。由此产生的输出由解码器按顺序进行双向交叉注意处理。

这两种架构都使用Transformer [Vaswani等人,2017]作为基础块,并使用负对数似然损失进行优化。
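以Joint-Encoder范式为例,下面是一个基于 transformers 库的最小示意:将查询与检索文档拼接后输入标准编码器-解码器,并以负对数似然(交叉熵)训练生成器。模型名与示例数据仅为演示假设:

```python
# Joint-Encoder范式的最小示意:拼接(查询, 检索文档)后用NLL训练seq2seq生成器。
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")           # 模型名仅为示例
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# 输入x = 查询 + 检索到的文档z;输出y = 期望答案
x = "question: What is RAG? context: Retrieval-augmented generation injects retrieved documents into LLM inputs ..."
y = "RAG enhances generation with retrieved external knowledge."

inputs = tok(x, return_tensors="pt", truncation=True)
labels = tok(y, return_tensors="pt", truncation=True).input_ids
# 传入labels时,模型内部即按负对数似然(交叉熵)计算损失
loss = model(**inputs, labels=labels).loss
loss.backward()
```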

A2、运用对比学习:痛点(只在单个正确的输出样本上进行训练)→采用对比学习,减轻暴露偏差减少过拟合增强模型的泛化能力,比如SURGE(使用图文对比学习)、SANTA(三部分的训练方案)

Utilizing Contrastive Learning

In the phase of preparing training data for language models, interaction pairs of input and output are usually created. This traditional method can lead to “exposure bias,” where the model is only trained on individual, correct output examples, thus restricting its exposure to a range of possible outputs. This limitation can hinder the model’s real-world performance by causing it to overfit to the particular examples in the training set, thereby reducing its ability to generalize across various contexts.

运用对比学习

在为语言模型准备训练数据的阶段,通常会创建输入和输出的交互对。这种传统方法可能导致“暴露偏差”,即模型只在单个正确的输出样本上进行训练,从而限制了其对可能输出范围的了解

这种限制可能导致模型过度拟合训练集中的特定示例,降低其在各种上下文中的泛化能力,从而损害模型的实际性能。

To mitigate exposure bias, SURGE [Kang et al., 2023] proposes the use of graph-text contrastive learning. This method includes a contrastive learning objective that prompts the model to produce a range of plausible and coherent responses, expanding beyond the instances encountered in the training data. This approach is crucial in reducing overfitting and strengthening the model’s ability to generalize.

For retrieval tasks that engage with structured data, the SANTA framework [Li et al., 2023d] implements a tripartite training regimen to effectively encapsulate both structural and semantic nuances. The initial phase focuses on the retriever, where contrastive learning is harnessed to refine the query and document embeddings.

>> 为了减轻暴露偏差,SURGE [Kang等人,2023]提出使用图文对比学习。这种方法包括一个对比学习目标,促使模型产生一系列合理和连贯的响应,扩展到训练数据中遇到的实例之外。这种方法对于减少过拟合增强模型的泛化能力至关重要。

对于涉及结构化数据的检索任务,

>> SANTA框架[Li et al., 2023d]实现了一个三部分的训练方案,以有效地封装结构和语义的细微差别。初始阶段侧重于检索器,利用对比学习来细化查询和文档嵌入。

Subsequently, the generator’s preliminary training stage employs contrastive learning to align the structured data with its unstructured document descriptions. In a further stage of generator training, the model acknowledges the critical role of entity semantics in the representation learning of textual data for retrieval, as highlighted by [Sciavolino et al., 2021, Zhang et al., 2019]. This process commences with the identification of entities within the structured data, followed by the application of masks over these entities within the generator’s input data, thus setting the stage for the model to anticipate and predict these masked elements.

随后,生成器的初步训练阶段采用对比学习将结构化数据与其非结构化文档描述对齐。在生成器训练的进一步阶段,模型认识到实体语义在用于检索的文本数据表示学习中的关键作用,正如[Sciavolino et al., 2021, Zhang et al., 2019]所强调的。这个过程从识别结构化数据中的实体开始,然后在生成器的输入数据中对这些实体应用掩码,从而为模型预期并预测这些被掩码的元素奠定基础。

The training regimen progresses with the model learning to reconstruct the masked entities by leveraging contextual information. This exercise cultivates the model’s comprehension of the textual data’s structural semantics and facilitates the alignment of pertinent entities within the structured data. The overarching optimization goal is to train the language model to accurately restore the obscured spans, thereby enriching its understanding of entity semantics [Ye et al., 2020].

随着训练的推进,模型学习利用上下文信息重构被屏蔽的实体。这个练习培养了模型对文本数据结构语义的理解,并促进了结构化数据中相关实体的对齐。总体优化目标是训练语言模型准确地恢复被遮蔽的跨度,从而丰富其对实体语义的理解[Ye et al., 2020]。

6 Augmentation in RAG增强

This section is structured around three key aspects: the augmentation stage, sources of augmentation data, and the augmentation process. These facets elucidate the critical technologies pivotal to RAG’s development. A taxonomy of RAG’s core components is presented in Figure 4.

本节围绕三个关键方面展开:增强阶段、增强数据来源增强过程。这些方面阐明了对RAG发展至关重要的关键技术。RAG的核心组件的分类如图4所示。

6.1 RAG in Augmentation Stages增强阶段三个步骤中整合了各种技术

RAG, a knowledge-intensive endeavor, incorporates a variety of technical methodologies across the pre-training, fine-tuning, and inference stages of language model training.

RAG是一项知识密集型的工作,它在语言模型训练的预训练微调推理阶段整合了各种技术方法。

预训练阶段:比如REALM(采用结构化检索方法)、RETRO(从头开始进行大规模预训练)、COG(新的文本生成方法)、RETRO++

>> REALM等模型,提升开域问答能力,采用结构化检索方法进行知识嵌入和预训练。

>> RETRO等模型,在从零开始大规模预训练时采用检索增强,降低参数量但提高性能。

>> COG等模型,提出新的文本生成方法,效仿从预存集合复制文本片段。

Pre-training Stage

During the pre-training stage, researchers have investigated methods to bolster PTMs for open-domain QA through retrieval-based strategies. The REALM model adopts a structured, interpretable method for knowledge embedding, framing pre-training and fine-tuning as a retrieve-then-predict workflow within the masked language model (MLM) framework [Arora et al., 2023].

预训练阶段

在预训练阶段,研究人员通过基于检索的策略,探索了增强预训练模型(PTM)开放域问答(QA)能力的方法。

>> REALM模型采用结构化、可解释的方法进行知识嵌入,将预训练和微调构建为遮蔽语言模型(MLM)框架内的“检索-预测”工作流[Arora等人,2023]。

RETRO [Borgeaud et al., 2022] leverages retrieval augmentation for large-scale pre-training from scratch, achieving a reduction in model parameters while surpassing standard GPT models in terms of perplexity. RETRO distinguishes itself with an additional encoder designed to process features of entities retrieved from an external knowledge base, building on the foundational structure of GPT models.

Atlas [Izacard et al., 2022] also incorporates a retrieval mechanism into the T5 architecture [Raffel et al., 2020] in both the pre-training and fine-tuning stages. It uses a pre-trained T5 to initialize the encoder-decoder language model and a pre-trained Contriever for the dense retriever, improving its efficiency for complex language modeling tasks.

>> RETRO [Borgeaud等人,2022]利用检索增强从头开始进行大规模预训练,实现了模型参数的减少,同时在困惑度方面超过了标准GPT模型。RETRO的独特之处在于它有一个额外的编码器,该编码器设计用于处理从外部知识库检索到的实体的特征,建立在GPT模型的基础结构上。

>> Atlas [Izacard等人,2022]还在预训练和微调阶段将检索机制引入T5架构[Raffel等人,2020]。它使用预训练的T5来初始化编码器-解码器语言模型,使用预训练的Contriever作为密集检索器,从而提高了其在复杂语言建模任务上的效率。

Furthermore, COG [Lan et al., 2022] introduces a novel text generation methodology that emulates copying text fragments from pre-existing collections. Utilizing efficient vector search tools, COG computes and indexes contextually meaningful representations of text fragments, demonstrating superior performance in domains such as question-answering and domain adaptation when compared to RETRO.

The advent of scaling laws has catalyzed the growth of model parameters, propelling autoregressive models into the mainstream. Researchers are expanding the RAG approach to pretrained larger models, with RETRO++ exemplifying this trend by scaling up the model parameters while preserving or enhancing performance [Wang et al., 2023b].

>> COG:此外,COG [Lan等人,2022]引入了一种新的文本生成方法,该方法模拟从预先存在的集合中复制文本片段。利用高效的向量搜索工具,COG计算并索引文本片段的上下文有意义表示,与RETRO相比,在问答和领域适应等领域表现出卓越的性能。

>> RETRO++:规模定律(scaling laws)的出现促进了模型参数的增长,推动自回归模型成为主流。研究人员正在将RAG方法扩展到预训练更大的模型,RETRO++通过在保持或增强性能的同时扩大模型参数来体现这一趋势[Wang等人,2023b]。

增强预训练优点改进了文本生成质量参数更少适用于处理知识密集型任务获得的基础模型性能强韧

检索增强预训练技术在以下方面的优点:

>> 改进了文本生成质量、事实准确率以及下游任务效能,尤其适用于知识密集型应用如开放域问答。

>> 与标准GPT模型相比,检索增强预训练产生模型性能更强、参数消耗更低。

>> 该方法特别适用于处理知识密集型任务,并通过专业语料库训练可以发展领域特定模型。

>> 获得的基础模型性能强韧,训练后可以独立工作,提升生成速度和运行效率。

Empirical evidence underscores marked improvements in text generation quality, factual accuracy, reduced toxicity, and downstream task proficiency, especially in knowledge-intensive applications like open-domain QA. These results imply that integrating retrieval mechanisms into the pretraining of autoregressive language models constitutes a promising avenue, marrying sophisticated retrieval techniques with expansive language models to yield more precise and efficient language generation.

实证证据强调了在文本生成质量、事实准确性、毒性降低和下游任务熟练度方面的显著改进,特别是在知识密集型应用程序如开放领域QA中。这些结果表明,在自回归语言模型的预训练中集成检索机制构成了一个有前途的途径,将复杂的检索技术与广泛的语言模型相结合,以产生更精确和高效的语言生成

The benefits of augmented pre-training include a robust foundational model that outperforms standard GPT models in perplexity, text generation quality, and task-specific performance, all while utilizing fewer parameters. This method is particularly adept at handling knowledge-intensive tasks and facilitates the development of domain-specific models through training on specialized corpora.

增强预训练的好处包括一个稳健的基础模型,其在困惑度、文本生成质量和任务特定性能方面优于标准的GPT模型,同时使用更少的参数。这种方法特别擅长处理知识密集型任务,并通过在专门的语料库上进行训练促进领域特定模型的发展。

增强预训练挑战大量资源、降低更新频率

>> 需要大量预训练资源与数据支持。

>> 模型规模增大会降低更新频率。

Nonetheless, this approach faces challenges such as the necessity for extensive pre-training datasets and resources, as well as diminished update frequencies with increasing model sizes. Despite these hurdles, the approach offers significant advantages in model resilience. Once trained, retrieval-enhanced models can operate independently of external libraries, enhancing generation speed and operational efficiency. The potential gains identified render this methodology a compelling subject for ongoing investigation and innovation in artificial intelligence and machine learning.

尽管如此,这种方法面临着挑战,例如需要广泛的预训练数据集和资源,以及随着模型大小的增加而减少的更新频率。尽管存在这些障碍,但该方法在模型弹性方面提供了显著的优势。经过训练后,检索增强模型可以独立于外部库运行,从而提高了生成速度和操作效率。所鉴定的潜在收益使得这种方法在人工智能和机器学习领域的持续研究和创新中成为一个引人注目的主题。

微调阶段:可以对齐查询和文档之间的差异、满足风格化定向调整的生成需求、对齐检索器和生成器以改进模型协同

两者结合更强:RAG和Fine-tuning是增强LLM的强大工具

Fine-tuning Stage

RAG and Fine-tuning are powerful tools for enhancing LLMs, and combining the two can meet the needs of more specific scenarios. On one hand, fine-tuning allows for the retrieval of documents with a unique style, achieving better semantic expression and aligning the differences between queries and documents. This ensures that the output of the retriever is more aptly suited to the scenario at hand. On the other hand, fine-tuning can fulfill the generation needs of making stylized and targeted adjustments. Furthermore, fine-tuning can also be used to align the retriever and generator for improved model synergy.

微调阶段

RAGFine-tuning是增强LLM的强大工具,将两者结合起来可以满足更具体场景的需求。

>> 一方面,微调允许检索具有独特样式的文档,实现更好的语义表达,并对齐查询和文档之间的差异。这确保了检索器的输出更适合手头的场景。

>> 另一方面,微调可以满足进行风格化定向调整的生成需求。

此外,微调还可以用于对齐检索器和生成器,以改进模型协同

微调检索模型提升语义表示质量

The main goal of fine-tuning the retriever is to improve the quality of semantic representations, achieved by directly fine-tuning the Embedding model using a corpus [Liu, 2023]. By aligning the retriever’s capabilities with the preferences of the LLMs through feedback signals, both can be better coordinated [Yu et al., 2023b, Izacard et al., 2022, Yang et al., 2023b, Shi et al., 2023]. Fine-tuning the retriever for specific downstream tasks can lead to improved adaptability [cite]. The introduction of task-agnostic fine-tuning aims to enhance the retriever’s versatility in multi-task scenarios [Cheng et al., 2023a].

微调检索器的主要目标是通过使用语料库直接微调嵌入模型来提高语义表示的质量[Liu, 2023]。通过反馈信号使检索器的能力与LLM的偏好一致,可以更好地协调两者[Yu et al., 2023b, Izacard et al., 2022, Yang et al., 2023b, Shi et al., 2023]。为特定的下游任务微调检索器可以提高适应能力[引用]。引入任务无关微调的目的是增强检索器在多任务场景中的多功能性[Cheng等人,2023a]。

微调生成模型实现针对性输出

Fine-tuning generator can result in outputs that are more stylized and customized. On one hand, it allows for specialized adaptation to different input data formats. For example, fine-tuning LLMs to fit the structure of knowledge graphs [Kang et al., 2023], the structure of text pairs [Kang et al., 2023, Cheng et al., 2023b], and other specific structures [Li et al., 2023d]. On the other hand, by constructing directive datasets, one can demand LLMs to generate specific formats content. For instance, in adaptive or iterative retrieval scenarios, LLMs are fine-tuned to generate content that will help determine the timing for the next step of action [Jiang et al., 2023b, Asai et al., 2023].

微调生成器可以产生更加风格化和定制的输出。一方面,它允许专门适应不同的输入数据格式。例如,微调LLM以拟合知识图的结构[Kang等人,2023]、文本对的结构[Kang等人,2023,Cheng等人,2023b]和其他特定结构[Li等人,2023d]。另一方面,通过构建指令数据集,可以要求LLM生成特定格式的内容。例如,在自适应或迭代检索场景中,LLM被微调以生成有助于确定下一步行动时间的内容[Jiang等人,2023b, Asai等人,2023]。

联合微调检索器和生成器提升泛化能力比如RA-DIT(双指令调优)

By synergistically fine-tuning both the retriever and the generator, we can enhance the model’s generalization capabilities and avoid overfitting that may arise from training them separately. However, joint fine-tuning also leads to increased resource consumption. RA-DIT [Lin et al., 2023] presents a lightweight, dual-instruction tuning framework that can effectively add retrieval capabilities to any LLMs. The retrieval-enhanced directive fine-tuning updates the LLM, guiding it to make more efficient use of the information retrieved and to disregard distracting content.

通过协同微调检索器和生成器,我们可以增强模型的泛化能力,并避免单独训练它们可能产生的过拟合。然而,联合微调也会导致资源消耗增加。RA-DIT [Lin等,2023]提出了一种轻量级的双指令调优框架,可以有效地为任何LLM添加检索功能。检索增强指令微调更新LLM,指导它更有效地利用检索到的信息,并忽略分散注意力的内容。

Despite its advantages, fine-tuning has limitations, including the need for specialized datasets for RAG fine-tuning and the requirement for significant computational resources. However, this stage allows for customizing models to specific needs and data formats, potentially reducing resource usage compared to the pre-training phase while still being able to fine-tune the model’s output style.

尽管有其优点,但微调也有局限性,包括需要专门的数据集进行RAG微调以及需要大量的计算资源。然而,这个阶段允许根据特定的需求和数据格式定制模型,与预训练阶段相比,潜在地减少了资源使用,同时仍然能够微调模型的输出样式。

In summary, the fine-tuning stage is essential for the adaptation of RAG models to specific tasks, enabling the refinement of both retrievers and generators. This stage enhances the model’s versatility and adaptability to various tasks, despite the challenges presented by resource and dataset requirements. The strategic fine-tuning of RAG models is therefore a critical component in the development of efficient and effective retrieval-augmented systems.

总之,微调阶段对于使RAG模型适应特定的任务是必不可少的,从而可以对检索器和生成器进行细化。这一阶段增强了模型的通用性和对各种任务的适应性,尽管存在资源和数据集需求带来的挑战。因此,RAG模型的战略性微调是开发高效和有效的检索增强系统的关键组成部分。

推理阶段

初级RAG:在此阶段导入检索内容指导生成

Inference Stage

The inference stage in RAG models is crucial, as it involves extensive integration with LLMs. Traditional RAG approaches, also known as Naive RAG, involve incorporating retrieval content at this stage to guide the generation process.

推理阶段

RAG模型中的推理阶段是至关重要的,因为它涉及到与LLM的广泛集成。传统的RAG方法,也称为初级RAG,涉及在此阶段合并检索内容以指导生成过程

先进的技术引入了更多上下文丰富的信息,如利用交互文本或知识模块

To overcome the limitations of Naive RAG, advanced techniques introduce more contextually rich information during inference. The DSP framework [Khattab et al., 2022] utilizes a sophisticated exchange of natural language text between frozen LMs and retrieval models (RMs), enriching the context and thereby improving generation outcomes. The PKG [Luo et al., 2023] method equips LLMs with a knowledge-guided module that allows for the retrieval of pertinent information without modifying the LMs’ parameters, enabling more complex task execution. CREA-ICL [Li et al., 2023b] employs a synchronous retrieval of cross-lingual knowledge to enhance context, while RECITE [Sun et al., 2022] generates context by sampling paragraphs directly from LLMs.

为了克服初级RAG的局限性,先进的技术在推理过程中引入了更多上下文丰富的信息

>> DSP框架[Khattab等人,2022]利用冻结LM和检索模型(RM)之间复杂的自然语言文本交换,丰富了上下文,从而改善了生成结果。

>> PKG [Luo等人,2023]方法为LLM配备了一个知识引导模块,该模块允许在不修改LM参数的情况下检索相关信息,从而能够执行更复杂的任务。

>> CREA-ICL [Li et al., 2023b]采用跨语言知识的同步检索来增强上下文,而RECITE [Sun et al., 2022]通过直接从LLM中采样段落来生成上下文。

考虑多步推理任务,采用迭代检索或链式推理

Further refinement of the RAG process during inference is seen in approaches that cater to tasks necessitating multi-step reasoning. ITRG [Feng et al., 2023] iteratively retrieves information to identify the correct reasoning paths, thereby improving task adaptability. ITER-RETGEN [Shao et al., 2023] follows an iterative strategy, merging retrieval and generation in a cyclical process that alternates between “retrieval-enhanced generation” and “generation-enhanced retrieval”. For non-knowledge-intensive (NKI) tasks, PGRA [Guo et al., 2023] proposes a two-stage framework, starting with a task-agnostic retriever followed by a prompt-guided reranker to select and prioritize evidence. In contrast, IRCOT [Trivedi et al., 2022] combines RAG with Chain of Thought (CoT) methodologies, alternating CoT-guided retrievals with retrieval-informed CoT processes, significantly boosting GPT-3’s performance across various question-answering tasks.

在推理过程中,对RAG流程的进一步细化体现在那些面向多步推理任务的方法中。

>> ITRG [Feng et al., 2023]通过迭代检索信息来识别正确的推理路径,从而提高任务适应性。

>> ITER-RETGEN [Shao et al., 2023]采用迭代策略,在“检索增强生成”和“生成增强检索”之间交替的循环过程中合并检索和生成。

>> PGRA:对于非知识密集型(NKI)任务,PGRA [Guo等人,2023]提出了一个两阶段框架,首先是任务无关的检索器,然后是提示引导的重新排序器,以选择和优先排序证据。

>> IRCOT:相比之下,IRCOT [Trivedi等人,2022]将RAG与思维链(CoT)方法相结合,交替进行CoT引导的检索与检索增强的CoT过程,显著提高了GPT-3在各种问答任务中的表现。

优缺点:轻量级的、经济有效、无需再训练,但需要仔细的数据处理和优化

In essence, these inference-stage enhancements provide lightweight, cost-effective alternatives that leverage the capabilities of pre-trained models without necessitating further training. The principal advantage is maintaining static LLM parameters while supplying contextually relevant information to meet specific task demands. Nevertheless, this approach is not without limitations, as it requires meticulous data processing and optimization, and is bound by the foundational model’s intrinsic capabilities. To address diverse task requirements effectively, this method is often paired with procedural optimization techniques such as step-wise reasoning, iterative retrieval, and adaptive retrieval strategies.

从本质上讲,这些推理阶段的增强提供了轻量级的、经济有效的替代方案,充分利用了预训练模型的能力,而无需进行进一步的训练。其主要优点是在提供上下文相关信息以满足特定任务需求的同时维护静态LLM参数。然而,这种方法并非没有局限性,因为它需要仔细的数据处理和优化,并受制于基础模型的内在能力。为了有效地满足多样化的任务需求,这种方法通常与步骤式推理迭代检索自适应检索策略等过程优化技术一起使用。

Figure 4: Taxonomy of RAG’s core components

6.2 Augmentation Source增强数据来源

RAG模型在数据源选择上不断探索,从早期主要依靠非结构化文本数据,扩展至结构化知识图及LLMs自身生成内容。不同类型数据经过专门处理,为模型提供各层面知识支持,从而提升任务效能。数据源利用是RAG研究一个重要方面。

The effectiveness of RAG models is heavily impacted by the selection of data sources for augmentation. Different levels of knowledge and dimensions require distinct processing techniques. They are categorized as unstructured data, structured data, and content generated by LLMs. The technology tree of representative RAG research with different augmentation aspects is depicted in Figure 5. The leaves, colored in three different shades, represent enhancements using various types of data: unstructured data, structured data, and content generated by LLMs. The diagram clearly shows that initially, augmentation was mainly achieved through unstructured data, such as pure text. This approach later expanded to include the use of structured data (e.g. knowledge graph) for further improvement. More recently, there has been a growing trend in research that utilizes content generated by the LLMs themselves for retrieval and augmentation purposes.

扩充数据源的选择严重影响RAG模型的有效性。不同的知识层次和维度需要不同的处理技术。它们分为非结构化数据结构化数据LLMs生成的内容。具有代表性的不同增强方面的RAG研究技术树如图5所示。叶子以三种不同的深浅颜色表示使用不同类型数据的增强:非结构化数据、结构化数据和LLM生成的内容。图表清楚地表明,最初,增强主要是通过非结构化数据,如纯文本来实现的。这种方法后来扩展到包括使用结构化数据(例如知识图)以进一步改进。最近,在研究中有一种日益增长的趋势,即利用LLM本身生成的内容进行检索和增强。

扩充非结构化数据采用语料库如文本等进行增强提取单元从词到短语、段落不等比如FLARERETRO

Augmented with Unstructured Data

Unstructured text is gathered from corpora, such as prompt data for fine-tuning large models [Cheng et al., 2023a] and cross-lingual data [Li et al., 2023b]. Retrieval units vary from tokens (e.g., kNN-LM [Khandelwal et al., 2019]) to phrases (e.g., NPM, COG [Lee et al., 2020, Lan et al., 2022]) and document paragraphs, with finer granularities offering precision at the cost of increased retrieval complexity.

扩充非结构化数据

非结构化文本从语料库中收集,例如用于微调大型模型的提示数据[Cheng等人,2023a]和跨语言数据[Li等人,2023b]。检索单元从令牌(例如kNN-LM [Khandelwal等人,2019])到短语(例如NPM, COG [Lee等人,2020,Lan等人,2022])和文档段落不等,更细的粒度以增加检索复杂性为代价带来更高的精度。

FLARE [Jiang et al., 2023b] introduces an active retrieval approach, triggered by the LM’s generation of low-probability words. It creates a temporary sentence for document retrieval, then regenerates the sentence with the retrieved context to predict subsequent sentences. RETRO uses the previous chunk to retrieve the nearest neighbor at the chunk level, combined with the previous chunk’s context, it guides the generation of the next chunk. To preserve causality, the generation of the next block Ci only utilizes the nearest neighbor of the previous block N(Ci−1) and not N(Ci).

>> FLARE [Jiang等人,2023b]引入了一种主动检索方法,该方法由LM生成的低概率词触发。它为文档检索创建一个临时句子,然后使用检索到的上下文重新生成该句子,以预测后续的句子。

>> RETRO使用前一个块来检索块级别上最近的邻居,结合前一个块的上下文,它指导下一个块的生成。为了保持因果关系,下一个块Ci的生成只利用前一个块的最近邻居N(Ci−1)而不是N(Ci)。

增强结构化数据:比如RET-LLMs(使用知识图谱提供高质量上下文)、SUGRE(多模态对比学习)、KnowledGPT(将知识存储在个性化库)

Augmented with Structured Data

Structured data, such as knowledge graphs (KGs), provide high-quality context and mitigate model hallucinations. RET-LLMs [Modarressi et al., 2023] constructs a knowledge graph memory from past dialogues for future reference. SUGRE [Kang et al., 2023] employs Graph Neural Networks (GNNs) to encode relevant KG subgraphs, ensuring consistency between retrieved facts and generated text through multi-modal contrastive learning. KnowledGPT [Wang et al., 2023d] generates KB search queries and stores knowledge in a personalized base, enhancing the RAG model’s knowledge richness and contextuality.

增强结构化数据

结构化数据,如知识图(KGs),提供了高质量的背景,减轻了模型幻觉。

>> RET-LLMs [Modarressi et al., 2023]从过去的对话中构建了一个知识图记忆,以供将来参考。

>> SUGRE [Kang et al., 2023]使用图神经网络(Graph neural Networks, GNN)对相关KG子图进行编码,通过多模态对比学习确保检索事实与生成文本之间的一致性。

>> KnowledGPT [Wang et al., 2023d]生成知识库搜索查询,并将知识存储在个性化库中,增强了RAG模型的知识丰富度和上下文性。

LLMs自身生成内容增强利用LLMs内部知识避免外部信息限制选择性应用检索增强替换检索器使用LLMs生成更匹配预训练目标的上下文

LLMs-Generated Content in RAG

Addressing the limitations of external auxiliary information in RAG, some research has focused on exploiting LLMs’ internal knowledge. SKR [Wang et al., 2023e] classifies questions as known or unknown, applying retrieval enhancement selectively. GenRead [Yu et al., 2022] replaces the retriever with an LLM generator, finding that LLM-generated contexts often contain more accurate answers due to better alignment with the pre-training objectives of causal language modeling. Selfmem [Cheng et al., 2023b] iteratively creates an unbounded memory pool with a retrieval-enhanced generator, using a memory selector to choose outputs that serve as dual problems to the original question, thus self-enhancing the generative model.

RAG中LLM生成的内容

针对外部辅助信息在RAG中的局限性,一些研究侧重于利用LLMs的内部知识。

>> SKR [Wang et al., 2023e]将问题分类为已知或未知,有选择地应用检索增强。

>> GenRead [Yu et al., 2022]用LLM生成器取代了检索器,发现LLM生成的上下文通常包含更准确的答案,因为它与因果语言建模的预训练目标更一致。

>> Selfmem [Cheng et al., 2023b]使用检索增强的生成器迭代创建无界内存池,并使用内存选择器选择可作为原始问题对偶问题的输出,从而自我增强生成模型。

These methodologies underscore the breadth of innovative data source utilization in RAG, striving to improve model performance and task effectiveness.

这些方法强调了RAG中创新数据源利用的广度,努力提高模型性能和任务有效性。

Figure 5: Technology tree of representative RAG research with different augmentation aspects

6.3 Augmentation Process增强过程

典型方法:

>> Flare监测生成概率主动检索

>> Self-RAG启用“检索”“评估”标记主动调整

总体来说,迭代检索提供更广的知识支持;递归检索深入挖掘;自适应检索让LLMs主导检索过程。这些方法共同克服单次检索的不足,优化RAG流程。

In the domain of RAG, the standard practice often involves a singular retrieval step followed by generation, which can lead to inefficiencies. A notable issue, termed the “lost in the middle” phenomenon, arises when a single retrieval yields redundant content that may dilute or contradict essential information, thereby degrading the generation quality [Liu et al., 2023a]. Furthermore, such singular retrieval is typically insufficient for complex problems demanding multi-step reasoning, as it provides a limited scope of information [Yoran et al., 2023].

在RAG领域中,标准实践通常涉及单次检索步骤后接生成,这可能导致效率低下。一个值得注意的问题被称为“中间丢失”现象:当单次检索产生冗余内容时,可能会稀释关键信息或与之矛盾,从而降低生成质量[Liu et al., 2023a]。此外,这种单次检索通常不足以解决需要多步推理的复杂问题,因为它提供的信息范围有限[Yoran等人,2023]。

As illustrated in Figure 5, to circumvent these challenges, contemporary research has proposed methods for refining the retrieval process: iterative retrieval, recursive retrieval and adaptive retrieval. Iterative retrieval allows the model to engage in multiple retrieval cycles, enhancing the depth and relevance of the information obtained. Recursive retrieval is a process where the results of one retrieval operation are used as the input for the subsequent retrieval. It helps to delve deeper into relevant information, particularly when dealing with complex or multi-step queries. Recursive retrieval is often used in scenarios where a gradual approach is needed to converge on a final answer, such as in academic research, legal case analysis, or certain types of data mining tasks. Adaptive retrieval, on the other hand, offers a dynamic adjustment mechanism, tailoring the retrieval process to the specific demands of varying tasks and contexts.

如图5所示,为了规避这些挑战,当代研究提出了改进检索过程的方法:迭代检索、递归检索和自适应检索。迭代检索允许模型参与多个检索周期,增强所获得信息的深度和相关性。递归检索过程,其中一次检索操作的结果用作后续检索的输入。它有助于深入研究相关信息,特别是在处理复杂或多步骤查询时。递归检索通常用于需要逐步收敛于最终答案的场景,例如在学术研究、法律案例分析或某些类型的数据挖掘任务中。另一方面,自适应检索提供了一种动态调整机制,使检索过程适应不同任务和上下文的具体要求。

迭代的检索重复收集文档提供更丰富知识支持可能出现语义塌陷和无关信息累积

Iterative Retrieval

Iterative retrieval in RAG models is a process where documents are repeatedly collected based on the initial query and the text generated thus far, providing a more comprehensive knowledge base for LLMs [Borgeaud et al., 2022, Arora et al., 2023]. This approach has been shown to enhance the robustness of subsequent answer generation by offering additional contextual references through multiple retrieval iterations. However, it may suffer from semantic discontinuity and the accumulation of irrelevant information, as it typically relies on a sequence of n tokens to demarcate the boundaries between generated text and retrieved documents.

迭代的检索

RAG模型中的迭代检索是基于初始查询和迄今为止生成的文本重复收集文档的过程,为LLMs提供了更全面的知识库[Borgeaud等人,2022,Arora等人,2023]。这种方法已被证明可以通过多次检索迭代提供额外的上下文参考,增强后续答案生成的鲁棒性。然而,它可能会受到语义不连续和无关信息累积的影响,因为它通常依赖于n个令牌的序列来划定生成文本和检索文档之间的边界。

To address specific data scenarios, recursive retrieval and multi-hop retrieval techniques are utilized. Recursive retrieval involves a structured index to process and retrieve data in a hierarchical manner, which may include summarizing sections of a document or lengthy PDF before performing a retrieval based on this summary. Subsequently, a secondary retrieval within the document refines the search, embodying the recursive nature of the process. In contrast, multi-hop retrieval is designed to delve deeper into graph-structured data sources, extracting interconnected information [Li et al., 2023c].

为了解决特定的数据场景,使用了递归检索和多跳检索技术。递归检索涉及以分层方式处理和检索数据的结构化索引,其中可能包括在基于摘要执行检索之前,先对文档或冗长PDF的各个部分进行摘要。随后,在文档内部进行二次检索以细化搜索,体现了该过程的递归性质。相比之下,多跳检索旨在更深入地挖掘图结构数据源,提取相互关联的信息[Li et al., 2023c]。
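下面是一个递归检索(先摘要索引、再文档内块索引)的最小Python示意,summary_index 与 chunk_index_of 为假设性的索引接口:

```python
# 递归检索的最小示意:粗粒度命中文档后,在文档内部做二次细化检索。
def recursive_retrieve(query, summary_index, chunk_index_of,
                       top_docs=2, top_chunks=3):
    # 第一步:基于文档摘要做粗粒度检索,确定候选文档
    docs = summary_index.search(query, top_k=top_docs)
    results = []
    for doc in docs:
        # 第二步:在命中文档内部的块索引中细化检索,体现过程的递归性质
        results.extend(chunk_index_of[doc].search(query, top_k=top_chunks))
    return results
```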

Additionally, some methodologies integrate the steps of retrieval and generation. ITER-RETGEN [Shao et al., 2023] employs a synergistic approach that leverages “retrieval-enhanced generation” alongside “generation-enhanced retrieval” for tasks that necessitate the reproduction of specific information. The model harnesses the content required to address the input task as a contextual basis for retrieving pertinent knowledge, which in turn facilitates the generation of improved responses in subsequent iterations.

此外,一些方法集成了检索和生成的步骤。

>> ITER-RETGEN [Shao等人,2023]采用了一种协同方法,在需要复现特定信息的任务中,结合使用“检索增强生成”和“生成增强检索”。该模型利用完成输入任务所需的内容作为检索相关知识的上下文基础,这反过来又促进了在后续迭代中生成更好的响应。
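下面用一个最小的Python示意说明ITER-RETGEN式“检索-生成”交替迭代的思路(retrieve、llm_generate 为假设性的占位函数,轮数与提示词仅为演示):

```python
# 迭代"检索-生成"的最小示意:每轮用上一轮的草稿答案扩充检索查询。
def iter_retgen(question, retrieve, llm_generate, rounds=3):
    answer = ""
    for _ in range(rounds):
        # 生成增强检索:用"问题 + 当前草稿"作为检索查询
        docs = retrieve(question + " " + answer)
        context = "\n".join(docs)
        # 检索增强生成:用检索结果更新答案
        answer = llm_generate(
            f"Answer the question using the context.\n"
            f"Context:\n{context}\nQuestion: {question}"
        )
    return answer
```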

递归检索层层细化查询获得更深入信息常用于复杂多步任务

Recursive Retrieval

Recursive Retrieval is often used in information retrieval and NLP to improve the depth and relevance of search results. The process involves iteratively refining search queries based on the results obtained from previous searches. Recursive Retrieval aims to enhance the search experience by gradually converging on the most pertinent information through a feedback loop. IRCoT [Trivedi et al., 2022] uses chain-of-thought to guide the retrieval process and refines the CoT with the obtained retrieval results. ToC [Kim et al., 2023] creates a clarification tree that systematically optimizes the ambiguous parts in the Query. It can be particularly useful in complex search scenarios where the user’s needs are not entirely clear from the outset or where the information sought is highly specialized or nuanced. The recursive nature of the process allows for continuous learning and adaptation to the user’s requirements, often resulting in improved satisfaction with the search outcomes.

递归检索

递归检索常用于信息检索和自然语言处理,以提高搜索结果的深度和相关性。该过程涉及基于从以前的搜索中获得的结果迭代地改进搜索查询。递归检索旨在通过反馈循环逐步收敛到最相关的信息,从而增强搜索体验。

>> IRCoT [Trivedi et al., 2022]使用思维链(chain-of-thought)来指导检索过程,并利用获得的检索结果对CoT进行细化。

>> ToC [Kim等人,2023]创建了一个澄清树,系统地优化查询中的模糊部分。在复杂的搜索场景中,如果用户的需求从一开始就不完全清楚,或者所搜索的信息非常专门化或微妙,那么它特别有用。该过程的递归性质允许不断学习和适应用户的需求,通常会提高对搜索结果的满意度。

自适应的检索LLMs主动判断检索时机和内容优化检索效率和相关性

Adaptive Retrieval

Adaptive retrieval methods, exemplified by Flare and Self-RAG [Jiang et al., 2023b, Asai et al., 2023], refine the RAG framework by enabling LLMs to actively determine the optimal moments and content for retrieval, thus enhancing the efficiency and relevance of the information sourced.

自适应的检索

自适应检索方法(如Flare和Self-RAG [Jiang等人,2023b, Asai等人,2023])通过使LLM能够主动确定检索的最佳时机和内容,提高所获取信息的效率和相关性,从而完善了RAG框架。

These methods are part of a broader trend wherein LLMs employ active judgment in their operations, as seen in model agents like AutoGPT, Toolformer, and Graph-Toolformer [Yang et al., 2023c, Schick et al., 2023, Zhang, 2023]. Graph-Toolformer, for instance, divides its retrieval process into distinct steps where LLMs proactively use retrievers, apply Self-Ask techniques, and employ few-shot prompts to initiate search queries. This proactive stance allows LLMs to decide when to search for necessary information, akin to how an agent utilizes tools.

这些方法是LLM在其操作中采用主动判断这一更广泛趋势的一部分,正如在AutoGPT、Toolformer和Graph-Toolformer等模型代理中所看到的那样[Yang等人,2023c, Schick等人,2023, Zhang, 2023]。例如,Graph-Toolformer将其检索过程划分为不同的步骤,其中LLM主动使用检索器、应用Self-Ask技术,并使用少样本提示来启动搜索查询。这种主动的姿态允许LLM决定何时搜索必要的信息,类似于代理使用工具的方式。

WebGPT [Nakano et al., 2021] integrates a reinforcement learning framework to train the GPT-3 model in autonomously using a search engine during text generation. It navigates this process using special tokens that facilitate actions such as search engine queries, browsing results, and citing references, thereby expanding GPT-3’s capabilities through the use of external search engines.

>> WebGPT [Nakano等人,2021]集成了一个强化学习框架,训练GPT-3模型在文本生成过程中自主使用搜索引擎。它使用特殊的令牌来导航这个过程,这些令牌促进了搜索引擎查询、浏览结果和引用参考文献等操作,从而通过使用外部搜索引擎扩展了GPT-3的能力。

Flare automates timing retrieval by monitoring the confidence of the generation process, as indicated by the probability of generated terms [Jiang et al., 2023b]. When the probability falls below a certain threshold, the retrieval system is activated to collect relevant information, thus optimizing the retrieval cycle.

>> Flare通过监测生成过程的置信度(以生成词的概率表示)来自动确定检索时机[Jiang等,2023b]。当概率低于一定阈值时,将激活检索系统收集相关信息,从而优化检索周期。
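下面是一个FLARE式主动检索的最小Python示意:逐句生成并监控最低token概率,低于阈值时用临时句子触发检索并重新生成该句(llm_step、retrieve、llm_regenerate 均为假设性的占位函数,阈值仅为演示):

```python
# FLARE式主动检索的最小示意:低置信度句子触发检索并重新生成。
def flare_generate(question, llm_step, retrieve, llm_regenerate,
                   threshold=0.4, max_sents=10):
    output = []
    for _ in range(max_sents):
        # llm_step 返回 (下一句文本, 该句中最低的token概率);返回None表示结束
        sent, min_prob = llm_step(question, "".join(output))
        if sent is None:
            break
        if min_prob < threshold:          # 置信度不足 → 主动触发检索
            docs = retrieve(sent)         # 用临时句子作为检索查询
            sent = llm_regenerate(question, "".join(output), docs)
        output.append(sent)
    return "".join(output)
```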

Self-RAG [Asai et al., 2023] introduces “reflection tokens” that allow the model to introspect its outputs. These tokens come in two varieties: “retrieve” and “critic”. The model autonomously decides when to activate retrieval, or alternatively, a predefined threshold may trigger the process. During retrieval, the generator conducts a fragment-level beam search across multiple paragraphs to derive the most coherent sequence. Critic scores are used to update the subdivision scores, with the flexibility to adjust these weights during inference, tailoring the model’s behavior. Self-RAG’s design obviates the need for additional classifiers or reliance on Natural Language Inference (NLI) models, thus streamlining the decision-making process for when to engage retrieval mechanisms and improving the model’s autonomous judgment capabilities in generating accurate responses.

>> Self-RAG [Asai等人,2023]引入了“反思标记”(reflection tokens),允许模型自省其输出。这些标记有两种:“检索”(retrieve)和“批评”(critic)。模型自主地决定何时激活检索,或者由预定义的阈值触发该流程。在检索过程中,生成器跨多个段落进行片段级波束搜索,以获得最连贯的序列。批评分数用于更新分段得分,在推理过程中可以灵活地调整这些权重,从而定制模型的行为。Self-RAG的设计不需要额外的分类器或依赖自然语言推理(NLI)模型,从而简化了何时启用检索机制的决策过程,并提高了模型在生成准确响应方面的自主判断能力。

LLM optimization has received significant attention due to its increasing prevalence. Techniques such as prompt engineering, Fine-Tuning (FT), and RAG each have distinct characteristics, visually represented in Figure 6. While prompt engineering leverages a model’s inherent capabilities, optimizing LLMs often requires the application of both RAG and FT methods. The choice between RAG and FT should be based on the specific requirements of the scenario and the inherent properties of each approach. A detailed comparison of RAG and FT is presented in Table 1.

LLM优化由于其日益普及而受到了极大的关注。提示工程、微调(FT)和RAG等技术各有不同的特征,如图6所示。虽然提示工程利用了模型的固有能力,但优化LLM通常需要同时应用RAG和FT方法。在RAG和FT之间的选择应基于场景的特定需求和每种方法的固有属性。表1给出了RAG和FT的详细比较。

6.4 RAG vs Fine-Tuning对比:两者可以互补

RAG更适合特定的查询整合新知识快速迭代新用例

RAG is like giving a model a textbook for tailored information retrieval, perfect for specific queries. On the other hand, FT is like a student internalizing knowledge over time, better for replicating specific structures, styles, or formats. FT can improve model performance and efficiency by reinforcing base model knowledge, adjusting outputs, and teaching complex instructions. However, it is not as good for integrating new knowledge or rapidly iterating new use cases.

RAG就像给模型提供了一本教科书,用于定制信息检索,非常适合特定的查询。另一方面,FT就像一个学生,随着时间的推移将知识内化,更适合复制特定的结构、风格或格式。FT可以通过强化基础模型知识、调整输出和教授复杂指令来提高模型性能和效率。然而,它在整合新知识或快速迭代新用例方面并不如RAG好。

The two methods, RAG and FT, are not mutually exclusive and can be complementary, augmenting a model’s capabilities at different levels. In some cases, their combined use may yield optimal performance. The optimization process involving RAG and FT can necessitate multiple iterations to achieve satisfactory results.

这两种方法,RAG和FT,并不是相互排斥的,而是可以互补的,可以在不同层次上增强模型的能力。在某些情况下,它们的组合使用可能产生最佳性能。涉及RAG和FT的优化过程可能需要多次迭代才能获得满意的结果。

Table 1: Comparison between RAG and Fine-Tuning

7 RAG Evaluation

总体来说,RAG模型评估注重检索和生成两个主要模块,从质量和能力等多方面考察,为深入掌握模型性能提供全面框架。此外,基准和自动化工具给出定量指标,助推评估进一步优化。RAG模型评估是当前研究热点,未来可以探讨更定制化的指标和标准化框架。

The rapid advancement and growing adoption of RAG in the field of Natural Language Processing (NLP) have propelled the evaluation of RAG models to the forefront of research in the LLMs community. The primary objective of this evaluation is to comprehend and optimize the performance of RAG models across diverse application scenarios.

RAG在自然语言处理(NLP)领域的快速发展和越来越多的采用,将RAG模型的评估推向了LLMs社区研究的前沿。该评估的主要目标是理解和优化RAG模型跨不同应用程序场景的性能。

Historically, RAG models assessments have centered on their execution in specific downstream tasks. These evaluations employ established metrics suitable to the tasks at hand. For instance, question answering evaluations might rely on EM and F1 scores [Wang et al., 2023a, Shi et al., 2023, Feng et al., 2023, Ma et al., 2023a], whereas fact-checking tasks often hinge on accuracy as the primary metric [Lewis et al., 2020, Izacard et al., 2022, Shao et al., 2023]. Tools like RALLE, designed for the automatic evaluation of RAG applications, similarly base their assessments on these task-specific metrics [Hoshi et al., 2023]. Despite this, there is a notable paucity of research dedicated to evaluating the distinct characteristics of RAG models, with only a handful of related studies.

The following section shifts the focus from task-specific evaluation methods and metrics to provide a synthesis of the existing literature based on their unique attributes. This exploration covers the objectives of RAG evaluation, the aspects along which these models are assessed, and the benchmarks and tools available for such evaluations. The aim is to offer a comprehensive overview of RAG model evaluation, outlining the methodologies that specifically address the unique aspects of these advanced generative systems.

7.1 Evaluation Targets

 The assessment of RAG models mainly revolves around two key components: the retrieval and generation modules. This division ensures a thorough evaluation of both the quality of context provided and the quality of content produced.

Retrieval quality: Hit Rate, MRR, etc.

Retrieval Quality

Evaluating the retrieval quality is crucial for determining the effectiveness of the context sourced by the retriever component. Standard metrics from the domains of search engines, recommendation systems, and information retrieval systems are employed to measure the performance of the RAG retrieval module. Metrics such as Hit Rate, MRR, and NDCG are commonly utilized for this purpose [Liu, 2023, Nguyen, 2023].

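These three metrics are straightforward to compute. A minimal sketch, assuming ranked_ids is the retriever's ordered result list for one query and relevant_ids the set of ground-truth relevant documents (corpus-level scores are then averaged over all evaluation queries):

```python
# Minimal implementations of Hit Rate, MRR, and binary-relevance NDCG@k.
import math

def hit_rate(ranked_ids, relevant_ids, k=10):
    # 1 if any relevant document appears in the top-k results, else 0.
    return int(any(doc in relevant_ids for doc in ranked_ids[:k]))

def mrr(ranked_ids, relevant_ids):
    # Reciprocal rank of the first relevant document (0 if none retrieved).
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg(ranked_ids, relevant_ids, k=10):
    # Discounted cumulative gain normalized by the ideal ranking.
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked_ids[:k], start=1)
              if doc in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```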

Generation quality: faithfulness, relevance, and harmlessness for unlabeled content; answer accuracy for labeled content

Generation Quality

The assessment of generation quality centers on the generator’s capacity to synthesize coherent and relevant answers from the retrieved context. This evaluation can be categorized based on the content’s objectives: unlabeled and labeled content. For unlabeled content, the evaluation encompasses the faithfulness, relevance, and non-harmfulness of the generated answers. In contrast, for labeled content, the focus is on the accuracy of the information produced by the model [Liu, 2023]. Additionally, both retrieval and generation quality assessments can be conducted through manual or automatic evaluation methods [Liu, 2023, Lan et al., 2022, Leng et al., 2023].

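For unlabeled content, automatic evaluation is often phrased as an LLM-judging task. The sketch below illustrates a faithfulness check under that pattern; the judge callable and prompt wording are assumptions, and tools such as RAGAS and ARES (Section 7.3) implement more careful variants of the same idea.

```python
# A simplified LLM-as-judge faithfulness check; the prompt wording and the
# `judge` callable (str -> str, wrapping some LLM API) are assumptions.
FAITHFULNESS_PROMPT = """Context:
{context}

Answer:
{answer}

Does every claim in the answer follow from the context?
Reply with a single number from 1 (contradicted) to 5 (fully supported)."""

def score_faithfulness(context: str, answer: str, judge) -> float:
    reply = judge(FAITHFULNESS_PROMPT.format(context=context, answer=answer))
    digits = [c for c in reply if c.isdigit()]
    # Crude parse of the judge's reply; production tools are much stricter.
    return float(digits[0]) if digits else 0.0
```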

Figure 6: RAG compared with other model optimization methods

7.2 Evaluation Aspects

Contemporary evaluation practices of RAG models emphasize three primary quality scores and four essential abilities, which collectively inform the evaluation of the two principal targets of the RAG model: retrieval and generation.

Quality scores: context relevance, answer faithfulness, answer relevance

Quality Scores

Quality scores include context relevance, answer faithfulness, and answer relevance. These scores assess the RAG model’s efficiency from different perspectives throughout the information retrieval and generation process [Es et al., 2023, Saad-Falcon et al., 2023, Jarvis and Allard, 2023].

Context Relevance evaluates the precision and specificity of the retrieved context, ensuring relevance and minimizing processing costs associated with extraneous content.

Answer Faithfulness ensures that the generated answers remain true to the retrieved context, maintaining consistency and avoiding contradictions.

Answer Relevance requires that the generated answers are directly pertinent to the posed questions, effectively addressing the core inquiry.

Required abilities: noise robustness, negative rejection, information integration, counterfactual robustness

Required Abilities

RAG evaluation also encompasses four abilities indicative of its adaptability and efficiency: noise robustness, negative rejection, information integration, and counterfactual robustness [Chen et al., 2023b, Liu et al., 2023b]. These abilities are critical for the model’s performance under various challenges and complex scenarios, impacting the quality scores.

Noise Robustness appraises the model’s capability to manage noise documents that are question-related but lack substantive information.

Negative Rejection assesses the model’s discernment in refraining from responding when the retrieved documents do not contain the necessary knowledge to answer a question.

Information Integration evaluates the model’s proficiency in synthesizing information from multiple documents to address complex questions.

Counterfactual Robustness tests the model’s ability to recognize and disregard known inaccuracies within documents, even when instructed about potential misinformation.

Context relevance and noise robustness are important for evaluating the quality of retrieval, while answer faithfulness, answer relevance, negative rejection, information integration, and counterfactual robustness are important for evaluating the quality of generation.

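Two of these abilities lend themselves to simple behavioral probes. A hedged sketch, assuming a rag_answer(question, documents) callable wrapping the system under test; the refusal marker and pass criteria are illustrative, whereas benchmarks such as RGB construct such test cases systematically:

```python
# Illustrative probes for noise robustness and negative rejection; the
# rag_answer interface and the refusal marker are assumptions.

def test_noise_robustness(rag_answer, question, gold, relevant_doc, noise_docs):
    # Mix one relevant document with question-related but uninformative noise;
    # the system should still surface the gold fact.
    answer = rag_answer(question, [relevant_doc] + noise_docs)
    return gold.lower() in answer.lower()

def test_negative_rejection(rag_answer, question, noise_docs,
                            refusal_marker="insufficient information"):
    # Provide only noise: a robust system should decline rather than guess.
    answer = rag_answer(question, noise_docs)
    return refusal_marker in answer.lower()
```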

The specific metrics for each evaluation aspect are summarized in Table 2. It is essential to recognize that these metrics, derived from related work, are traditional measures and do not yet represent a mature or standardized approach for quantifying RAG evaluation aspects. Custom metrics tailored to the nuances of RAG models, though not included here, have also been developed in some evaluation studies.

7.3 Evaluation Benchmarks and Tools

Evaluation benchmarks and tools: benchmark tests quantify and showcase model abilities; automated evaluation tools compute quality scores

Key benchmarks and tools: RGB and RECALL; RAGAS, ARES, TruLens

This section delineates the evaluation framework for RAG models, comprising benchmark tests and automated evaluation tools. These instruments furnish quantitative metrics that not only gauge RAG model performance but also enhance comprehension of the model’s capabilities across various evaluation aspects. Prominent benchmarks such as RGB and RECALL [Chen et al., 2023b, Liu et al., 2023b] focus on appraising the essential abilities of RAG models. Concurrently, state-of-the-art automated tools like RAGAS [Es et al., 2023], ARES [Saad-Falcon et al., 2023], and TruLens employ LLMs to adjudicate the quality scores. These tools and benchmarks collectively form a robust framework for the systematic evaluation of RAG models, as summarized in Table 3.

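As a concrete illustration, a RAGAS-style run takes a small table of questions, generated answers, retrieved contexts, and references, and returns LLM-adjudicated quality scores. The sketch below is hedged: the imports, metric names, and column layout follow an early RAGAS release and may differ in the current API, so verify against the library's documentation before relying on it.

```python
# Hedged RAGAS usage sketch; API details are version-dependent assumptions.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

records = Dataset.from_dict({
    "question": ["Who proposed RAG?"],
    "answer": ["RAG was proposed by Lewis et al. in 2020."],
    "contexts": [["Lewis et al. (2020) introduced retrieval-augmented generation."]],
    "ground_truth": ["Lewis et al., 2020"],
})

# Each metric is adjudicated by an LLM judge behind the scenes.
result = evaluate(records, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)
```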

Table 2: Summary of metrics applicable for evaluation aspects of RAG

Table 3: Summary of evaluation frameworks

8 Future Prospects

 This section explores three future prospects for RAG: future challenges, modality expansion, and the RAG ecosystem.

8.1 Future Challenges of RAG

Despite the considerable progress in RAG technology, several challenges persist that warrant in-depth research:

Technical difficulties such as context length (insufficient information vs. information dilution) and robustness (noise and contradictory information) still need improvement

Context Length. RAG’s efficacy is limited by the context window size of Large Language Models (LLMs). Balancing the trade-off between a window that is too short, risking insufficient information, and one that is too long, risking information dilution, is crucial. With ongoing efforts to expand LLM context windows to virtually unlimited sizes, the adaptation of RAG to these changes presents a significant research question [Xu et al., 2023c, Packer et al., 2023, Xiao et al., 2023].

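One pragmatic response to this trade-off is to pack the highest-ranked retrieved chunks into a fixed token budget instead of concatenating everything. A minimal sketch, where count_tokens stands in for a real tokenizer (e.g., tiktoken) and the budget split is an illustrative assumption:

```python
# Budget-aware context packing; count_tokens and the reserve size are
# placeholders for a real tokenizer and a deployment-specific choice.

def pack_context(ranked_chunks, count_tokens, window=8192, reserve_for_answer=1024):
    budget = window - reserve_for_answer
    packed, used = [], 0
    for chunk in ranked_chunks:  # best-first retrieval order
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # stop before diluting the prompt with marginal chunks
        packed.append(chunk)
        used += cost
    return "\n\n".join(packed)
```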

Robustness. The presence of noise or contradictory information during retrieval can detrimentally affect RAG’s output quality. This situation is figuratively referred to as “Misinformation can be worse than no information at all”. Improving RAG’s resistance to such adversarial or counterfactual inputs is gaining research momentum and has become a key performance metric [Yu et al., 2023a, Glass et al., 2021, Baek et al., 2023].

Hybrid approaches (RAG+FT): combining RAG with fine-tuning is an emerging optimization strategy

Hybrid Approaches (RAG+FT). Combining RAG with fine-tuning is emerging as a leading strategy. Determining the optimal integration of RAG and fine-tuning (whether sequential, alternating, or through end-to-end joint training) and how to harness both parameterized and non-parameterized advantages are areas ripe for exploration [Lin et al., 2023].

Expanding the roles of LLMs

Expanding LLM Roles. Beyond generating final answers, LLMs are leveraged for retrieval and evaluation within RAG frameworks. Identifying ways to further unlock LLMs’ potential in RAG systems is a growing research direction.

Scaling laws

Scaling Laws. While scaling laws [Kaplan et al., 2020] are established for LLMs, their applicability to RAG remains uncertain. Initial studies [Wang et al., 2023b] have begun to address this, yet the parameter count in RAG models still lags behind that of LLMs. The possibility of an Inverse Scaling Law, where smaller models outperform larger ones, is particularly intriguing and merits further investigation.

Production-ready RAG: improving retrieval efficiency and recall; ensuring data security (e.g., preventing LLMs from inadvertently disclosing document sources or metadata)

Production-Ready RAG. RAG’s practicality and alignment with engineering requirements have facilitated its adoption. However, enhancing retrieval efficiency, improving document recall in large knowledge bases, and ensuring data security, such as preventing inadvertent disclosure of document sources or metadata by LLMs, are critical engineering challenges that remain to be addressed [Alon et al., 2022].

Modalities of RAG: modality extension lets RAG handle a much wider range of content, integrating images (e.g., RA-CM3, BLIP-2), audio and video, code, and other modalities

Modality Extension of RAG

RAG has transcended its initial text-based question-answering confines, embracing a diverse array of modal data. This expansion has spawned innovative multimodal models that integrate RAG concepts across various domains:

Image. RA-CM3 [Yasunaga et al., 2022] stands as a pioneering multimodal model that both retrieves and generates text and images. BLIP-2 [Li et al., 2023a] leverages frozen image encoders alongside LLMs for efficient visual language pre-training, enabling zero-shot image-to-text conversions. The “Visualize Before You Write” method [Zhu et al., 2022] employs image generation to steer the LM’s text generation, showing promise in open-ended text generation tasks.

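For instance, the zero-shot image-to-text conversion described above can be exercised through the Hugging Face transformers interface to BLIP-2. A hedged sketch; the checkpoint name and API reflect the public release and may change across library versions:

```python
# Zero-shot visual question answering with BLIP-2 via transformers.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

image = Image.open("example.jpg")  # any local image
inputs = processor(images=image, text="Question: what is shown? Answer:",
                   return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```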

Audio and Video. The GSS method retrieves and stitches together audio clips to convert machine-translated data into speech-translated data [Zhao et al., 2022]. UEOP marks a significant advancement in end-to-end automatic speech recognition by incorporating external, offline strategies for voice-to-text conversion [Chan et al., 2023]. Additionally, KNN-based attention fusion leverages audio embeddings and semantically related text embeddings to refine ASR, thereby accelerating domain adaptation. Vid2Seq augments language models with specialized temporal markers, facilitating the prediction of event boundaries and textual descriptions within a unified output sequence [Yang et al., 2023a].

Code. RBPS [Nashid et al., 2023] excels in small-scale learning tasks by retrieving code examples that align with developers’ objectives through encoding and frequency analysis. This approach has demonstrated efficacy in tasks such as test assertion generation and program repair. For structured knowledge, the CoK method [Li et al., 2023c] first extracts facts pertinent to the input query from a knowledge graph, then integrates these facts as hints within the input, enhancing performance in knowledge graph question-answering tasks.

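The retrieve-then-prompt pattern behind such code assistants can be approximated with plain lexical similarity. The sketch below is not RBPS itself but a minimal stand-in: stored examples are ranked by TF-IDF cosine similarity to the developer's query, and the best matches are prepended to the generation prompt as few-shot examples.

```python
# Minimal lexical retrieval of code examples for few-shot prompting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_code_examples(query, corpus, top_k=3):
    # Tokenize on identifiers so that function and variable names match.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*")
    matrix = vectorizer.fit_transform(corpus + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    best = scores.argsort()[::-1][:top_k]
    return [corpus[i] for i in best]
```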

8.2 Ecosystem of RAG: evaluation and downstream applications need refinement

Downstream tasks: open-ended question answering, fact verification, etc.

 Downstream Tasks and Evaluation

RAG has shown considerable promise in enriching language models with the capacity to handle intricate queries and produce detailed responses by leveraging extensive knowledge bases. Empirical evidence suggests that RAG excels in a variety of downstream tasks, including open-ended question answering and fact verification. The integration of RAG not only bolsters the precision and relevance of responses but also their diversity and depth.

The scalability and versatility of RAG across multiple domains warrant further investigation, particularly in specialized fields such as medicine, law, and education. In these areas, RAG could potentially reduce training costs and enhance performance compared to traditional fine-tuning approaches in professional domain knowledge question answering.

Evaluation frameworks: refined metrics and interpretability

Concurrently, refining the evaluation framework for RAG is essential to maximize its efficacy and utility across different tasks. This entails the development of nuanced metrics and assessment tools that can gauge aspects such as contextual relevance, creativity of content, and non-maleficence.

Furthermore, improving the interpretability of RAG-driven models continues to be a key goal. Doing so would allow users to understand the reasoning behind the responses generated by the model, thereby promoting trust and transparency in the use of RAG applications.

Technical stack: LangChain, LLamaIndex, and other RAG toolkits

Technical Stack

The development of the RAG ecosystem is greatly impacted by the progression of its technical stack. Key tools like LangChain and LLamaIndex have quickly gained popularity with the emergence of ChatGPT, providing extensive RAG-related APIs and becoming essential in the realm of LLMs.

Emerging technical stacks, while not as feature-rich as LangChain and LLamaIndex, distinguish themselves with specialized offerings. For instance, Flowise AI prioritizes a low-code approach, enabling users to deploy AI applications, including RAG, through a user-friendly drag-and-drop interface. Other technologies like HayStack, Meltano, and Cohere Coral are also gaining attention for their unique contributions to the field.

In addition to AI-focused providers, traditional software and cloud service providers are expanding their offerings to include RAG-centric services. Verba from Weaviate is designed for personal assistant applications, while Amazon’s Kendra provides an intelligent enterprise search service, allowing users to navigate through various content repositories using built-in connectors. During the evolution of the RAG technology landscape, there has been a clear divergence towards different specializations, such as: 1) Customization: tailoring RAG to meet specific requirements. 2) Simplification: making RAG easier to use, thereby reducing the initial learning curve. 3) Specialization: refining RAG to serve production environments more effectively.

The mutual growth of RAG models and their technical stack is evident; technological advancements consistently establish new standards for the existing infrastructure. In turn, enhancements to the technical stack drive the evolution of RAG capabilities. The RAG toolkit is converging into a foundational technical stack, laying the groundwork for advanced enterprise applications. However, the concept of a fully integrated, comprehensive platform remains on the horizon, pending further innovation and development.

Figure 7: Summary of RAG ecosystem

9 Conclusion

Three stages of development; integration with fine-tuning and reinforcement learning has broadened its scope of application; multimodality; ecosystem growth

>> The RAG architecture has evolved through three stages: Naive RAG, Advanced RAG, and Modular RAG.

>> Advanced RAG improves on Naive RAG through query rewriting, chunk reranking, prompt summarization, and similar techniques.

>> Integrating RAG with fine-tuning, reinforcement learning, and other techniques has broadened its range of applications.

>> Retrieval that combines structured and unstructured sources provides richer support.

>> RAG has moved beyond text, extending to multimodal data such as image and video processing.

>> The RAG ecosystem is growing and applications are multiplying, but evaluation needs to keep pace with this evolution.

>> RAG improves task performance, but robustness and long-context support still need optimization.

>> RAG faces challenges in context length and robustness, and multimodal integration needs deeper study.

>> RAG will remain an important technology, but ecosystem building and standardized evaluation frameworks still need to mature.

Overall, RAG technology continues to deepen and its application scope keeps expanding, but many challenges remain and many opportunities are still to be explored. RAG is a hot direction in current AI research.

The summary of this paper, as depicted in Figure 7, highlights RAG’s significant advancement in enhancing the capabilities of LLMs through the integration of parameterized knowledge from language models with extensive non-parameterized data from external knowledge bases. Our survey illustrates the evolution of RAG technologies and their impact on knowledge-intensive tasks. Our analysis delineates three developmental paradigms within the RAG framework: Naive, Advanced, and Modular RAG, each marking a progressive enhancement over its predecessors. The Advanced RAG paradigm extends beyond the Naive approach by incorporating sophisticated architectural elements, including query rewriting, chunk reranking, and prompt summarization. These innovations have led to a more nuanced and modular architecture that enhances both the performance and the interpretability of LLMs. RAG’s technical integration with other AI methodologies, such as fine-tuning and reinforcement learning, has further expanded its capabilities. In content retrieval, a hybrid methodology that leverages both structured and unstructured data sources is emerging as a trend, providing a more enriched retrieval process. Cutting-edge research within the RAG framework is exploring novel concepts such as self-retrieval from LLMs and the dynamic timing of information retrieval.

Despite the strides made in RAG technology, research opportunities abound in improving its robustness and its ability to manage extended contexts. RAG’s application scope is also widening into multimodal domains, adapting its principles to interpret and process diverse data forms such as images, videos, and code. This expansion underscores RAG’s significant practical implications for AI deployment, attracting interest from both academic and industrial sectors. The growing ecosystem of RAG is underscored by an increase in RAG-centric AI applications and the ongoing development of supportive tools. However, as RAG’s application landscape expands, there is an imperative need to refine evaluation methodologies to keep pace with its evolution. Ensuring that performance assessments remain accurate and representative is crucial for capturing the full extent of RAG’s contributions to the AI research and development community.
