Original: A Cheat Sheet and Some Recipes For Building Advanced RAG
Author: Leonie Monigatti
This is a comprehensive RAG cheat sheet covering the motivations for RAG as well as techniques and strategies for progressing beyond basic or naive RAG builds. (Link to the high-resolution version.)
With a new year upon us, perhaps you're considering entering the RAG space and building your first RAG system. Or maybe you've already built a basic RAG system and now want to take it further so that it better handles your users' queries and data structures.
In either case, knowing where to start can be a challenge! Hopefully this blog post points you toward a sensible next step and, more generally, gives you a mental model for the decisions you'll make when building an advanced RAG system.
The RAG cheat sheet mentioned above was largely inspired by a recent RAG survey paper ("Retrieval-Augmented Generation for Large Language Models: A Survey", Gao, Yunfan, et al., 2023).
Mainstream RAG today involves retrieving documents from an external knowledge base and passing these documents, along with the user's query, to a large language model (LLM) to generate a response. In other words, RAG consists of a retrieval component, an external knowledge base, and a generation component.
LlamaIndex Basic RAG Recipe:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
# load data
documents = SimpleDirectoryReader(input_dir="...").load_data()
# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex.from_documents(documents=documents)
# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()
# Use your Default RAG
response = query_engine.query("A user's query")
For a RAG system to be successful (that is, to provide useful and relevant answers to user questions), there are really only two high-level requirements:
1. Retrieval must be able to find the documents most relevant to the user's query.
2. Generation must be able to make good use of the retrieved documents to sufficiently answer the user's query.
With these success requirements defined, we can say that building advanced RAG is really about applying more sophisticated techniques and strategies (to the retrieval or generation component) to ensure they are met. Furthermore, we can categorize each sophisticated technique as addressing one of the two high-level success requirements on its own, or as addressing both at the same time.
Next, we briefly describe a few of the more sophisticated yet effective techniques aimed at the first success requirement: effective retrieval.
1. Chunk-Size Optimization: Because the generator LLM has a limited context window, how documents are chunked when building the external knowledge base matters, and the chunk size can be tuned like any other hyperparameter.
LlamaIndex Chunk Size Optimization Recipe (notebook guide):
import numpy as np

from llama_index import ServiceContext
from llama_index.param_tuner.base import ParamTuner, RunResult
from llama_index.evaluation import SemanticSimilarityEvaluator, BatchEvalRunner
from llama_index.evaluation.eval_utils import get_responses

### Recipe
### Perform hyperparameter tuning as in traditional ML via grid-search
### 1. Define an objective function that ranks different parameter combos
### 2. Build ParamTuner object
### 3. Execute hyperparameter tuning with ParamTuner.tune()

# 1. Define objective function
def objective_function(params_dict):
    chunk_size = params_dict["chunk_size"]
    docs = params_dict["docs"]
    top_k = params_dict["top_k"]
    eval_qs = params_dict["eval_qs"]
    ref_response_strs = params_dict["ref_response_strs"]

    # build RAG pipeline
    index = _build_index(chunk_size, docs)  # helper function not shown here
    query_engine = index.as_query_engine(similarity_top_k=top_k)

    # perform inference with RAG pipeline on the provided questions `eval_qs`
    pred_response_objs = get_responses(
        eval_qs, query_engine, show_progress=True
    )

    # perform evaluations of predictions by comparing them to reference
    # responses `ref_response_strs`
    evaluator = SemanticSimilarityEvaluator(...)
    eval_batch_runner = BatchEvalRunner(
        {"semantic_similarity": evaluator}, workers=2, show_progress=True
    )
    eval_results = eval_batch_runner.evaluate_responses(
        eval_qs, responses=pred_response_objs, reference=ref_response_strs
    )

    # get semantic similarity metric
    mean_score = np.array(
        [r.score for r in eval_results["semantic_similarity"]]
    ).mean()

    return RunResult(score=mean_score, params=params_dict)

# 2. Build ParamTuner object
param_dict = {"chunk_size": [256, 512, 1024]}  # params/values to search over
fixed_param_dict = {  # fixed hyperparams
    "top_k": 2,
    "docs": docs,
    "eval_qs": eval_qs[:10],
    "ref_response_strs": ref_response_strs[:10],
}
param_tuner = ParamTuner(
    param_fn=objective_function,
    param_dict=param_dict,
    fixed_param_dict=fixed_param_dict,
    show_progress=True,
)

# 3. Execute hyperparameter search
results = param_tuner.tune()
best_result = results.best_run_result
best_chunk_size = results.best_run_result.params["chunk_size"]
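A note on the design: the objective function above ranks each candidate chunk size by the semantic similarity of the pipeline's answers to the reference answers, so the quality of the `eval_qs` / `ref_response_strs` pairs largely determines which chunk size wins. Swapping in a different criterion is just a matter of passing a different evaluator (for example, llama_index's CorrectnessEvaluator) to the BatchEvalRunner.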
2. Structured External Knowledge: In complex scenarios, you may need an external knowledge base with more structure than a plain vector index, so that the system can perform recursive retrieval or routed retrieval precisely when working with sensibly separated knowledge sources.
LlamaIndex Recursive Retrieval Recipe (notebook guide):
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SentenceSplitter
from llama_index.schema import IndexNode
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine

### Recipe
### Build a recursive retriever that retrieves using small chunks
### but passes associated larger chunks to the generation stage

# load data
documents = SimpleDirectoryReader(
    input_file="some_data_path/llama2.pdf"
).load_data()

# build parent chunks via NodeParser
node_parser = SentenceSplitter(chunk_size=1024)
base_nodes = node_parser.get_nodes_from_documents(documents)

# define smaller child chunks
sub_chunk_sizes = [256, 512]
sub_node_parsers = [
    SentenceSplitter(chunk_size=c, chunk_overlap=20) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add original node to node list
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)

# define a VectorStoreIndex with all of the nodes
service_context = ServiceContext.from_defaults()  # assumes default LLM/embeddings
vector_index_chunk = VectorStoreIndex(
    all_nodes, service_context=service_context
)
vector_retriever_chunk = vector_index_chunk.as_retriever(similarity_top_k=2)

# build RecursiveRetriever
all_nodes_dict = {n.node_id: n for n in all_nodes}
retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)

# build RetrieverQueryEngine using recursive_retriever
query_engine_chunk = RetrieverQueryEngine.from_args(
    retriever_chunk, service_context=service_context
)

# perform inference with advanced RAG (i.e. query engine)
response = query_engine_chunk.query(
    "Can you tell me about the key concepts for safety finetuning"
)
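This is the small-to-big retrieval pattern: embedding the smaller child chunks tends to make retrieval more precise, while the IndexNode links let the RecursiveRetriever resolve each hit back to its larger parent chunk, so the generator receives fuller context than what was matched.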
Other Recommended Resources
To achieve high accuracy in complex retrieval scenarios, we have put together a series of guides on applying advanced techniques. Links to a selection of these tutorials are listed below:
Mirroring the previous section, here we present examples of sophisticated techniques whose core purpose is to ensure that the retrieved documents are well aligned with the generator's LLM.
LlamaIndex Information Compression Recipe (notebook guide):
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.postprocessor import LongLLMLinguaPostprocessor

### Recipe
### Define a Postprocessor object, here LongLLMLinguaPostprocessor
### Build QueryEngine that uses this Postprocessor on retrieved docs

# Define Postprocessor
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder
    },
)

# Define VectorStoreIndex
documents = SimpleDirectoryReader(input_dir="...").load_data()
index = VectorStoreIndex.from_documents(documents)

# Define QueryEngine
retriever = index.as_retriever(similarity_top_k=2)
retriever_query_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[node_postprocessor]
)

# Use your advanced RAG
response = retriever_query_engine.query("A user query")
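The design idea is to compress what the retriever returns before it reaches the generator: the LongLLMLinguaPostprocessor squeezes the retrieved context down to roughly `target_token=300` tokens, which trims noise and prompt cost at the risk of dropping detail. Because it runs as a node postprocessor, it composes with any retriever.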
LlamaIndex Re-Ranking To Improve Generation Recipe (notebook guide):
import os

from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.postprocessor.cohere_rerank import CohereRerank

### Recipe
### Define a Postprocessor object, here CohereRerank
### Build QueryEngine that uses this Postprocessor on retrieved docs

# Build CohereRerank post retrieval processor
api_key = os.environ["COHERE_API_KEY"]
cohere_rerank = CohereRerank(api_key=api_key, top_n=2)

# Build QueryEngine (RAG) using the post processor
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)

# Use your advanced RAG
response = query_engine.query(
    "What did Sam Altman do in this essay?"
)
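Note the division of labor: the retriever casts a wide net with `similarity_top_k=10`, and the CohereRerank postprocessor then re-scores those candidates against the query, keeping only the best `top_n=2` for generation. The generator therefore sees a small, high-precision context rather than everything the embedding search surfaced.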
In this section, we consider sophisticated techniques that use retrieval and generation in concert, aiming for both more effective retrieval and more accurate generated responses.
LlamaIndex Generator-Enhanced Retrieval Recipe (notebook guide):
from llama_index.llms import OpenAI
from llama_index.query_engine import FLAREInstructQueryEngine
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    ServiceContext,
)

### Recipe
### Build a FLAREInstructQueryEngine which has the generator LLM play
### a more active role in retrieval by prompting it to elicit retrieval
### instructions on what it needs to answer the user query.

# Build FLAREInstructQueryEngine
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
index_query_engine = index.as_query_engine(similarity_top_k=2)
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
flare_query_engine = FLAREInstructQueryEngine(
    query_engine=index_query_engine,
    service_context=service_context,
    max_iterations=7,
    verbose=True,
)

# Use your advanced RAG
response = flare_query_engine.query(
    "Can you tell me about the author's trajectory in the startup world?"
)
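With FLARE, the generator LLM is no longer a passive consumer of retrieved context: it drafts the answer and, when it reaches a point it is unsure about, emits an explicit retrieval instruction describing the information it still needs, with `max_iterations=7` capping the number of these retrieve-then-continue rounds.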
LlamaIndex Iterative Retrieval-Generation Recipe (notebook guide):
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import RetryQueryEngine
from llama_index.evaluation import RelevancyEvaluator

### Recipe
### Build a RetryQueryEngine which performs retrieval-generation cycles
### until it either achieves a passing evaluation or a max number of
### cycles has been reached

# Build RetryQueryEngine
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)
base_query_engine = index.as_query_engine()
# evaluator to critique retrieval-generation cycles
query_response_evaluator = RelevancyEvaluator()
retry_query_engine = RetryQueryEngine(
    base_query_engine, query_response_evaluator
)

# Use your advanced RAG
retry_response = retry_query_engine.query("A user query")
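Roughly speaking, this is a self-correction loop: after each retrieval-generation pass, the RelevancyEvaluator judges whether the response actually addresses the query given the retrieved context, and a failing verdict triggers another cycle until the response passes or the retry budget is exhausted.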
Evaluation of RAG systems is, of course, critically important. In their survey paper, Gao, Yunfan, et al. identify 7 measurement aspects, shown in the top-right corner of the RAG cheat sheet. The llama-index library includes several evaluation abstractions as well as an integration with RAGAs, designed to help builders gauge, along these measurement aspects, whether their RAG system meets the required success criteria. Below, we highlight a few selected evaluation notebook guides.
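As a minimal sketch of what this looks like in code (assuming the same legacy llama_index API used in the recipes above, a GPT-4 judge, and a `query_engine` already built as in the Basic RAG recipe), the built-in FaithfulnessEvaluator and RelevancyEvaluator can score a single response for groundedness in the retrieved context and relevance to the query:
from llama_index import ServiceContext
from llama_index.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
from llama_index.llms import OpenAI

# use a strong LLM as the judge for both evaluators
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
faithfulness_evaluator = FaithfulnessEvaluator(service_context=service_context)
relevancy_evaluator = RelevancyEvaluator(service_context=service_context)

# `query_engine` is assumed to be built as in the Basic RAG recipe above
query = "A user's query"
response = query_engine.query(query)

# is the answer grounded in (faithful to) the retrieved context?
faithfulness_result = faithfulness_evaluator.evaluate_response(response=response)

# are the response and retrieved context relevant to the query?
relevancy_result = relevancy_evaluator.evaluate_response(
    query=query, response=response
)

print(faithfulness_result.passing, relevancy_result.passing)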
Hopefully, after reading this blog post, you feel more comfortable and confident about applying these sophisticated techniques to build advanced RAG systems!