In artificial intelligence, the fusion of retrieval and generative models has driven major advances in natural language processing.
This post explores how knowledge graphs are transforming retrieval-augmented generation (RAG), enabling AI systems to access structured knowledge and produce more relevant responses.
Harnessing the potential of knowledge graphs strengthens retrieval-based language models, yielding more accurate, context-aware AI interactions across applications and industries.
Below is a step-by-step walkthrough: generate a knowledge graph, load the data into a graph database, and use it as a layer on top of retrieval-augmented generation for more manageable and more powerful knowledge.
We will use NebulaGraph here, but any graph database you are familiar with (Neptune, Neo4j, Cosmos DB, etc.) will work. For demonstration purposes, we host the database in a local Docker container.
```shell
docker pull vesoft/nebula-graph:nightly
# -p exposes the graphd port so a local client can reach the server
docker run -d --name nebula-server -p 9669:9669 vesoft/nebula-graph:nightly
```
Once the container is up, set the Nebula environment variables.
```
"GRAPHD_HOST": <host>,
"GRAPHD_PORT": "9669",
"NEBULA_USER": "root",
"NEBULA_PASSWORD": "nebula",
"NEBULA_ADDRESS": GRAPHD_HOST:GRAPHD_PORT
```
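In Python these variables can be set programmatically before connecting; a minimal sketch, assuming the database runs locally on the default port (substitute your own host):

```python
import os

# Hypothetical values for a local Docker deployment; replace "127.0.0.1"
# with the address of your nebula-graphd service if it runs elsewhere.
os.environ["GRAPHD_HOST"] = "127.0.0.1"
os.environ["GRAPHD_PORT"] = "9669"
os.environ["NEBULA_USER"] = "root"
os.environ["NEBULA_PASSWORD"] = "nebula"
os.environ["NEBULA_ADDRESS"] = (
    f"{os.environ['GRAPHD_HOST']}:{os.environ['GRAPHD_PORT']}"
)
```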
Connect to the database, then create the space, the schema (tag and edge types), and the storage context.
```python
import os

from llama_index import StorageContext
from llama_index.graph_stores import NebulaGraphStore

# %ngql / %%ngql are Jupyter magics from the ngql extension
%reload_ext ngql
connection_string = f"--address {os.environ['GRAPHD_HOST']} --port 9669 --user root --password {os.environ['NEBULA_PASSWORD']}"
%ngql {connection_string}

%ngql CREATE SPACE IF NOT EXISTS rag_demo(vid_type=FIXED_STRING(256), partition_num=1, replica_factor=1);

%%ngql
USE rag_demo;
CREATE TAG IF NOT EXISTS entity(name string);
CREATE EDGE IF NOT EXISTS relationship(relationship string);

space_name = "rag_demo"
edge_types, rel_prop_names = ["relationship"], ["relationship"]
tags = ["entity"]

graph_store = NebulaGraphStore(
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
```
Here we load a Wikipedia article on encryption (the Advanced Encryption Standard) as our knowledge source.
```python
from llama_index import download_loader

WikipediaReader = download_loader("WikipediaReader")
loader = WikipediaReader()
documents = loader.load_data(pages=['Advanced Encryption Standard'], auto_suggest=False)
```
Extract a knowledge graph from the documents and load it into Nebula.
```python
from llama_index import KnowledgeGraphIndex

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    max_triplets_per_chunk=10,
    space_name=space_name,
    edge_types=edge_types,
    rel_prop_names=rel_prop_names,
    tags=tags,
)
```
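Under the hood, `KnowledgeGraphIndex` asks the LLM to extract (subject, relation, object) triplets from each text chunk, capped here at 10 per chunk. The pattern-matching toy below only illustrates the *shape* of that output; the real extraction is done by the LLM, not by regular expressions:

```python
import re
from typing import List, Tuple

def toy_extract_triplets(text: str, max_triplets: int = 10) -> List[Tuple[str, str, str]]:
    """Naive pattern-based stand-in for LLM triplet extraction.

    Matches simple "X is a Y" / "X uses Y" / "X supersedes Y" phrases and
    emits (subject, relation, object) triplets, mirroring the structure
    that KnowledgeGraphIndex writes into the graph store.
    """
    triplets = []
    for subj, rel, obj in re.findall(
        r"([A-Z][\w\s]*?)\s+(is a|uses|supersedes)\s+([\w\s-]+?)[.,]", text
    ):
        triplets.append((subj.strip(), rel, obj.strip()))
        if len(triplets) >= max_triplets:
            break
    return triplets

sample = ("AES is a symmetric block cipher. AES supersedes DES, "
          "and AES uses key sizes of 128 bits.")
print(toy_extract_triplets(sample))
```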
A knowledge graph is now generated from the document and stored in Nebula.
First, we can query the knowledge graph directly and generate an answer by summarizing the query results.
```python
from IPython.display import Markdown, display

kg_index_query_engine = kg_index.as_query_engine(
    retriever_mode="keyword",
    verbose=True,
    response_mode="tree_summarize",
)

response_graph_rag = kg_index_query_engine.query("What is the secure level of AES encryption")

display(Markdown(f"<b>{response_graph_rag}</b>"))
```
The query string here is a question ("What is the security level of AES encryption?"); the engine extracts keywords from it and uses them to query the graph database.
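That keyword path can be sketched without any framework: pull candidate entity words out of the question and match them against the triplet store. The in-memory store below is a toy stand-in for the Nebula `rag_demo` space, with illustrative (not exhaustive) facts about AES:

```python
from typing import List, Tuple

# Toy in-memory triplet store standing in for the Nebula "rag_demo" space.
TRIPLETS: List[Tuple[str, str, str]] = [
    ("AES", "is a", "symmetric block cipher"),
    ("AES", "has key size", "128, 192 or 256 bits"),
    ("AES", "approved by", "NSA for top secret information"),
    ("DES", "superseded by", "AES"),
]

def keyword_graph_lookup(question: str) -> List[Tuple[str, str, str]]:
    """Match question words against triplet subjects/objects, case-insensitively."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return [
        (s, r, o) for s, r, o in TRIPLETS
        if s.lower() in words or o.lower() in words
    ]

hits = keyword_graph_lookup("What is the secure level of AES encryption?")
# The matched triplets become the context handed to the LLM for summarization.
```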
Instead of querying the graph directly, we can apply RAG on top of the knowledge graph: search the graph for content related to the question and use the search results as context for generating the answer.
```python
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import KnowledgeGraphRAGRetriever

graph_rag_retriever = KnowledgeGraphRAGRetriever(
    storage_context=storage_context,
    service_context=service_context,
    llm=llm,
    verbose=True,
)

query_engine = RetrieverQueryEngine.from_args(
    graph_rag_retriever, service_context=service_context
)

response = query_engine.query(
    "What is AES?",
)
display(Markdown(f"<b>{response}</b>"))
```
The most powerful use case for knowledge-graph RAG is combining it with an existing search, injecting additional knowledge into GenAI results to improve their quality.
To achieve this, we create a custom retriever that combines the results of a vector query with those of a graph database query.
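At the heart of such a hybrid retriever is a simple set merge over the two result lists. The AND/OR behaviour can be sketched framework-free (document IDs here are purely illustrative):

```python
from typing import List

def merge_results(vector_ids: List[str], kg_ids: List[str], mode: str = "OR") -> List[str]:
    """Combine vector-search hits with knowledge-graph hits.

    "OR" returns the union (broader recall); "AND" returns the
    intersection (only nodes confirmed by both retrievers).
    """
    v, k = set(vector_ids), set(kg_ids)
    merged = v & k if mode == "AND" else v | k
    return sorted(merged)

print(merge_results(["doc1", "doc2"], ["doc2", "doc3"], mode="OR"))   # ['doc1', 'doc2', 'doc3']
print(merge_results(["doc1", "doc2"], ["doc2", "doc3"], mode="AND"))  # ['doc2']
```

This is exactly the logic the custom retriever below applies to LlamaIndex node objects instead of plain strings.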
Create the custom retriever
```python
# import QueryBundle
from llama_index import QueryBundle

# import NodeWithScore
from llama_index.data_structs import NodeWithScore

# Retrievers
from llama_index.retrievers import BaseRetriever, VectorIndexRetriever, KGTableRetriever

from typing import List


class CustomRetriever(BaseRetriever):
    """Custom retriever that performs both Vector search and Knowledge Graph search."""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
        kg_retriever: KGTableRetriever,
        mode: str = "OR",
    ) -> None:
        """Init params."""
        self._vector_retriever = vector_retriever
        self._kg_retriever = kg_retriever
        if mode not in ("AND", "OR"):
            raise ValueError("Invalid mode.")
        self._mode = mode

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        kg_nodes = self._kg_retriever.retrieve(query_bundle)

        vector_ids = {n.node.get_doc_id() for n in vector_nodes}
        kg_ids = {n.node.get_doc_id() for n in kg_nodes}

        combined_dict = {n.node.get_doc_id(): n for n in vector_nodes}
        combined_dict.update({n.node.get_doc_id(): n for n in kg_nodes})

        if self._mode == "AND":
            retrieve_ids = vector_ids.intersection(kg_ids)
        else:
            retrieve_ids = vector_ids.union(kg_ids)

        retrieve_nodes = [combined_dict[rid] for rid in retrieve_ids]
        return retrieve_nodes
```
```python
from llama_index import ResponseSynthesizer
from llama_index.query_engine import RetrieverQueryEngine

# create custom retriever
vector_retriever = VectorIndexRetriever(index=vector_index)
kg_retriever = KGTableRetriever(index=kg_index, retriever_mode='keyword', include_text=False)
custom_retriever = CustomRetriever(vector_retriever, kg_retriever)

# create response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    service_context=service_context,
    response_mode="tree_summarize",
)
```
Create the query engines
```python
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
)

vector_query_engine = vector_index.as_query_engine()

kg_keyword_query_engine = kg_index.as_query_engine(
    # setting to false uses the raw triplets instead of adding the text from the corresponding nodes
    include_text=False,
    retriever_mode='keyword',
    response_mode="tree_summarize",
)
```
Run a query
```python
response = kg_keyword_query_engine.query(
    "What is your encryption method, and how secure is it?"
)
display(Markdown(f"<b>{response}</b>"))
```