Due to its length, this article is split into two parts: part one covers Single-Agent frameworks (8 of them), and part two covers Multi-Agent frameworks (11 of them), for a total of 19 AI Agent frameworks. The original article comes from the Alibaba Tmall Technology - Mobile Tmall Intelligent Strategy - Recommendation Engineering team and is shared here for learning and exchange only.
As of today, open-source Agent applications are flourishing. This article selects 19 Agent projects with relatively high popularity and discussion, which largely cover the mainstream Agent frameworks; each one comes with a brief summary for reference and study.
2. Agent Basics
The core decision logic of an Agent is to have the LLM, based on dynamically changing environment information, choose a concrete action to execute or make a judgment about the result, thereby affecting the environment; these steps are repeated over multiple iterations until the goal is reached.
A simplified decision flow: Perception (P) → Planning (P) → Action (A).
Here, the policy is the core decision by which the Agent produces an Action, and the action in turn, through Observation, becomes the premise and basis for further Perception, forming an autonomous, closed-loop learning process.
In engineering terms, this can be broken down into four core modules: reasoning, memory, tools, and action.
The mainstream decision model for Agents today is the ReAct framework, along with a few variant frameworks; the approaches are compared below.
ReAct = few-shot prompt + Thought + Action + Observation. It is the prompt structure commonly used for tool calling, reasoning, and planning: reason first, then act; execute a concrete action according to the environment and output the reasoning trace (Thought) along the way.
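To make the loop concrete, here is a minimal, illustrative sketch of a ReAct-style loop (not any particular framework's implementation); `llm` is a stand-in text-completion function and `tools` is a hypothetical dict mapping tool names to Python callables:
- import re
-
- def react_agent(question, llm, tools, max_steps=5):
-     # `llm(prompt)` returns a completion string; `tools` maps names to callables.
-     prompt = f"Answer the following question: {question}"
-     for _ in range(max_steps):
-         # Reason first: ask the model for a Thought followed by an Action.
-         output = llm(prompt + "\nThought:")
-         prompt += "\nThought:" + output
-         # Expect either "Action: tool_name[tool_input]" or a final answer.
-         match = re.search(r"Action:\s*(\w+)\[(.*)\]", output)
-         if not match:
-             return output  # no action requested: treat the output as the final answer
-         tool_name, tool_input = match.group(1), match.group(2)
-         observation = tools[tool_name](tool_input)   # execute the chosen tool
-         prompt += f"\nObservation: {observation}"    # feed the result back for the next step
-     return prompt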
BabyAGI-style execution flow: some Agents handle complex tasks by optimizing the planning and task-execution flow, decomposing a complex task into multiple subtasks and then executing them sequentially or in batches.
The advantage is that even for complex tasks that require multiple tools, only about three LLM calls are needed in total, rather than one LLM call per tool invocation; see the sketch below.
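A rough sketch of that plan-then-execute pattern, assuming a stand-in `llm` function and a `tools` dict: one LLM call to plan, direct tool calls for the subtasks, and one final LLM call to synthesize (an optional replanning step would be the third call):
- import json
-
- def plan_and_execute(goal, llm, tools):
-     # 1st LLM call: decompose the goal into tool-level subtasks.
-     plan = json.loads(llm(
-         'Decompose the goal into subtasks as a JSON list '
-         f'[{{"tool": "...", "input": "..."}}]. Goal: {goal}'))
-     # Run each subtask by calling the tool directly; no LLM call per tool use.
-     results = [tools[step["tool"]](step["input"]) for step in plan]
-     # 2nd LLM call: synthesize the tool outputs into the final answer.
-     return llm(f"Goal: {goal}\nSubtask results: {results}\nFinal answer:")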
LLMCompiler: executes tasks in parallel. During planning it generates a DAG of actions; you can think of it as aggregating multiple tools into a single tool-execution graph and executing actions through that graph.
paper:https://arxiv.org/abs/2312.04511?ref=blog.langchain.dev
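A simplified illustration of the DAG idea (not the actual LLMCompiler code): the planner emits tasks with dependencies, and every task whose dependencies are already satisfied runs in parallel. The `tools` dict ("search", "compare") is hypothetical:
- from concurrent.futures import ThreadPoolExecutor
-
- # Each planned task names a tool, its input, and the tasks it depends on.
- tasks = {
-     "t1": {"tool": "search",  "input": "GDP of France",  "deps": []},
-     "t2": {"tool": "search",  "input": "GDP of Germany", "deps": []},
-     "t3": {"tool": "compare", "input": "{t1} vs {t2}",   "deps": ["t1", "t2"]},
- }
-
- def run_dag(tasks, tools):
-     results, pending = {}, dict(tasks)
-     with ThreadPoolExecutor() as pool:
-         while pending:
-             # Every task whose dependencies are resolved can run in parallel.
-             ready = [k for k, t in pending.items()
-                      if all(d in results for d in t["deps"])]
-             futures = {k: pool.submit(tools[pending[k]["tool"]],
-                                       pending[k]["input"].format(**results))
-                        for k in ready}
-             for k, f in futures.items():
-                 results[k] = f.result()   # upstream results get substituted into downstream inputs
-                 del pending[k]
-     return results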
Based on differences in framework and implementation, Agent frameworks are roughly divided into two categories here: Single-Agent and Multi-Agent, corresponding to single-agent and multi-agent architectures; Multi-Agent setups use multiple agents to tackle more complex problems.
git:https://github.com/yoheinakajima/babyagi/blob/main/babyagi.py
doc:https://yoheinakajima.com/birth-of-babyagi/
BabyAGI decision flow: 1) decompose tasks according to the objective; 2) prioritize the task list; 3) execute the tasks and integrate the results.
Highlight: as an early agent implementation, the babyagi framework is simple and practical; its task-prioritization module is a fairly distinctive feature that rarely appears in later agents.
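A condensed, illustrative sketch of that loop (not the actual babyagi.py code; `llm` stands in for the OpenAI completion call); the three prompt templates it relies on follow after the sketch:
- from collections import deque
-
- def babyagi_loop(objective, first_task, llm, max_iterations=5):
-     task_list = deque([first_task])
-     for _ in range(max_iterations):
-         task = task_list.popleft()
-         # execution_agent: perform the current task.
-         result = llm(f"Objective: {objective}\nYour task: {task}\nResponse:")
-         # task_creation_agent: derive new tasks from the latest result.
-         new_tasks = llm(f"Objective: {objective}\nLast result: {result}\n"
-                         f"Incomplete tasks: {list(task_list)}\n"
-                         "Create new, non-overlapping tasks, one per line:").splitlines()
-         task_list.extend(t for t in new_tasks if t.strip())
-         # prioritization_agent: reorder the task list toward the objective.
-         reordered = llm(f"Reprioritize these tasks for the objective '{objective}', "
-                         f"one per line, without removing any: {list(task_list)}").splitlines()
-         task_list = deque(t for t in reordered if t.strip())
-         if not task_list:
-             break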
- task_creation_agent
- You are a task creation AI that uses the result of an execution agent to create new tasks
- with the following objective: {objective}. The result of the last completed task is: {result}.
- That result was based on this task description: {task_description}. These are the incomplete tasks:
- {', '.join(task_list)}. Based on the result, create new tasks to be completed by the AI system
- that do not overlap with the incomplete tasks. Return the tasks as an array.
-
- prioritization_agent
- You are a task prioritization AI tasked with cleaning up and reprioritizing the following tasks:
- {task_names}. Consider the ultimate objective of your team: {OBJECTIVE}.
- Do not remove any tasks. Return the result as a numbered list, for example:
- #. First task
- #. Second task
- Start the task list with number {next_task_id}.
-
- execution_agent
- You are an AI who performs one task based on the following objective: {objective}.
- Take into account these previously completed tasks: {context}.
- Your task: {task}
- Response:
git:https://github.com/Significant-Gravitas/AutoGPT
AutoGPT positions itself as something like a personal assistant that helps the user complete specified tasks, such as researching a topic. AutoGPT puts particular emphasis on using external tools, such as search engines and web-page browsing.
Likewise, as an early agent, AutoGPT is small but complete. It has plenty of shortcomings, such as no control over the number of iterations and a limited tool set, but it attracted a great many imitators, and a large number of later frameworks evolved from it.
- You are {{ai-name}}, {{user-provided AI bot description}}.
- Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.
-
- GOALS:
-
- 1. {{user-provided goal 1}}
- 2. {{user-provided goal 2}}
- 3. ...
- 4. ...
- 5. ...
-
- Constraints:
- 1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
- 2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
- 3. No user assistance
- 4. Exclusively use the commands listed in double quotes e.g. "command name"
- 5. Use subprocesses for commands that will not terminate within a few minutes
-
- Commands:
- 1. Google Search: "google", args: "input": "<search>"
- 2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
- 3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
- 4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
- 5. List GPT Agents: "list_agents", args:
- 6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
- 7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
- 8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
- 9. Read file: "read_file", args: "file": "<file>"
- 10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
- 11. Delete file: "delete_file", args: "file": "<file>"
- 12. Search Files: "search_files", args: "directory": "<directory>"
- 13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"
- 14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
- 15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
- 16. Execute Python File: "execute_python_file", args: "file": "<file>"
- 17. Generate Image: "generate_image", args: "prompt": "<prompt>"
- 18. Send Tweet: "send_tweet", args: "text": "<text>"
- 19. Do Nothing: "do_nothing", args:
- 20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"
-
- Resources:
- 1. Internet access for searches and information gathering.
- 2. Long Term memory management.
- 3. GPT-3.5 powered Agents for delegation of simple tasks.
- 4. File output.
-
- Performance Evaluation:
- 1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
- 2. Constructively self-criticize your big-picture behavior constantly.
- 3. Reflect on past decisions and strategies to refine your approach.
- 4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.
-
- You should only respond in JSON format as described below
- Response Format:
- {
- "thoughts": {
- "text": "thought",
- "reasoning": "reasoning",
- "plan": "- short bulleted\n- list that conveys\n- long-term plan",
- "criticism": "constructive self-criticism",
- "speak": "thoughts summary to say to user"
- },
- "command": {
- "name": "command name",
- "args": {
- "arg name": "value"
- }
- }
- }
- Ensure the response can be parsed by Python json.loads
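Given a response format like the one above, the outer loop amounts to parsing the JSON and dispatching the named command. A rough sketch, assuming a stand-in `llm` function and a hypothetical `COMMANDS` registry (AutoGPT's real command handling is more elaborate):
- import json
-
- # Hypothetical command registry mapping command names to Python callables.
- COMMANDS = {
-     "google": lambda input: f"(stub) search results for: {input}",
-     "write_to_file": lambda file, text: open(file, "w").write(text),
-     "do_nothing": lambda: "no-op",
- }
-
- def autogpt_step(llm, history):
-     # `history` is the running prompt; the model must answer in the JSON schema above.
-     reply = json.loads(llm(history))
-     thoughts, command = reply["thoughts"], reply["command"]
-     print(thoughts["speak"])                               # summary surfaced to the user
-     if command["name"] == "task_complete":
-         return None                                        # the agent decided it is done
-     result = COMMANDS[command["name"]](**command["args"])  # dispatch the chosen command
-     history.append(f"Command {command['name']} returned: {result}")
-     return result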
git: https://github.com/microsoft/JARVIS
paper: https://arxiv.org/abs/2303.17580
HuggingGPT splits a task into four stages: task planning, model selection, task execution, and response generation.
HuggingGPT's highlight: unlike AutoGPT, it can call different models hosted on HuggingFace to complete more complex tasks, which improves the precision and accuracy of each task; however, the overall cost is not reduced by much.
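A schematic sketch of those four stages, with a stand-in `llm` function and a hypothetical `call_hf_model(model_id, args)` wrapper around the HuggingFace inference API (an illustration of the flow, not the JARVIS source):
- import json
-
- def hugginggpt(user_request, llm, call_hf_model):
-     # 1) Task planning: the LLM decomposes the request into typed subtasks.
-     subtasks = json.loads(llm(
-         'Decompose the request into a JSON list of tasks '
-         f'[{{"task": "...", "args": "..."}}]. Request: {user_request}'))
-     outputs = []
-     for sub in subtasks:
-         # 2) Model selection: pick a suitable HuggingFace model for this task type.
-         model_id = llm(f"Pick the best HuggingFace model id for the task: {sub['task']}")
-         # 3) Task execution: run the selected model on the task arguments.
-         outputs.append(call_hf_model(model_id, sub["args"]))
-     # 4) Response generation: the LLM summarizes all model outputs for the user.
-     return llm(f"User request: {user_request}\nModel outputs: {outputs}\nAnswer:")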
git: https://github.com/AntonOsika/gpt-engineer
Built on langchain, gpt-engineer is a single engineer agent focused on coding scenarios.
Its goal is to generate an entire code repository, asking the user for additional input when clarification is needed.
Highlight: an automated, upgraded version of a code copilot.
- You almost always put different classes in different files.
- For Python, you always create an appropriate requirements.txt file.
- For NodeJS, you always create an appropriate package.json file.
- You always add a comment briefly describing the purpose of the function definition.
- You try to add comments explaining very complex bits of logic.
- You always follow the best practices for the requested languages in terms of describing the code written as a defined
- package/project.
-
-
- Python toolbelt preferences:
- - pytest
- - dataclasses
- [
- {"role": "system",
- "content": "You will read instructions and not carry them out, only seek to clarify them.\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\nThen you will pick one clarifying question, and wait for an answer from the user.\n"
- },
- {"role": "user",
- "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. Keyboard control.}}\n"
- },
- {"role": "assistant",
- "content": "Summary of areas that need clarification:\n1. Specifics of the Super Mario game (e.g. level design, characters, gameplay mechanics)\n2. Details about the MVC components (e.g. which components are in each file)\n3. Keyboard control implementation (e.g. which keys to use, how to handle input)\n\nClarifying question:\nCan you provide more details about the Super Mario game, such as level design, characters, and gameplay mechanics?"
- },
- {"role": "user",
- "content": "{{There are 10 levels in total. The main character is a plumber named Mario, who can walk and jump. It is a classical platform game just like Super Mario. The main character moves from left to right, trying to get to the destination, where there are many obstacles and attacks from enemies in the process.}}\n\nIs anything else unclear? If yes, only answer in the form:\n{remaining unclear areas} remaining questions.\n{Next question}\nIf everything is sufficiently clear, only answer \"Nothing more to clarify.\"."
- },
- {"role": "assistant",
- "content": "Remaining unclear areas: 2 remaining questions.\nCan you provide more information about how the MVC components are split into separate files?"
- },
- {"role": "user",
- "content": "{{Make your own assumptions and state them explicitly before starting}}"
- }
- ]
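The dialogue above can be driven by a small loop: keep asking clarifying questions until the model reports there is nothing more to clarify, then switch to a code-generation prompt. A rough sketch assuming a generic `chat(messages)` helper that returns the assistant's reply as a string (not gpt-engineer's actual implementation):
- def clarify_then_generate(spec, chat):
-     messages = [
-         {"role": "system", "content": "You will read instructions and not carry them out, "
-                                       "only seek to clarify them."},
-         {"role": "user", "content": spec},
-     ]
-     while True:
-         answer = chat(messages)
-         messages.append({"role": "assistant", "content": answer})
-         if "nothing more to clarify" in answer.lower():
-             break
-         # In the real tool the user answers interactively; here we simply read from stdin.
-         messages.append({"role": "user", "content": input(answer + "\n> ")})
-     # Roles switch: with the spec clarified, ask for the full implementation.
-     messages.append({"role": "user", "content": "Now write the full code for the project."})
-     return chat(messages)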
Example run: (demo screenshot not reproduced here; see the gpt-engineer repository).
git: https://github.com/BRlkl/AGI-Samantha
tw: https://twitter.com/Schindler___/status/1745986132737769573
Inspired by the movie Her, its core reasoning logic is reflection plus observation: built on GPT-4V, it continuously takes in images and speech from its environment and can proactively start a conversation or ask questions.
AGI-Samantha features:
1. Dynamic voice communication: Samantha decides autonomously, based on context and its own thoughts, when to speak.
2. Real-time vision: it understands and reacts to visual information, such as the content of images or video; for example, when it sees an object or scene, it can talk about it or act on it. Even when Samantha is not directly using visual input, that input keeps influencing its thinking and behavior in the background, shaping its decisions and how it acts.
3. External categorized memory: Samantha has a dedicated memory system that dynamically writes and reads the most relevant information depending on the situation.
4. Continuous evolution: the experiences it stores influence its own behavior, such as personality, speech frequency, and style.
AGI-Samantha is composed of multiple purpose-specific large language models (LLMs), each referred to as a "module". The main modules are: thought, consciousness, subconsciousness, answer, memory read, memory write, memory selection, and vision. Coordinated through an internal loop, these modules imitate the workflow of a human brain, letting Samantha receive and process visual and auditory information and respond accordingly. In short, AGI-Samantha is an AI system that tries to imitate human thought and behavior.
Highlight: it combines visual information to assist decision-making and improves the memory module; if you are interested, fork the code and run it locally to play around with it. A rough sketch of the module loop follows.
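A heavily simplified sketch of the module loop described above (the module prompts, the speech pipeline, and the real memory system are omitted; `llm`, `get_frame_description`, `get_user_speech`, and the `memory` object are stand-ins):
- import time
-
- def samantha_loop(llm, get_frame_description, get_user_speech, memory):
-     short_term = []                       # rolling context shared by the modules
-     while True:
-         vision = get_frame_description()  # "vision": describe the current camera frame
-         heard = get_user_speech()         # latest transcribed user speech, may be empty
-         # "subconsciousness": summarize what is happening right now.
-         situation = llm(f"Summarize the situation. Seen: {vision}. Heard: {heard}.")
-         recalled = memory.search(situation)   # "memory read": fetch relevant memories
-         # "thought": think, then decide whether to speak.
-         thought = llm(f"Context: {short_term}\nMemories: {recalled}\n"
-                       f"Situation: {situation}\nThink, then end with SPEAK or STAY_SILENT.")
-         short_term.append(thought)
-         if thought.strip().endswith("SPEAK"):
-             # "answer": produce the spoken reply.
-             print(llm(f"Given this thought, reply to the user: {thought}"))
-         memory.write(situation)           # "memory write" (greatly simplified)
-         time.sleep(1)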
doc:https://appagent-official.github.io/
git:https://github.com/X-PLUG/MobileAgent
An Agent that handles multimodal input using Grounding DINO together with GPT vision models.
Highlight: a vision/multimodal app agent at the OS level; it can perform system-level operations and directly drive multiple apps. Because it needs system-level permissions, only Android is supported.
git:https://github.com/OS-Copilot/FRIDAY
doc:https://os-copilot.github.io/
An OS-level Agent, FRIDAY can learn from images, videos, or text, and can carry out a range of computer tasks, such as plotting charts in Excel or building a website. Most importantly, FRIDAY learns new skills by doing tasks, getting better through repeated attempts and practice, much like a human.
Highlight: self-learning and self-improvement, for example learning how to use software applications more effectively and the best practices for specific tasks; a rough sketch of such a loop follows.
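One way to picture that self-improvement loop is a growing skill library: when no stored skill fits the task, the agent writes a new tool, verifies it by running it, and keeps it for next time. An illustrative sketch (not the FRIDAY source; `llm`, `run_python`, and the `skills` dict are stand-ins):
- def solve_with_skill_library(task, llm, skills, run_python):
-     # `skills` maps a skill name to previously generated and verified code.
-     name = llm(f"Which of these skills fits the task '{task}'? {list(skills)}\n"
-                "Answer with one name, or NONE.").strip()
-     if name in skills:
-         return run_python(skills[name])           # reuse an already-learned skill
-     # No suitable skill: generate a new one, then verify it by executing it.
-     code = llm(f"Write a self-contained Python script that accomplishes: {task}")
-     result = run_python(code)
-     verdict = llm(f"Task: {task}\nOutput: {result}\nDid it succeed? Answer YES or NO.")
-     if verdict.strip().upper().startswith("YES"):
-         new_name = llm(f"Give a short snake_case name for this skill: {task}").strip()
-         skills[new_name] = code                   # learn the skill for future reuse
-     return result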
doc:https://python.langchain.com/docs/langgraph
A feature of langchain that lets developers restructure the execution flow inside a single agent as a graph, adding flexibility, and it can be combined with tools such as LangSmith. In the example below, AgentState, call_model, call_tool, and should_continue are assumed to be defined elsewhere.
- from langgraph.graph import StateGraph, END
- # Define a new graph
- workflow = StateGraph(AgentState)
-
- # Define the two nodes we will cycle between
- workflow.add_node("agent", call_model)
- workflow.add_node("action", call_tool)
-
- # Set the entrypoint as `agent`
- # This means that this node is the first one called
- workflow.set_entry_point("agent")
-
- # We now add a conditional edge
- workflow.add_conditional_edges(
- # First, we define the start node. We use `agent`.
- # This means these are the edges taken after the `agent` node is called.
- "agent",
- # Next, we pass in the function that will determine which node is called next.
- should_continue,
- # Finally we pass in a mapping.
- # The keys are strings, and the values are other nodes.
- # END is a special node marking that the graph should finish.
- # What will happen is we will call `should_continue`, and then the output of that
- # will be matched against the keys in this mapping.
- # Based on which one it matches, that node will then be called.
- {
- # If `should_continue` returns "continue", we call the tool node.
- "continue": "action",
- # Otherwise we finish.
- "end": END
- }
- )
-
- # We now add a normal edge from `action` back to `agent`.
- # This means that after the `action` node is called, the `agent` node is called next.
- workflow.add_edge('action', 'agent')
-
- # Finally, we compile it!
- # This compiles it into a LangChain Runnable,
- # meaning you can use it as you would any other runnable
- app = workflow.compile()
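Once compiled, the graph is used like any other Runnable. For example (assuming, as in the LangGraph docs example, that AgentState carries a "messages" list):
- from langchain_core.messages import HumanMessage
-
- inputs = {"messages": [HumanMessage(content="What is the weather in Hangzhou?")]}
- result = app.invoke(inputs)      # runs agent -> action -> agent ... until END is reached
- # app.stream(inputs) would instead yield the intermediate state after each node.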
(End of Part One)