1. Model I/O wrappers
# Install LangChain (this article pins version 0.1.0)
!pip install langchain==0.1.0
!pip install langchain-openai  # base package split out in v0.1.0
from langchain_openai import ChatOpenAI
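llm = ChatOpenAI(model="gpt-4")  # the default is gpt-3.5-turbo
response = llm.invoke("你是谁")  # prompt: "Who are you?"
print(response.content)
Output:
我是OpenAI的人工智能助手。我被设计出来是为了帮助解答问题、提供信息和帮助用户完成各种任务。
(English: "I am OpenAI's AI assistant. I was designed to help answer questions, provide information, and help users complete all kinds of tasks.")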
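from langchain.schema import (
    AIMessage,      # equivalent to the assistant role in the OpenAI API
    HumanMessage,   # equivalent to the user role in the OpenAI API
    SystemMessage,  # equivalent to the system role in the OpenAI API
)
messages = [
    SystemMessage(content="你是AGIClass的课程助理。"),  # "You are the AGIClass course assistant."
    HumanMessage(content="我是学员,我叫王卓然。"),  # "I am a student; my name is Wang Zhuoran."
    AIMessage(content="欢迎!"),  # "Welcome!"
    HumanMessage(content="我是谁"),  # "Who am I?"
]
llm.invoke(messages)
Output:
AIMessage(content='您是学员王卓然。')
(English: "You are the student Wang Zhuoran.")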
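# Other models are packaged in langchain_community
from langchain_community.chat_models import ErnieBotChat
from langchain.schema import HumanMessage
ernie = ErnieBotChat()
messages = [
    HumanMessage(content="你是谁")  # "Who are you?"
]
ernie.invoke(messages)
Output:
AIMessage(content='您好,我是百度研发的知识增强大语言模型,中文名是文心一言,英文名是ERNIE Bot。我能够与人对话互动,回答问题,协助创作,高效便捷地帮助人们获取信息、知识和灵感。\n\n如果您有任何问题,请随时告诉我。')
(English: "Hello, I am a knowledge-enhanced large language model developed by Baidu. My Chinese name is 文心一言 and my English name is ERNIE Bot. I can converse with people, answer questions, assist with writing, and help people obtain information, knowledge, and inspiration efficiently and conveniently. If you have any questions, feel free to ask.")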
① Prompt template wrappers
from langchain.prompts import PromptTemplate
template = PromptTemplate.from_template("给我讲个关于{subject}的笑话")  # "Tell me a joke about {subject}"
print(template)
print(template.format(subject='小明'))  # 小明 (Xiao Ming) is a stock name in Chinese jokes
Output:
input_variables=['subject'] template='给我讲个关于{subject}的笑话'
给我讲个关于小明的笑话
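The `input_variables` line in the output above shows that the template has discovered its own placeholders. As a rough illustration of that idea, here is a minimal sketch built only on Python's standard `string.Formatter`. The class name `MiniPromptTemplate` is hypothetical, and this is not how LangChain implements `PromptTemplate`.

```python
from string import Formatter


class MiniPromptTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
        # field_name is None when a chunk of literal text has no placeholder after it.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **kwargs: str) -> str:
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**kwargs)


mini = MiniPromptTemplate("Tell me a joke about {subject}")
print(mini.input_variables)
print(mini.format(subject="Xiao Ming"))
```

Discovering the variables up front lets a missing value fail early with a clear message, before any model call is made.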
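② A more complex prompt template for chat messages
from langchain.prompts import ChatPromptTemplate
from langchain.prompts.chat import SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

template = ChatPromptTemplate.from_messages(
    [
        # "You are the customer service assistant for {product}. Your name is {name}."
        SystemMessagePromptTemplate.from_template("你是{product}的客服助手。你的名字叫{name}"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)
llm = ChatOpenAI()
prompt = template.format_messages(
    product="AGI课堂",
    name="瓜瓜",
    query="你是谁"  # "Who are you?"
)
llm.invoke(prompt)
Output:
AIMessage(content='我是AGI课堂的客服助手,名字叫瓜瓜。我可以回答关于AGI课堂的问题,提供帮助和支持。有什么我可以帮助你的吗?')
(English: "I am the customer service assistant for AGI课堂, and my name is 瓜瓜. I can answer questions about AGI课堂 and provide help and support. How can I help you?")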
① YAML format (saved as simple_prompt.yaml)
_type: prompt
input_variables:
    ["adjective", "content"]
template:
    Tell me a {adjective} joke about {content}.
② JSON format (saved as simple_prompt.json)
{
    "_type": "prompt",
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}."
}
from langchain.prompts import load_prompt
prompt = load_prompt("simple_prompt.yaml")
# OR
# prompt = load_prompt("simple_prompt.json")
print(prompt.format(adjective="funny", content="Xiao Ming"))
Output:
Tell me a funny joke about Xiao Ming.
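To show what loading a prompt from a file roughly involves, the sketch below parses the same JSON spec with the standard `json` module and renders it. The helper names `load_prompt_spec` and `render` are hypothetical, and the spec is inlined as a string so the example runs without a file. It illustrates the idea only and is not LangChain's `load_prompt`.

```python
import json

# The same JSON spec as above, inlined so the example is self-contained.
SPEC = """
{
    "_type": "prompt",
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}."
}
"""


def load_prompt_spec(text: str) -> dict:
    """Parse a prompt spec and reject types this sketch does not handle."""
    spec = json.loads(text)
    if spec.get("_type") != "prompt":
        raise ValueError(f"unsupported prompt type: {spec.get('_type')!r}")
    return spec


def render(spec: dict, **values: str) -> str:
    """Fill the template, insisting that exactly the declared variables are given."""
    declared = set(spec["input_variables"])
    if set(values) != declared:
        raise KeyError(f"expected variables {sorted(declared)}, got {sorted(values)}")
    return spec["template"].format(**values)


spec = load_prompt_spec(SPEC)
rendered = render(spec, adjective="funny", content="Xiao Ming")
print(rendered)
```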
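An OutputParser automatically loads the string returned by the LLM into a specified format.
LangChain's built-in OutputParsers include PydanticOutputParser and OutputFixingParser, both of which are used below: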
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from typing import List, Dict

# Define your output object
class Date(BaseModel):
    year: int = Field(description="Year")
    month: int = Field(description="Month")
    day: int = Field(description="Day")
    era: str = Field(description="BC or AD")

    # ----- Optional -----
    # You can add custom validation
    @validator('month')
    def valid_month(cls, field):
        if field <= 0 or field > 12:
            raise ValueError("Month must be between 1 and 12")
        return field

    @validator('day')
    def valid_day(cls, field):
        if field <= 0 or field > 31:
            raise ValueError("Day must be between 1 and 31")
        return field

    @validator('day', pre=True, always=True)
    def valid_date(cls, day, values):
        year = values.get('year')
        month = values.get('month')
        # Make sure both year and month are available
        if year is None or month is None:
            return day  # cannot validate the day without year and month
        # Check that the day is valid for the month
        if month == 2:
            if cls.is_leap_year(year) and day > 29:
                raise ValueError("February has at most 29 days in a leap year")
            elif not cls.is_leap_year(year) and day > 28:
                raise ValueError("February has at most 28 days in a non-leap year")
        elif month in [4, 6, 9, 11] and day > 30:
            raise ValueError(f"Month {month} has at most 30 days")
        return day

    @staticmethod
    def is_leap_year(year):
        if year % 400 == 0 or (year % 4 == 0 and year % 100 != 0):
            return True
        return False
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser

model_name = 'gpt-4'
temperature = 0
model = ChatOpenAI(model_name=model_name, temperature=temperature)

# Build an OutputParser from the Pydantic object definition
parser = PydanticOutputParser(pydantic_object=Date)

# Prompt text: "Extract the date from the user input." / "User input:"
template = """提取用户输入中的日期。
{format_instructions}
用户输入:
{query}"""

prompt = PromptTemplate(
    template=template,
    input_variables=["query"],  # the query changes with every user request
    # Take the output description straight from the OutputParser and
    # pre-fill format_instructions as a partial variable
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

print("====Format Instruction=====")
print(parser.get_format_instructions())

query = "2023年四月6日天气晴..."  # "April 6, 2023, sunny..."
model_input = prompt.format_prompt(query=query)

print("====Prompt=====")
print(model_input.to_string())

output = model.invoke(model_input.to_messages())
print("====Raw model output=====")
print(output)
print("====Parsed output=====")
date = parser.parse(output.content)
print(date)
Output:
====Format Instruction=====
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"year": {"title": "Year", "description": "Year", "type": "integer"}, "month": {"title": "Month", "description": "Month", "type": "integer"}, "day": {"title": "Day", "description": "Day", "type": "integer"}, "era": {"title": "Era", "description": "BC or AD", "type": "string"}}, "required": ["year", "month", "day", "era"]}
====Prompt=====
提取用户输入中的日期。
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"year": {"title": "Year", "description": "Year", "type": "integer"}, "month": {"title": "Month", "description": "Month", "type": "integer"}, "day": {"title": "Day", "description": "Day", "type": "integer"}, "era": {"title": "Era", "description": "BC or AD", "type": "string"}}, "required": ["year", "month", "day", "era"]}
用户输入:
2023年四月6日天气晴...
====Raw model output=====
content='{"year": 2023, "month": 4, "day": 6, "era": "AD"}'
====Parsed output=====
year=2023 month=4 day=6 era='AD'
from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI(model="gpt-4"))

# Deliberately break the format of the previous output
output = output.content.replace("4", "四月")  # "四月" means "April"
print("===Malformed output===")
print(output)
try:
    date = parser.parse(output)
except Exception as e:
    print("===Exception raised===")
    print(e)

# Use OutputFixingParser to repair and parse automatically
date = new_parser.parse(output)
print("===Re-parsed result===")
print(date)
Output:
===Malformed output===
{"year": 2023, "month": 四月, "day": 6, "era": "AD"}
===Exception raised===
Failed to parse Date from completion {"year": 2023, "month": 四月, "day": 6, "era": "AD"}. Got: Expecting value: line 1 column 25 (char 24)
===Re-parsed result===
year=2023 month=4 day=6 era='AD'
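The pattern behind this kind of auto-fixing parser can be sketched without any LLM: try a strict parser, and on failure hand the bad text plus the error message to a repair step, then parse again. In the sketch below, `parse_date`, `parse_with_fix`, and `fake_fixer` are hypothetical names, and `fake_fixer` is a hard-coded stand-in for the model call. It illustrates the retry pattern only and is not LangChain's implementation.

```python
import json
from typing import Callable


def parse_date(text: str) -> dict:
    """Strict parser: valid JSON with integer year, month, and day."""
    data = json.loads(text)
    for key in ("year", "month", "day"):
        if not isinstance(data.get(key), int):
            raise ValueError(f"{key} must be an integer")
    return data


def parse_with_fix(text: str, fixer: Callable[[str, str], str], max_retries: int = 1) -> dict:
    """Try the strict parser; on failure ask `fixer` to repair the text, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return parse_date(text)
        except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
            if attempt == max_retries:
                raise
            text = fixer(text, str(err))
    raise AssertionError("unreachable")


def fake_fixer(bad_text: str, error: str) -> str:
    """Hypothetical stand-in for the LLM call: a hard-coded repair for this demo."""
    return bad_text.replace("四月", "4")


broken = '{"year": 2023, "month": 四月, "day": 6, "era": "AD"}'
fixed = parse_with_fix(broken, fake_fixer)
print(fixed)
```

Capping the retries matters in practice, because every repair attempt with a real model costs another API call.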
!pip install pypdf
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("llama2.pdf")
pages = loader.load_and_split()
print(pages[0].page_content)
Splitting the text into chunks in code
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=100,  # Think about it: why overlap adjacent chunks?
    length_function=len,
    add_start_index=True,
)
paragraphs = text_splitter.create_documents([pages[0].page_content])
for para in paragraphs:
    print(para.page_content)
    print('-------')
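One common reason for overlapping chunks is that text cut at a chunk boundary still appears intact in a neighboring chunk, so retrieval is less likely to lose context. The sketch below shows the effect with a plain fixed-size sliding window. The function `split_with_overlap` is hypothetical and much simpler than `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and spaces.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; each starts chunk_size - chunk_overlap after the previous."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk begins with the last four characters of the previous one, so a larger overlap means more redundancy in storage and embedding cost in exchange for fewer hard cuts.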
!pip install chromadb
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader

# Load the document
loader = PyPDFLoader("llama2.pdf")
pages = loader.load_and_split()

# Split the document
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
texts = text_splitter.create_documents([pages[2].page_content, pages[3].page_content])

# Index the chunks in the vector store
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(texts, embeddings)

# LangChain's built-in RAG implementation
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    retriever=db.as_retriever()
)

query = "llama 2有多少参数?"  # "How many parameters does Llama 2 have?"
response = qa_chain.invoke(query)
print(response["result"])
Output:
Llama 2有7B、13B和70B参数的变体。
(English: "Llama 2 has variants with 7B, 13B, and 70B parameters.")
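1. LangChain's implementation of these document-processing capabilities is very rough.
2. In production, implement them yourself rather than relying on LangChain's tools.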
from langchain.memory import ConversationBufferMemory, ConversationBufferWindowMemory
history = ConversationBufferMemory()
history.save_context({"input": "你好啊"}, {"output": "你也好啊"})  # "Hello" / "Hello to you too"
print(history.load_memory_variables({}))
history.save_context({"input": "你再好啊"}, {"output": "你又好啊"})  # "Hello again" / "Hello once more"
print(history.load_memory_variables({}))
Output:
{'history': 'Human: 你好啊\nAI: 你也好啊'}
{'history': 'Human: 你好啊\nAI: 你也好啊\nHuman: 你再好啊\nAI: 你又好啊'}
from langchain.memory import ConversationBufferWindowMemory
window = ConversationBufferWindowMemory(k=1)
window.save_context({"input": "第一轮问"}, {"output": "第一轮答"})  # round 1 question / answer
window.save_context({"input": "第二轮问"}, {"output": "第二轮答"})  # round 2 question / answer
window.save_context({"input": "第三轮问"}, {"output": "第三轮答"})  # round 3 question / answer
print(window.load_memory_variables({}))
Output:
{'history': 'Human: 第三轮问\nAI: 第三轮答'}
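The windowing behavior shown above (with k=1, only the latest round survives) can be sketched with a bounded deque. The class `WindowMemory` is hypothetical and only mimics the idea of keeping the last k rounds. It is not LangChain's `ConversationBufferWindowMemory`.

```python
from collections import deque


class WindowMemory:
    """Hypothetical sketch: keep only the last k (input, output) rounds."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # older rounds fall off automatically

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))

    def load_history(self) -> str:
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


window_sketch = WindowMemory(k=1)
window_sketch.save_context("round 1 question", "round 1 answer")
window_sketch.save_context("round 2 question", "round 2 answer")
window_sketch.save_context("round 3 question", "round 3 answer")
print(window_sketch.load_history())
```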
from langchain.memory import ConversationTokenBufferMemory
from langchain_openai import ChatOpenAI
memory = ConversationTokenBufferMemory(
    llm=ChatOpenAI(),
    max_token_limit=40
)
memory.save_context(
    {"input": "你好啊"}, {"output": "你好,我是你的AI助手。"})  # "Hello" / "Hello, I am your AI assistant."
memory.save_context(
    {"input": "你会干什么"}, {"output": "我什么都会"})  # "What can you do?" / "I can do anything"
print(memory.load_memory_variables({}))
Output:
{'history': 'Human: 你会干什么\nAI: 我什么都会'}
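Token-limited memory follows the same idea as the window, except that the budget is measured in tokens rather than rounds: the oldest messages are dropped until the buffer fits. The sketch below assumes a deliberately crude token counter (one token per whitespace-separated word). `count_tokens_by_words` and `TokenBudgetMemory` are hypothetical names, and a real implementation would count with the model's own tokenizer.

```python
from typing import Callable


def count_tokens_by_words(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


class TokenBudgetMemory:
    """Hypothetical sketch: drop the oldest messages until the buffer fits the budget."""

    def __init__(self, max_tokens: int,
                 count_tokens: Callable[[str], int] = count_tokens_by_words):
        self.max_tokens = max_tokens
        self.count_tokens = count_tokens
        self.messages: list[str] = []

    def _total(self) -> int:
        return sum(self.count_tokens(m) for m in self.messages)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages += [f"Human: {user_input}", f"AI: {ai_output}"]
        while self.messages and self._total() > self.max_tokens:
            self.messages.pop(0)  # evict the oldest message first

    def load_history(self) -> str:
        return "\n".join(self.messages)


budget = TokenBudgetMemory(max_tokens=12)
budget.save_context("hello there", "hello, I am your AI assistant")
budget.save_context("what can you do", "I can do anything")
print(budget.load_history())
```

With this word-based counter the first round costs 10 tokens and the second another 10, so the first round is evicted once the second is saved.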
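1. ConversationSummaryMemory: summarizes the conversation context
https://python.langchain.com/docs/modules/memory/types/summary
2. ConversationSummaryBufferMemory: keeps the context that fits within a token limit and summarizes anything older
https://python.langchain.com/docs/modules/memory/types/summary_buffer
3. VectorStoreRetrieverMemory: stores memory in a vector database and retrieves the parts most relevant to the user's input
https://python.langchain.com/docs/modules/memory/types/vectorstore_retriever_memory
4. LangChain's memory management is one of its usable parts, especially for simple cases such as managing history by number of rounds or by token count. For complex cases, such as retrieval from a vector store, it is not necessarily the best implementation, so evaluate it against your actual situation and results. Even so, the ideas behind its various memory-maintenance methods are worth borrowing in production.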
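An agent uses the large language model as a reasoning engine. Given a task, the agent automatically generates the steps needed to complete it and carries out the corresponding actions (for example, choosing and calling a tool) until the task is done.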
from langchain_community.utilities import SerpAPIWrapper
from langchain.tools import Tool, tool

search = SerpAPIWrapper()
tools = [
    Tool.from_function(
        func=search.run,
        name="Search",
        description="useful for when you need to answer questions about current events"
    ),
]

import calendar
import dateutil.parser as parser
from datetime import date

# A custom tool named "weekday"
@tool("weekday")
def weekday(date_str: str) -> str:
    """Convert date to weekday name"""
    d = parser.parse(date_str)
    return calendar.day_name[d.weekday()]

tools += [weekday]
Purpose
ReAct demonstrates a prompting technique that lets the LLM think first (reason) and then act (act).
Installation
!pip install google-search-results
!pip install langchainhub
from langchain import hub
import json
# Download an existing prompt template
prompt = hub.pull("hwchase17/react")
print(prompt.template)
Output:
Answer the following questions as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
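# Define an agent: it needs an LLM, a set of tools, and a prompt template
agent = create_react_agent(llm, tools, prompt)
# Define an executor: it needs the agent object and the set of tools
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run it
agent_executor.invoke({"input": "周杰伦生日那天是星期几"})  # "On what day of the week was Jay Chou born?"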
> Entering new AgentExecutor chain...
我需要知道周杰伦的生日是哪一天,然后我可以使用weekday函数来找出那天是星期几。
Action: Search
Action Input: 周杰伦的生日
January 18, 1979
我现在知道周杰伦的生日是1月18日,我可以使用weekday函数来找出那天是星期几。
Action: weekday
Action Input: "1979-01-18"
Thursday
我现在知道周杰伦的生日那天是星期四。
Final Answer: 星期四
> Finished chain.
{'input': '周杰伦生日那天是星期几', 'output': '星期四'}
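The trace above follows a simple loop: the model emits an `Action` and `Action Input`, the executor runs the tool and appends an `Observation`, and this repeats until a `Final Answer` appears. The sketch below reproduces that loop with a scripted stand-in for the model. `react_loop`, `make_scripted_llm`, and `weekday_tool` are hypothetical names, the regular expressions are a simplification, and nothing here is LangChain's `AgentExecutor`.

```python
import calendar
import re
from datetime import date
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")


def weekday_tool(date_str: str) -> str:
    """Tool: ISO date string (YYYY-MM-DD) -> English weekday name."""
    return calendar.day_name[date.fromisoformat(date_str.strip('"')).weekday()]


def react_loop(llm: Callable[[str], str], tools: dict, question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(f"Question: {question}\nThought:{scratchpad}")
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            raise ValueError(f"could not parse LLM reply: {reply!r}")
        name, tool_input = action.group(1), action.group(2).strip()
        observation = tools[name](tool_input)
        # Feed the tool result back so the next call can reason over it.
        scratchpad += f" {reply}\nObservation: {observation}\nThought:"
    raise RuntimeError("no final answer within max_steps")


def make_scripted_llm(replies: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


scripted = make_scripted_llm([
    'I need the weekday for that date.\nAction: weekday\nAction Input: "1979-01-18"',
    "I now know the final answer\nFinal Answer: Thursday",
])
answer = react_loop(scripted, {"weekday": weekday_tool}, "What day of the week was 1979-01-18?")
print(answer)
```

The `max_steps` cap is a design choice worth keeping in any real agent, since a model that never emits a final answer would otherwise loop forever.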
# Download a prompt template
prompt = hub.pull("hwchase17/self-ask-with-search")
print(prompt.template)
Output:
Question: Who lived longer, Muhammad Ali or Alan Turing?
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

Question: When was the founder of craigslist born?
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952

Question: Who was the maternal grandfather of George Washington?
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

Question: Are both the directors of Jaws and Casino Royale from the same country?
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate answer: New Zealand.
So the final answer is: No

Question: {input}
Are followup questions needed here:{agent_scratchpad}
from langchain.agents import create_self_ask_with_search_agent
tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run,
        description="useful for when you need to ask with search.",
    )
]
# create_self_ask_with_search_agent accepts exactly one tool, and it must be named 'Intermediate Answer'
agent = create_self_ask_with_search_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "吴京的老婆主持过哪些综艺节目"})  # "Which variety shows has Wu Jing's wife hosted?"
> Entering new AgentExecutor chain...
Yes.
Follow up: Who is 吴京's wife?['简介: 你知道吴京娶过几个老婆吗只结了一次婚目前他有两个孩子吴京的老婆是谢楠他们是在2014年结的婚2018年生的第二个孩子吴虑1974年4... 小李子真实影像.', "Li Bingbing's first-time in an English-language film is Wayne Wang's Snow Flower ... ^ 福布斯中国发布100名人榜 吴京黄渤胡歌位列前三 . Sina Entertainment (in ...", '吴京很霸气的拒绝了她的要求,表示他只选对的人,而不选贵的人。看到这里,很多网友就要高潮了,“哼,这些流量明星拽什么拽,现在肠子都悔青了吧!”.', 'Comments · 高清大S,吴京,聂远版(第34集) · 倩女幽魂吴京版幕后花絮 · Ending Chapter! · CEO attended party with mistress in high-profile, wife ...', '明星访谈一档宣扬态度的明星访谈新综艺,亦动亦静的对嘉宾励志故事做深度剖析,全方位、真实、立体地展现嘉宾的形象、性格,展现嘉宾鲜活真实的一面。', 'Lixiaopeng's wife, Zhou Yangqing, is a retired Olympic champion gymnast from China. She won a gold medal in the uneven bars event at the ...', '《影视风云》栏目是北京电视台唯一一档大型影视访谈节目。以回顾经典优秀影视作品、宣传推荐各频道热播电视剧及追踪国内即将上映电影为主要内容, ...', "Esther's wife · 提前给女儿做数据噜#虞书欣#虞书欣永夜星河#虞书欣小. 13.0 ... 吴京几个孩子. 7454. 00:00 · 吴京几个孩子 · @ 珠江视频 · 姜妍结婚了吗.", '【FULL】吴京谢楠夫妇乘坐甜蜜冒险专车战狼铁汉柔情尽显反差萌《真星话大冒险》第12期20170724[浙江卫视官方HD]. 249K views · 6 years ago ...more ...']Follow up: What variety shows has 谢楠 hosted?['TBA, Love Actually Season 3 add. Chinese TV Show, 0000, 10 eps. (Main Host). 10 ; 2023, Ace vs Ace Season 8 add. Chinese TV Show, 2023, 12 eps. (Ep. 1) (Guest).', '生平 于2005年“猫人超级魅力主持秀”冠军脱颖而出,现任光线传媒旗下主打节目《娱乐现场》、《最佳现场》、《影视风云榜》当家主持。 2011年11月24日,谢楠发行首张个人ep《最好的我们》。 2014年,吴京发布新年微博公布婚讯,表示已经与谢楠结婚。', '2016年,主演的奇幻片《大话西游3》上映。 2017年,主演电影《这位壮士》。 2019年,在美食真人秀《熟悉的味道第四季》中担任主持人。 2020年,主持的场景闯关式人物访谈节目《追梦人之开合人生》播出;同年,作为常驻嘉宾参加实景观察节目《幸福三重奏第三季》。', 'Xie Nan (谢楠) was born on November 6, 1983. Xie Nan movies and tv shows: After Love Actually 2022 (China), Snow Day 2022 (China).', '... has been held for thousands of years. Onentert New 700 views · 5:38 · Go ... Welcome Back To Sound EP5【芒果TV爱豆娱乐站】. 芒果TV爱豆MangoTV Idol ...', '中国内地女主持人、演员.', 'The sixth season of the Chinese reality talent show Sing! China premiered on 30 July 2021, on Zhejiang Television. Li Ronghao returned as coach for his ...', 'to perform. It has then transformed to an election show to choose the current show hosts. 
The current show model use Interviews and games ...', 'The couple joined the cast of Chinese variety show, “Happiness Trio 3” (lit. 幸福三重奏3), as one of three married couples revealing their ...', '... TV is an entertainment reality show aired since July 1997. The show often invites grassroots including kids with talent to perform. It has ...']So the final answer is: 谢楠 has hosted shows like "娱乐现场", "最佳现场", "影视风云榜", "熟悉的味道第四季", and "追梦人之开合人生".
> Finished chain.
{'input': '吴京的老婆主持过哪些综艺节目',
'output': '谢楠 has hosted shows like "娱乐现场", "最佳现场", "影视风云榜", "熟悉的味道第四季", and "追梦人之开合人生".'}
from langchain.agents.openai_assistant import OpenAIAssistantRunnable
interpreter_assistant = OpenAIAssistantRunnable.create_assistant(
    name="langchain assistant",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-3.5-turbo",
)
output = interpreter_assistant.invoke({"content": "10减4的差的2.3次方是多少"})  # "What is (10 - 4) raised to the power 2.3?"
print(output[0].content[0].text.value)
Output:
10减4的差的2.3次方是61.62。
(English: "(10 - 4) raised to the power 2.3 is 61.62.")
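The assistant's arithmetic is easy to check locally, which is a useful habit whenever a model reports a number. A minimal check in plain Python:

```python
# (10 - 4) raised to the power 2.3, rounded to two decimal places
result = (10 - 4) ** 2.3
print(round(result, 2))  # prints 61.62
```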
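1. ReAct is a commonly used planner
2. SelfAskWithSearch is better suited to scenarios that need step-by-step reasoning (for example, knowledge graphs)
3. OpenAI Assistants are not a cure-all, and LangChain's official documentation no longer emphasizes this interface
4. Putting agents into production takes many more details; a later lesson covers agent implementation specifically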
from langchain.chains import LLMChain
chain = LLMChain(llm=llm, prompt=prompt)
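Explanation
Embedding: a method for representing a target object (a word, sentence, or article) as a vector
Definition:
We need a way to compute relationships between words efficiently (word vectors, i.e. word embeddings)
Method
Compute directions between words
Principle of the representation:
Each word is represented by its context window, so every word can be represented, and therefore so can the whole sentence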
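Computing directions between words usually comes down to vector arithmetic plus cosine similarity. The sketch below uses hand-made three-dimensional vectors that are purely hypothetical (real embeddings are learned and have hundreds or thousands of dimensions) to show the classic analogy king - man + woman landing nearest to queen.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "word vectors", hand-made for illustration only.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.5, 0.5],
}

# Direction arithmetic: king - man + woman
target = [k - m + w for k, m, w in zip(toy_vectors["king"],
                                       toy_vectors["man"],
                                       toy_vectors["woman"])]

# Rank the remaining words by similarity to the target direction.
candidates = {w: v for w, v in toy_vectors.items() if w not in ("king", "man", "woman")}
nearest = max(candidates, key=lambda w: cosine_similarity(target, candidates[w]))
print(nearest)
```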
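Shortcomings of word vectors
Representing vectors in code
Here we can see that the data is represented as a 1536-dimensional vector
Code
Recommendations
① Without extreme performance requirements, FAISS is the common choice
② Pinecone (a paid cloud service) is easy to use
③ With extreme performance requirements, have a specialist optimize ElasticSearch (or use APU acceleration)
④ For some comparative analyses, see: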