LangChain's main capabilities:

- LLM invocation
- Prompt management, with support for all kinds of custom templates
- A large collection of document loaders, e.g. Email, Markdown, PDF, YouTube ...
- Index support
- Chains

Core concepts:

- Loader: loaders for folders, CSV files, Evernote, Google Drive, arbitrary web pages, PDF, S3, YouTube, and so on.
- Document: after a loader reads a data source, the data must be converted into `Document` objects before it can be used (see the sketch after this list).
- Text Splitter: prompts and the OpenAI embeddings API both have length limits, so long text has to be split into chunks.
- Vectorstore: relevance search over data is a vector operation, so documents must first be converted into vectors before they can be searched; they cannot be queried directly.
- Chain: a chain can be understood as a task; one chain is one task.
- Agent: an agent dynamically selects and invokes the available tools for us.
- Embedding: used to measure the relevance between texts; the key to building a knowledge base on top of OpenAI.
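The Document and Embedding concepts in miniature — a minimal sketch (the texts are illustrative; `OpenAIEmbeddings` reads `OPENAI_API_KEY` from the environment):

```python
from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings

# A Document is just text plus metadata; every loader produces a list of these
doc = Document(page_content="LangChain is a framework for building LLM apps.",
               metadata={"source": "example"})

# An embedding turns text into a vector so relevance can be measured numerically
embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("What is LangChain?")
doc_vector = embeddings.embed_query(doc.page_content)
print(len(query_vector))  # vector dimension (1536 for OpenAI's default model)
```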
A basic single-shot LLM call:

```python
import os
os.environ["OPENAI_API_KEY"] = 'your api key'

from langchain.llms import OpenAI

# Load the model; max_tokens caps the length of the completion
llm = OpenAI(model_name="text-davinci-003", max_tokens=1024)
llm("How would you evaluate artificial intelligence?")
```
SerpApi: a Google Search API. Register for a SerpApi account to get an API key.

```python
import os
os.environ["OPENAI_API_KEY"] = 'your api key'
os.environ["SERPAPI_API_KEY"] = 'your api key'

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
from langchain.agents import AgentType

# Load the OpenAI model
llm = OpenAI(temperature=0, max_tokens=2048)

# Load the serpapi tool
tools = load_tools(["serpapi"])

# To run a calculation after searching, you can write:
# tools = load_tools(['serpapi', 'llm-math'], llm=llm)

# To let it do simple calculations with Python's print after searching:
# tools = load_tools(["serpapi", "python_repl"])

# Tools must be initialized after loading; with verbose=True the full
# execution trace is printed
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# Run the agent
agent.run("What's the date today? What great events have taken place today in history?")
```
First split the text into chunks, then summarize each chunk, and finally merge the individual summaries.
```python
from langchain.document_loaders import UnstructuredFileLoader
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain import OpenAI

# Load the text file
loader = UnstructuredFileLoader("/content/sample_data/data/lg_test.txt")
# Convert the text into Document objects
document = loader.load()
print(f'documents:{len(document)}')

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0
)

# Split the text
split_documents = text_splitter.split_documents(document)
print(f'documents:{len(split_documents)}')

# Load the llm model
llm = OpenAI(model_name="text-davinci-003", max_tokens=1500)

# Create the summarization chain
chain = load_summarize_chain(llm, chain_type="refine", verbose=True)

# Run the summarization chain (only the first 5 chunks, for a quick demo)
chain.run(split_documents[:5])
```
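For comparison, `load_summarize_chain` also ships a `map_reduce` chain type, which summarizes each chunk independently and then merges the partial summaries — a sketch reusing `llm` and `split_documents` from above:

```python
# map_reduce: summarize each chunk on its own, then combine the summaries;
# refine (used above): fold each new chunk into a running summary
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(split_documents[:5])
```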
Vectorize the documents, then vectorize the query and retrieve by similarity.
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA

# Load every txt file in the folder
loader = DirectoryLoader('/content/sample_data/data/', glob='**/*.txt')
# Convert the data into Document objects; each file becomes one Document
documents = loader.load()

# Initialize the text splitter
text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
# Split the loaded documents
split_docs = text_splitter.split_documents(documents)

# Initialize the OpenAI embeddings object
embeddings = OpenAIEmbeddings()
# Compute embedding vectors for the documents with the OpenAI embeddings
# object and store them temporarily in the Chroma vector database for
# later similarity lookup
docsearch = Chroma.from_documents(split_docs, embeddings)

# Create the question-answering object
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=docsearch.as_retriever(), return_source_documents=True)
# Ask a question
result = qa({"query": "What was iFLYTEK's revenue in the first quarter of this year?"})
print(result)
```
The vectorization in the example above is recomputed in real time on every run; the Chroma and Pinecone databases can be used to persist the vector data.
Chroma persistence
```python
from langchain.vectorstores import Chroma

# Persist the data (documents and embeddings come from the previous example)
docsearch = Chroma.from_documents(documents, embeddings, persist_directory="D:/vector_store")
docsearch.persist()

# Load the persisted data
docsearch = Chroma(persist_directory="D:/vector_store", embedding_function=embeddings)
```
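A quick way to verify the reloaded store works is to run a similarity search against it (a sketch; the query string is illustrative):

```python
# Retrieve the chunks most similar to the query from the persisted store
docs = docsearch.similarity_search("What was iFLYTEK's revenue in the first quarter?")
print(docs[0].page_content)
```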
Pinecone persistence
```python
# Persist the data (assumes the pinecone client has been initialized and the
# index exists; see the setup sketch below)
docsearch = Pinecone.from_texts([t.page_content for t in split_docs], embeddings, index_name=index_name)

# Load the persisted data
docsearch = Pinecone.from_existing_index(index_name, embeddings)
```
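The snippet above assumes the Pinecone client has already been initialized and the index exists. A hedged setup sketch (the index name matches the later example; dimension 1536 matches OpenAI's default embedding size):

```python
import pinecone

# One-time setup: initialize the client and create the index if needed
pinecone.init(api_key="your api key", environment="your environment")

index_name = "liaokong-test"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536)
```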
Question-answering search
```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

import pinecone

# Initialize pinecone
pinecone.init(
    api_key="your api key",
    environment="your environment"
)

loader = DirectoryLoader('/content/sample_data/data/', glob='**/*.txt')
# Convert the data into Document objects; each file becomes one Document
documents = loader.load()

# Initialize the text splitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
# Split the loaded documents
split_docs = text_splitter.split_documents(documents)

# Initialize the OpenAI embeddings object (needed by the Pinecone calls below)
embeddings = OpenAIEmbeddings()

index_name = "liaokong-test"

# Persist the data
# docsearch = Pinecone.from_texts([t.page_content for t in split_docs], embeddings, index_name=index_name)

# Load the persisted data
docsearch = Pinecone.from_existing_index(index_name, embeddings)

query = "What was iFLYTEK's revenue in the first quarter of this year?"
docs = docsearch.similarity_search(query, include_metadata=True)

llm = OpenAI(temperature=0)
chain = load_qa_chain(llm, chain_type="stuff", verbose=True)
chain.run(input_documents=docs, question=query)
```
Load a video and convert it to text, build a vector index over it, build a template and initialize the prompt, then initialize the question-answering chain and answer questions.
```python
import os

from langchain.document_loaders import YoutubeLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)

# Load the youtube video
loader = YoutubeLoader.from_youtube_url('https://www.youtube.com/watch?v=Dj60HHy-Kqk')
# Convert the data into Documents
documents = loader.load()

# Initialize the text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=20
)

# Split the youtube documents
documents = text_splitter.split_documents(documents)

# Initialize openai embeddings
embeddings = OpenAIEmbeddings()

# Store the data in the vector store
vector_store = Chroma.from_documents(documents, embeddings)
# Initialize a retriever from the vector store
retriever = vector_store.as_retriever()

system_template = """
Use the following context to answer the user's question.
If you don't know the answer, say you don't know; don't try to make it up. And answer in Chinese.
-----------
{question}
-----------
{chat_history}
"""

# Build the initial messages list; think of this as the messages parameter
# passed to the openai API
messages = [
    SystemMessagePromptTemplate.from_template(system_template),
    HumanMessagePromptTemplate.from_template('{question}')
]

# Initialize the prompt object
prompt = ChatPromptTemplate.from_messages(messages)


# Initialize the question-answering chain
qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0.1, max_tokens=2048),
    retriever,
    condense_question_prompt=prompt
)


chat_history = []
while True:
    question = input('Question: ')
    # chat_history is a required parameter; it stores the conversation history
    result = qa({'question': question, 'chat_history': chat_history})
    chat_history.append((question, result['answer']))
    print(result['answer'])
```
Use Zapier to connect thousands of tools. Apply for a Zapier API key: Get Started - Zapier AI Actions.
```python
import os
os.environ["ZAPIER_NLA_API_KEY"] = ''

from langchain.llms import OpenAI
from langchain.agents import initialize_agent
from langchain.agents.agent_toolkits import ZapierToolkit
from langchain.utilities.zapier import ZapierNLAWrapper


llm = OpenAI(temperature=.3)
zapier = ZapierNLAWrapper()
toolkit = ZapierToolkit.from_zapier_nla_wrapper(zapier)
agent = initialize_agent(toolkit.get_tools(), llm, agent="zero-shot-react-description", verbose=True)

# Print the tools we have configured and enabled in Zapier
for tool in toolkit.get_tools():
    print(tool.name)
    print(tool.description)
    print("\n\n")

agent.run('Summarize in Chinese the last email that "******@qq.com" sent me, and send the summary to "******@qq.com"')
```
Run multiple chains in sequence: define several chains and link them together with SimpleSequentialChain.
```python
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

# location chain
llm = OpenAI(temperature=1)
template = """Your job is to come up with a classic dish from the area that the user suggests.
% USER LOCATION
{user_location}
YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)
location_chain = LLMChain(llm=llm, prompt=prompt_template)

# meal chain
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}
YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

# Link the chains with SimpleSequentialChain: the first chain's answer is
# substituted for user_meal in the second chain, which is then queried
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)
review = overall_chain.run("Rome")
```
Define an output structure with StructuredOutputParser so the model returns its output in that structure.
```python
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")

# Tell it which fields the generated content needs and what type each field is
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# Initialize the parser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# The generated format instructions look like:
# {
#     "bad_string": string // This a poorly formatted user input string
#     "good_string": string // This is your response, a reformatted response
# }
format_instructions = output_parser.get_format_instructions()

template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly
{format_instructions}
% USER INPUT:
{user_input}
YOUR RESPONSE:
"""

# Embed the format description in the prompt to tell the llm what format
# we need the output in
prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to califonya!")
llm_output = llm(promptValue)

# Parse the generated content with the parser
output_parser.parse(llm_output)
```
Use LLMRequestsChain to fetch a web page and have the LLM extract structured information from the returned HTML.

```python
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMRequestsChain, LLMChain

llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)

template = """Between >>> and <<< is the HTML returned by the web page.
The page is the company profile of an A-share listed company on Sina Finance.
Please extract the requested information.
>>> {requests_result} <<<
Please return the data in the following JSON format
{{
  "company_name":"a",
  "company_english_name":"b",
  "issue_price":"c",
  "date_of_establishment":"d",
  "registered_capital":"e",
  "office_address":"f",
  "Company_profile":"g"
}}
Extracted:"""

prompt = PromptTemplate(
    input_variables=["requests_result"],
    template=template
)

chain = LLMRequestsChain(llm_chain=LLMChain(llm=llm, prompt=prompt))
inputs = {
    "url": "https://vip.stock.finance.sina.com.cn/corp/go.php/vCI_CorpInfo/stockid/600519.phtml"
}

response = chain(inputs)
print(response['output'])
```
Describe the specific scenario each tool is for, so the agent picks the right custom tool when it meets the corresponding task.
```python
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
from langchain.tools import BaseTool
from langchain.llms import OpenAI
from langchain import LLMMathChain, SerpAPIWrapper

llm = OpenAI(temperature=0)

# Initialize the search chain and the math chain
search = SerpAPIWrapper()
llm_math_chain = LLMMathChain(llm=llm, verbose=True)

# Build the tool list, telling the agent which tools it can use; each tool's
# description decides when the agent chooses it
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    ),
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math"
    )
]

# Initialize the agent
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# Run the agent
agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
```
Use the built-in memory to implement a chatbot that remembers the conversation.
```python
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0)

# Initialize the MessageHistory object
history = ChatMessageHistory()

# Add conversation turns to the MessageHistory object
history.add_ai_message("Hello!")
history.add_user_message("What is the capital of China?")

# Run the conversation
ai_response = chat(history.messages)
print(ai_response)
```
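Keeping the history by hand quickly gets tedious; LangChain's memory classes can do it automatically. A minimal sketch using `ConversationBufferMemory` with `ConversationChain` (both ship with this langchain version):

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI

# ConversationBufferMemory stores the full transcript and injects it into
# every prompt, so the model remembers earlier turns
conversation = ConversationChain(
    llm=ChatOpenAI(temperature=0),
    memory=ConversationBufferMemory()
)

print(conversation.run("What is the capital of China?"))
print(conversation.run("What did I just ask you?"))  # answered from memory
```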
There are two ways to use Hugging Face models: online, via the Hub API, and offline, by running the model locally.
Using Hugging Face online
```python
# Configure the hugging face environment variable
import os
os.environ['HUGGINGFACEHUB_API_TOKEN'] = ''

from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}
Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = HuggingFaceHub(repo_id="google/flan-t5-xl", model_kwargs={"temperature": 0, "max_length": 64})
llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
print(llm_chain.run(question))
```
Loading the model locally and running it offline
```python
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = 'google/flan-t5-large'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)
print(local_llm('What is the capital of France? '))


template = """Question: {question} Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

llm_chain = LLMChain(prompt=prompt, llm=local_llm)
question = "What is the capital of England?"
print(llm_chain.run(question))
```
Use a SQL agent to query a database in natural language:

```python
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI

db = SQLDatabase.from_uri("sqlite:///../notebooks/Chinook.db")
llm = OpenAI(temperature=0)
# The toolkit needs an llm of its own for its query-checker tool
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True
)

agent_executor.run("Describe the playlisttrack table")
```
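For a single ad-hoc query, the lighter-weight `SQLDatabaseChain` from the same langchain version can be used instead of a full agent (a sketch reusing `llm` and `db` from above; the question is illustrative):

```python
from langchain import SQLDatabaseChain

# Translates the question into SQL, runs it, and phrases the answer
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
db_chain.run("How many employees are there in the database?")
```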
Source: GitHub - liaokongVFX/LangChain-Chinese-Getting-Started-Guide (a Chinese getting-started tutorial for LangChain)