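Heads-up, hands-on material ahead: this may well be the easiest-to-follow, most practical LangChain tutorial you have been looking for. It walks through 9 representative example applications to get you started with LangChain from scratch.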
Reply with the keyword langchain to the WeChat official account 算法美食屋 to get the notebook source code for this article.
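The 9 examples are:
1. Summarization: condense the key points of a text or chat conversation.
2. Question and Answering Over Documents: use documents as context and answer questions based on their content.
3. Extraction: pull structured data out of text.
4. Evaluation: analyze and assess the quality of LLM output.
5. Querying Tabular Data: extract information from databases or database-like data.
6. Code Understanding: analyze code and work out its logic, with QA support.
7. Interacting with APIs: read and understand API documentation, then call real APIs to fetch real-world data.
8. Chatbots: a chatbot framework with memory (and a UI).
9. Agents: use LLMs to analyze tasks and make decisions, then call tools to carry those decisions out.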
```python
# Before we start, install the required dependencies
!pip install langchain
!pip install openai
!pip install tiktoken
!pip install faiss-cpu

# Use your own OpenAI API key
openai_api_key = 'YOUR_API_KEY'
```
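Hard-coding the key works for a quick demo. A common alternative is to read it from the `OPENAI_API_KEY` environment variable, which the `openai` package also checks by default. The following is a minimal sketch, and the helper name `get_openai_api_key` is hypothetical:

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it (hypothetical helper)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this helper, `openai_api_key = 'YOUR_API_KEY'` becomes `openai_api_key = get_openai_api_key()`, so the key never ends up in the notebook itself.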
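Handing an LLM a piece of text and asking for a summary is arguably one of the most common use cases.
The most popular application of this right now is probably chatPDF, which offers exactly this feature.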
```python
# Summaries Of Short Text

from langchain.llms import OpenAI
from langchain import PromptTemplate

# Initialize the LLM. Note: LangChain may warn that chat models such as
# gpt-3.5-turbo should be used through ChatOpenAI rather than OpenAI.
llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key)

# Create the template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can fill in with values later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”
"""

print("------- Prompt Begin -------")
# Fill in the template and print the final prompt
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)

print("------- Prompt End -------")
```
```text
------- Prompt Begin -------

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:

For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”


------- Prompt End -------
```
```python
output = llm(final_prompt)
print(output)
```
People argued for a long time about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But it was hard to tell for sure because it looked like different things up close and it was really, really big.
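At its core, filling a prompt template is string substitution. The sketch below reproduces the idea with Python's built-in `str.format`. It is not LangChain's implementation, and `fill_template` is a hypothetical helper name. PromptTemplate's default format uses the same `{name}` placeholder syntax.

```python
TEMPLATE = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""


def fill_template(template: str, **values: str) -> str:
    """Substitute {name} placeholders; raises KeyError if a variable is missing."""
    return template.format(**values)


print(fill_template(TEMPLATE, text="Prototaxites was a very big, very old living thing."))
```

Because `str.format` treats braces as placeholders, any literal `{` or `}` in the instructions would need to be doubled (`{{`, `}}`).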
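For short texts, we can run the summary directly like this.
Once a text exceeds the LLM's maximum token limit (max token size), however, this approach breaks down.
LangChain provides an out-of-the-box tool for long texts: load_summarize_chain.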
```python
# Summaries Of Longer Text

from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
with open('wonderland.txt', 'r', encoding='utf-8') as file:
    text = file.read()  # The text is Alice's Adventures in Wonderland

# Print the first 285 characters of the novel
print(text[:285])
```
```text
The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it unde
```
```python
num_tokens = llm.get_num_tokens(text)

print(f"There are {num_tokens} tokens in your file")
# The full text is about 48.6k tokens,
# which is clearly too much to send to the LLM in a single call
```
There are 48613 tokens in your file
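Here is a quick back-of-the-envelope check using two numbers this article reports for the same file: 48,613 tokens here and 164,014 characters in the QA section. The per-chunk budget of 3,000 tokens is a hypothetical figure. The real limit depends on the model's context window minus the room needed for the prompt and the answer.

```python
import math


def min_chunks(total_tokens: int, tokens_per_chunk: int) -> int:
    """Lower bound on the number of chunks needed (ignores overlap and prompt overhead)."""
    return math.ceil(total_tokens / tokens_per_chunk)


total_tokens = 48613   # from get_num_tokens above
total_chars = 164014   # character count of wonderland.txt, printed in the QA section
chars_per_token = total_chars / total_tokens

print(round(chars_per_token, 2))       # average characters per token for this text
print(min_chunks(total_tokens, 3000))  # hypothetical budget of 3,000 tokens per chunk
```

At roughly 3.4 characters per token, the 5,000-character chunks used below come to about 1,500 tokens each.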
The usual way to handle long text is "chunking" or "splitting": break the original text into smaller paragraphs or pieces.
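To make chunk size and overlap concrete, here is a deliberately simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works, since that class first tries to split on separators such as blank lines and newlines. The function name is hypothetical. With an overlap, the last `chunk_overlap` characters of one chunk reappear at the start of the next, so text near a boundary shows up in both neighboring chunks.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing chunk_overlap characters with the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```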
```python
text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
# I'm using RecursiveCharacterTextSplitter here, but other splitters work too
docs = text_splitter.create_documents([text])

print(f"You now have {len(docs)} docs instead of 1 piece of text")
```
You now have 36 docs instead of 1 piece of text
Now we need a LangChain tool that feeds the chunks to the LLM for summarization.
```python
# Set up the LangChain chain
# chain_type='map_reduce' summarizes each chunk, then combines the results into a single summary
chain = load_summarize_chain(llm=llm, chain_type='map_reduce')  # pass verbose=True to see the run log

# Use it. This will run through the 36 documents, summarize the chunks, then get a summary of the summary.
# A classic map-reduce approach: split the text into parts, summarize each part, then merge by summarizing the summaries
output = chain.run(docs)
print(output)
# Try yourself
```
Alice follows a white rabbit down a rabbit hole and finds herself in a strange world full of peculiar characters. She experiences many strange adventures and is asked to settle disputes between the characters. In the end, she is in a court of justice with the King and Queen of Hearts and is questioned by the King. Alice reads a set of verses and has a dream in which she remembers a secret. Project Gutenberg is a library of electronic works founded by Professor Michael S. Hart and run by volunteers.
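The map-reduce strategy can be sketched without any LLM at all. Map a summarize function over every chunk, then reduce by summarizing the joined partial summaries. In this sketch, `first_sentence` is a hypothetical stand-in for a real LLM call. LangChain's chain adds its own prompts and other details on top of this basic shape.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Map: summarize each chunk. Reduce: summarize the joined partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunks]  # map step
    return summarize(" ".join(partial_summaries))              # reduce step


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.strip().split(". ")[0].rstrip(".") + "."


chunks = [
    "Alice sees a white rabbit. It checks its watch.",
    "She falls down a deep hole. It takes a long time.",
]
print(map_reduce_summarize(chunks, first_sentence))
```

For n chunks this makes n + 1 summarize calls. The map calls are independent, so they can run in parallel, which is the main appeal of map_reduce for long texts.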
To get an LLM to perform a QA task, we need to:
Pass the LLM context information it can refer to
Convey our question to the LLM precisely
```python
# In short, building a QA system with documents as context boils down to:
# llm(your context + your question) = your answer
# Simple Q&A Example

from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"
output = llm(context + question)

print(output.strip())
```
Rachel is under 40 years old.
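For longer texts, we can split the text into chunks, compute an embedding for each chunk, store the embeddings in a database, and then query it.
The goal is to select the relevant chunks, but which ones? The most popular approach today is to compare vector embeddings and pick the chunks most similar to the question.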
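Here is a toy version of "pick the most similar chunks": score each chunk's embedding against the question's embedding and keep the top k. The 3-dimensional vectors are made up for illustration, since real embedding models return vectors with hundreds or thousands of dimensions. Cosine similarity is used here as one common choice. The FAISS index used below may rank by a different metric, such as L2 distance.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: List[float], chunks: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk names whose vectors are most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real embeddings
chunk_vectors = {
    "rabbit hole chunk": [0.9, 0.1, 0.0],
    "tea party chunk": [0.1, 0.9, 0.1],
    "court trial chunk": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]
print(top_k(query_vector, chunk_vectors, k=1))
```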
```python
from langchain import OpenAI
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
loader = TextLoader('wonderland.txt')  # Load a long text; again we use Alice's Adventures in Wonderland as input
doc = loader.load()
print(f"You have {len(doc)} document")
print(f"You have {len(doc[0].page_content)} characters in that document")
```
You have 1 document
You have 164014 characters in that document
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the novel into multiple chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)
# Total character count, so we can compute the average chunk length
num_total_characters = sum([len(x.page_content) for x in docs])

print(f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")
```
Now you have 62 documents that have an average of 2,846 characters (smaller pieces)
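```python
# Set up the embedding engine
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed the chunks and store them, together with their original text, in a local FAISS vector store
# This step sends requests to the OpenAI API
docsearch = FAISS.from_documents(docs, embeddings)

# Create the retrieval QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

query = "What does the author describe the Alice following with?"
qa.run(query)
# The retriever fetches the most similar chunks and passes them, together with your question, to the LLM, which reasons over them to produce the answer
# There is much more to tune here: the best chunk size, the best embedding engine, the best retriever, and so on
# You can also use a cloud-hosted vector store instead
```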
' The author describes Alice following a White Rabbit with pink eyes.'
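`chain_type="stuff"` means the retrieved chunks are "stuffed" into a single prompt together with the question. The following is a minimal sketch of that idea. The template wording, the `stuff_prompt` name and the `max_chars` guard are all hypothetical, not LangChain's actual prompt.

```python
from typing import List

# Hypothetical prompt wording, for illustration only
QA_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(chunks: List[str], question: str, max_chars: int = 12000) -> str:
    """Concatenate all retrieved chunks into one prompt (the 'stuff' strategy)."""
    context = "\n\n".join(chunks)
    # Stuffing only works while everything fits in the model's context window
    if len(context) > max_chars:
        raise ValueError("retrieved context is too long to stuff into a single prompt")
    return QA_TEMPLATE.format(context=context, question=question)


print(stuff_prompt(
    ["Alice saw a White Rabbit with pink eyes.", "The Rabbit took a watch out of its pocket."],
    "What did Alice see?",
))
```

This is also why chunk size matters. With k retrieved chunks of about 3,000 characters each, the stuffed prompt grows to roughly k × 3,000 characters before the question is even added.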
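Extraction is the process of parsing structured data out of a piece of text.
It is usually combined with an output parser to give the data structure. Some example uses:
Extract a structured row from a sentence to insert into a database
Extract multiple rows from a long document to insert into a database
Extract parameters from a user query to make an API call
The most popular extraction library at the moment is KOR.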
```python
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

from langchain.chat_models import ChatOpenAI


chat_model = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', openai_api_key=openai_api_key)

# Vanilla Extraction
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print(output.content)
print(type(output.content))
```
```text
{'Apple': '
```
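As the `print(type(output.content))` line hints, the model's reply comes back as a string, not a Python dict. One safe way to turn a dict-shaped reply into a real dict is `ast.literal_eval`, which only evaluates Python literals (unlike `eval`). This is a minimal sketch. The helper name and the sample reply are made up, and an output parser or a library such as KOR is the more robust route.

```python
import ast
from typing import Dict, Optional


def parse_dict_reply(reply: str) -> Optional[Dict[str, str]]:
    """Safely turn a reply like "{'Apple': '🍎'}" into a dict; return None if it isn't one."""
    try:
        value = ast.literal_eval(reply.strip())
    except (ValueError, SyntaxError):
        return None  # not a valid Python literal, e.g. a truncated reply
    return value if isinstance(value, dict) else None


# Made-up example reply, not the model's actual output
sample_reply = "{'Apple': '🍎', 'Pear': '🍐', 'kiwi': '🥝'}"
print(parse_dict_reply(sample_reply))
```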