
Getting Started with LangChain Through 9 Examples


A heads-up before we dive in: this may well be the most approachable, hands-on LangChain tutorial you have been looking for. It walks through 9 representative application examples to take you from zero to a working start with LangChain.

The 9 examples cover the following capabilities:

1. Summarization: summarize the key points of a text or chat conversation.

2. Question Answering over Documents: use documents as context and answer questions based on their content.

3. Extraction: pull structured content out of a piece of text.

4. Evaluation: analyze and assess the quality of an LLM's output.

5. Querying Tabular Data: extract information from databases and database-like sources.

6. Code Understanding: analyze code, recover its logic, and support QA over it.

7. Interacting with APIs: read and understand API documentation, then call real-world APIs to fetch real data.

8. Chatbots: a framework for chatbots with memory (including UI interaction).

9. Agents: use LLMs to analyze tasks, make decisions, and call tools to carry out those decisions.

Code language: python

# Install the required dependencies before we begin
!pip install langchain
!pip install openai
!pip install tiktoken
!pip install faiss-cpu

Code language: python

openai_api_key = 'YOUR_API_KEY'
# use your own OpenAI API key

I. Summarization

Throwing a chunk of text at an LLM and asking it for a summary is probably one of the most common use cases.

Currently the hottest application of this is probably chatPDF, which offers exactly this capability.

1. Summarizing short text

Code language: python


# Summaries Of Short Text

from langchain.llms import OpenAI
from langchain import PromptTemplate

llm = OpenAI(temperature=0, model_name='gpt-3.5-turbo', openai_api_key=openai_api_key) # initialize the LLM

# Create the template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain PromptTemplate into which values can be inserted later
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

Code language: python

confusing_text = """
For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”
"""

Code language: python

print ("------- Prompt Begin -------")
# Fill the template with the actual text and print it
final_prompt = prompt.format(text=confusing_text)
print(final_prompt)

print ("------- Prompt End -------")

Output:

------- Prompt Begin -------

%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:

For the next 130 years, debate raged.
Some scientists called Prototaxites a lichen, others a fungus, and still others clung to the notion that it was some kind of tree.
“The problem is that when you look up close at the anatomy, it’s evocative of a lot of different things, but it’s diagnostic of nothing,” says Boyce, an associate professor in geophysical sciences and the Committee on Evolutionary Biology.
“And it’s so damn big that when whenever someone says it’s something, everyone else’s hackles get up: ‘How could you have a lichen 20 feet tall?’”


------- Prompt End -------

Code language: python

output = llm(final_prompt)
print (output)

Output:

People argued for a long time about what Prototaxites was. Some thought it was a lichen, some thought it was a fungus, and some thought it was a tree. But it was hard to tell for sure because it looked like different things up close and it was really, really big.
2. Summarizing long text

For shorter texts we can run the summarization directly like this,

but once the text length exceeds the LLM's max token size we run into trouble.

LangChain provides an out-of-the-box tool for handling long text: load_summarize_chain.

Code language: python

# Summaries Of Longer Text

from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Code language: python

with open('wonderland.txt', 'r') as file:
    text = file.read() # the text is Alice's Adventures in Wonderland

# Print the first 285 characters of the novel
print (text[:285])

Output:

The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it unde

Code language: python

num_tokens = llm.get_num_tokens(text)

print (f"There are {num_tokens} tokens in your file") 
# the full text is roughly 48k tokens
# clearly far too much text to feed into the LLM directly for processing and generation

There are 48613 tokens in your file

The way to deal with long text boils down to 'chunking' / 'splitting' the original text into smaller passages or segments.

Code language: python

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n"], chunk_size=5000, chunk_overlap=350)
# I'm using RecursiveCharacterTextSplitter here, but other splitters work as well
docs = text_splitter.create_documents([text])

print (f"You now have {len(docs)} docs instead of 1 piece of text")

You now have 36 docs instead of 1 piece of text

Now we need a LangChain tool that feeds the chunked text into the LLM for summarization.

Code language: python

# Set up the chain
# Use the map_reduce chain_type so the many documents can be combined into a single summary
chain = load_summarize_chain(llm=llm, chain_type='map_reduce') # verbose=True shows the run logs

Code language: python

# Use it. This will run through the 36 documents, summarize the chunks, then get a summary of the summary.
# A classic map-reduce approach: split the article into parts, summarize each part separately, then merge the partial summaries and summarize the summaries
output = chain.run(docs)
print (output)
# Try it yourself

Output:

 Alice follows a white rabbit down a rabbit hole and finds herself in a strange world full of peculiar characters. She experiences many strange adventures and is asked to settle disputes between the characters. In the end, she is in a court of justice with the King and Queen of Hearts and is questioned by the King. Alice reads a set of verses and has a dream in which she remembers a secret. Project Gutenberg is a library of electronic works founded by Professor Michael S. Hart and run by volunteers.
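If the default summaries feel too generic, the map step and the combine step can each be given their own prompt. The snippet below is a minimal sketch that reuses the llm and docs defined above and relies on the map_prompt / combine_prompt parameters of load_summarize_chain (both prompts must expose a {text} variable):

Code language: python

from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain

# Prompt applied to every chunk (the "map" step)
map_prompt = PromptTemplate(
    input_variables=["text"],
    template="Write a concise summary of the following:\n{text}\nCONCISE SUMMARY:",
)

# Prompt applied once to the concatenated chunk summaries (the "reduce" step)
combine_prompt = PromptTemplate(
    input_variables=["text"],
    template="Combine these partial summaries into one short paragraph:\n{text}\nFINAL SUMMARY:",
)

chain = load_summarize_chain(
    llm=llm,
    chain_type='map_reduce',
    map_prompt=map_prompt,
    combine_prompt=combine_prompt,
)
output = chain.run(docs)
print (output)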

II. Question Answering over Documents (QA based Documents)

To make sure the LLM can perform the QA task, we need to:

  1. Pass the LLM context information it can refer to
  2. Convey our question to the LLM accurately

1. QA over short text

Code language: python

# In a nutshell, building a QA system that uses documents as context looks like: llm(your context + your question) = your answer
# Simple Q&A Example

from langchain.llms import OpenAI

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Code language: python

context = """
Rachel is 30 years old
Bob is 45 years old
Kevin is 65 years old
"""

question = "Who is under 40 years old?"

Code language: python

output = llm(context + question)

print (output.strip())

Rachel is under 40 years old.

2. QA over long text

For longer text, we can split it into chunks, embed each chunk, store the embeddings in a database, and then query against them.

The goal is to select only the relevant chunks of text, but which chunks should we pick? The most popular approach today is to pick similar text by comparing vector embeddings.
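To make "comparing vector embeddings" concrete, here is a minimal sketch (assuming numpy is installed) that embeds one candidate chunk and one question with OpenAIEmbeddings and scores them with cosine similarity — the same idea the vector store below applies at scale:

Code language: python

import numpy as np
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Each embed_query call makes an API request to OpenAI and returns a vector
chunk_vec = np.array(embeddings.embed_query("Alice was beginning to get very tired of sitting by her sister"))
query_vec = np.array(embeddings.embed_query("What was Alice tired of?"))

# Cosine similarity: the higher the score, the more relevant the chunk is to the question
similarity = np.dot(chunk_vec, query_vec) / (np.linalg.norm(chunk_vec) * np.linalg.norm(query_vec))
print (f"similarity = {similarity:.3f}")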

Code language: python

from langchain import OpenAI
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter # needed for the splitting step below

llm = OpenAI(temperature=0, openai_api_key=openai_api_key)

Code language: python

loader = TextLoader('wonderland.txt') # load a long document; we again use Alice's Adventures in Wonderland as input
doc = loader.load()
print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")

You have 1 document

You have 164014 characters in that document

Code language: python

# Split the novel into multiple chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

Code language: python

# Get the total number of characters so we can compute the average
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters (smaller pieces)")

Output:

Now you have 62 documents that have an average of 2,846 characters (smaller pieces)

Code language: python

# Set up the embedding engine
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Embed the documents and combine them with the raw text in a pseudo vector database
# This step makes API calls to OpenAI
docsearch = FAISS.from_documents(docs, embeddings)

Code language: python

# Create the retrieval QA chain
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())

Code language: python

query = "What does the author describe the Alice following with?"
qa.run(query)
# Under the hood, the retriever fetches similar chunks of the document and combines them with your question so the LLM can reason out an answer
# There is plenty left to fine-tune here, e.g. the best chunk size, the best embedding engine, the best retriever, and so on
# You can also use a hosted / cloud vector store instead

Output:

' The author describes Alice following a White Rabbit with pink eyes.'
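As the comments above suggest, the retriever itself is one of the knobs worth experimenting with. Below is a minimal sketch of one such knob — limiting how many similar chunks are handed to the LLM via the standard search_kwargs argument of as_retriever (the value 2 is only an illustration), reusing the docsearch and llm from above:

Code language: python

# Retrieve only the 2 most similar chunks instead of the default
retriever = docsearch.as_retriever(search_kwargs={"k": 2})

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
print (qa.run("What does the author describe the Alice following with?"))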

III. Extraction

Extraction is the process of parsing structured data out of a piece of text.

It is usually used together with an extraction parser to structure the data. Typical use cases include:

  1. Extracting structured rows from a sentence to insert into a database
  2. Extracting multiple rows from a long document to insert into a database
  3. Extracting parameters from a user query in order to make an API call

The hottest extraction library lately is KOR; a brief, hedged sketch of it follows.
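The sketch below follows KOR's documented create_extraction_chain / Object / Text schema API; treat the exact imports and field names as assumptions and check the KOR docs before relying on them:

Code language: python

# Sketch only: assumes KOR's schema nodes and create_extraction_chain are available
from kor import create_extraction_chain, Object, Text
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', openai_api_key=openai_api_key)

# Describe the structure we want back, with one worked example to guide the model
schema = Object(
    id="fruit",
    description="Fruit mentioned in the text",
    attributes=[Text(id="name", description="The name of the fruit")],
    examples=[("I ate an apple and a pear", [{"name": "apple"}, {"name": "pear"}])],
)

chain = create_extraction_chain(llm, schema)
print (chain.run("Apple, Pear, this is an kiwi"))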
1. Manual format conversion (vanilla extraction)

Code language: python

from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

from langchain.chat_models import ChatOpenAI


chat_model = ChatOpenAI(temperature=0, model='gpt-3.5-turbo', openai_api_key=openai_api_key)

Code language: python

# Vanilla Extraction
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

Code language: python

# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print (output.content)
print (type(output.content))

Output:

{'Apple': '
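The model returns output.content as a plain string, so turning it into a real Python dictionary is a manual step. Here is a minimal sketch, assuming the reply is valid Python-dict syntax (ast.literal_eval is the safe counterpart of eval):

Code language: python

import ast

# Parse the dict-like string returned by the model into an actual dict
output_dict = ast.literal_eval(output.content.strip())

print (output_dict)
print (type(output_dict))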