Language models output text, but in many cases we want information that is more structured than plain text. Output Parsers are classes that help us get this structured information by structuring language model responses. An output parser must implement two main methods:

get_format_instructions() -> str: returns a string of instructions describing how the language model's output should be formatted.
parse(str) -> Any: takes a string (assumed to be the language model's response) and parses it into some structure.

There is also one optional method:

parse_with_prompt(str, PromptValue) -> Any: takes a string (assumed to be the language model's response) together with a prompt (assumed to be the prompt that generated that response) and parses it into some structure. The prompt is mainly useful when the output parser wants to retry or fix the output and needs information from the prompt to do so.
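As a quick illustration of this interface, here is a minimal sketch (not from the original article) of a custom parser that splits the model's response on semicolons. It subclasses LangChain's BaseOutputParser from the 0.0.x API used throughout this post; the class name and format string are invented for this example.
from langchain.schema import BaseOutputParser

class SemicolonSeparatedListOutputParser(BaseOutputParser):
    """Hypothetical parser: split the LLM response on semicolons."""

    def get_format_instructions(self) -> str:
        # This string is injected into the prompt to tell the model how to answer.
        return "Your response should be a list of values separated by semicolons, e.g. `foo; bar; baz`"

    def parse(self, text: str):
        # Turn the raw completion string into structured data.
        return [item.strip() for item in text.strip().split(";")]

SemicolonSeparatedListOutputParser().parse("red; green; blue")
# -> ['red', 'green', 'blue']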
CommaSeparatedListOutputParser
This is another parser, with weaker parsing capabilities than PydanticOutputParser (covered later in this section):
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)
model = OpenAI(temperature=0)
_input = prompt.format(subject="ice cream flavors")
output = model(_input)
output_parser.parse(output)
Output:
['Vanilla',
'Chocolate',
'Strawberry',
'Mint Chocolate Chip',
'Cookies and Cream']
DatetimeOutputParser
This OutputParser shows how to parse an LLM's output into datetime format:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.chains import LLMChain
from langchain.llms import OpenAI
output_parser = DatetimeOutputParser()
template = """Answer the users question:
{question}
{format_instructions}"""
prompt = PromptTemplate.from_template(template, partial_variables={"format_instructions": output_parser.get_format_instructions()})
chain = LLMChain(prompt=prompt, llm=OpenAI())
output = chain.run("around when was bitcoin founded?")
output
Output:
'\n\n2008-01-03T18:15:05.000000Z'
Input:
output_parser.parse(output)
Output:
datetime.datetime(2008, 1, 3, 18, 15, 5)
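Since the parsed value is a standard Python datetime.datetime, it can be used like any other datetime object (an illustrative note, not part of the original example):
parsed = output_parser.parse(output)
# Standard datetime attributes and methods are available.
parsed.year          # -> 2008
parsed.isoformat()   # -> '2008-01-03T18:15:05'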
EnumOutputParser
from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum
class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
parser = EnumOutputParser(enum=Colors)
parser.parse("red")
Output:
<Colors.RED: 'red'>
Input:
# Can handle spaces
parser.parse(" green")
Output:
<Colors.GREEN: 'green'>
Input:
# And new lines
parser.parse("blue\n")
Output:
<Colors.BLUE: 'blue'>
Input:
# And raises errors when appropriate
parser.parse("yellow")
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
File ~/workplace/langchain/langchain/output_parsers/enum.py:25, in EnumOutputParser.parse(self, response)
24 try:
---> 25 return self.enum(response.strip())
26 except ValueError:
File ~/.pyenv/versions/3.9.1/lib/python3.9/enum.py:315, in EnumMeta.__call__(cls, value, names, module, qualname, type, start)
314 if names is None: # simple value lookup
--> 315 return cls.__new__(cls, value)
316 # otherwise, functional API: we're creating a new Enum type
File ~/.pyenv/versions/3.9.1/lib/python3.9/enum.py:611, in Enum.__new__(cls, value)
610 if result is None and exc is None:
--> 611 raise ve_exc
612 elif exc is None:
ValueError: 'yellow' is not a valid Colors
During handling of the above exception, another exception occurred:
OutputParserException Traceback (most recent call last)
Cell In[8], line 2
1 # And raises errors when appropriate
----> 2 parser.parse("yellow")
File ~/workplace/langchain/langchain/output_parsers/enum.py:27, in EnumOutputParser.parse(self, response)
25 return self.enum(response.strip())
26 except ValueError:
---> 27 raise OutputParserException(
28 f"Response '{response}' is not one of the "
29 f"expected values: {self._valid_values}"
30 )
OutputParserException: Response 'yellow' is not one of the expected values: ['red', 'green', 'blue']
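If you would rather handle this failure yourself than let it propagate, the parse call can be wrapped in a try/except. A minimal sketch, assuming the OutputParserException import path of the langchain 0.0.x API used throughout this post:
from langchain.schema import OutputParserException

try:
    parser.parse("yellow")
except OutputParserException as e:
    # Fall back to a default value (or re-prompt the model) instead of crashing.
    print(f"Could not parse the response: {e}")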
OutputFixingParser
This output parser wraps another output parser and, when the first one fails, it is called to fix the error. But we can do other things besides just throwing an error. Specifically, we can pass the misformatted output, together with the formatting instructions, to the model and ask it to fix it. For this example we will use the PydanticOutputParser (introduced later in this section). Here is what happens if we pass it a result that does not conform to the schema:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
actor_query = "Generate the filmography for a random actor."
parser = PydanticOutputParser(pydantic_object=Actor)
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
parser.parse(misformatted)
The following error is raised:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
File ~/workplace/langchain/langchain/output_parsers/pydantic.py:23, in PydanticOutputParser.parse(self, text)
22 json_str = match.group()
---> 23 json_object = json.loads(json_str)
24 return self.pydantic_object.parse_obj(json_object)
File ~/.pyenv/versions/3.9.1/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
During handling of the above exception, another exception occurred:
OutputParserException Traceback (most recent call last)
Cell In[6], line 1
----> 1 parser.parse(misformatted)
File ~/workplace/langchain/langchain/output_parsers/pydantic.py:29, in PydanticOutputParser.parse(self, text)
27 name = self.pydantic_object.__name__
28 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
---> 29 raise OutputParserException(msg)
OutputParserException: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Now we can construct and use an OutputFixingParser. This output parser takes as arguments another output parser and an LLM that will be used to try to correct any formatting mistakes:
from langchain.output_parsers import OutputFixingParser
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
new_parser.parse(misformatted)
Output:
Actor(name='Tom Hanks', film_names=['Forrest Gump'])
PydanticOutputParser
PydanticOutputParser allows users to specify an arbitrary JSON schema and query an LLM for JSON output that conforms to that schema. Keep in mind that a sufficiently capable LLM is required to generate well-formed JSON; in the OpenAI family, DaVinci can do this reliably, but Curie's ability drops off considerably. We declare our data model with Pydantic: Pydantic's BaseModel is similar to a Python dataclass, but with real type checking and coercion.
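To make the comparison with dataclasses concrete, here is a small stand-alone sketch (not part of the original example) of Pydantic's type checking and coercion:
from pydantic import BaseModel, ValidationError

class Point(BaseModel):
    x: int
    y: int

Point(x="1", y=2)        # the string "1" is coerced to the int 1
try:
    Point(x="one", y=2)  # a value that cannot be coerced raises an error
except ValidationError as e:
    print(e)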
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List
model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can easily add custom validation logic with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if field[-1] != '?':
            raise ValueError("Badly formed question!")
        return field

# A query intended to prompt the language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser and inject the format instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
_input = prompt.format_prompt(query=joke_query)
output = model(_input.to_string())
parser.parse(output)
Output:
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
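As a side note (not shown in the original example), the custom validator also runs when a Joke is constructed directly, so a setup that does not end with a question mark is rejected:
from pydantic import ValidationError

try:
    Joke(setup="I am not a question", punchline="...")
except ValidationError as e:
    print(e)  # reports the "Badly formed question!" error raised by the validator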
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

actor_query = "Generate the filmography for a random actor."
parser = PydanticOutputParser(pydantic_object=Actor)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
_input = prompt.format_prompt(query=actor_query)
output = model(_input.to_string())
parser.parse(output)
Output:
Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])
RetryOutputParser
In some cases it is possible to fix a parsing error by looking only at the output; in other cases it is not. An example is when the output is not just badly formatted, but also partially incomplete:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser, RetryOutputParser
from pydantic import BaseModel, Field, validator
from typing import List
template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response:"""
class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")

parser = PydanticOutputParser(pydantic_object=Action)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")
bad_response = '{"action": "search"}'
If we try to parse this response directly, we get an error:
parser.parse(bad_response)
Output:
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
File ~/workplace/langchain/langchain/output_parsers/pydantic.py:24, in PydanticOutputParser.parse(self, text)
23 json_object = json.loads(json_str)
---> 24 return self.pydantic_object.parse_obj(json_object)
26 except (json.JSONDecodeError, ValidationError) as e:
File ~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:527, in pydantic.main.BaseModel.parse_obj()
File ~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()
ValidationError: 1 validation error for Action
action_input
field required (type=value_error.missing)
During handling of the above exception, another exception occurred:
OutputParserException Traceback (most recent call last)
Cell In[6], line 1
----> 1 parser.parse(bad_response)
File ~/workplace/langchain/langchain/output_parsers/pydantic.py:29, in PydanticOutputParser.parse(self, text)
27 name = self.pydantic_object.__name__
28 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
---> 29 raise OutputParserException(msg)
OutputParserException: Failed to parse Action from completion {"action": "search"}. Got: 1 validation error for Action
action_input
field required (type=value_error.missing)
If we try to use the OutputFixingParser to fix this error, it has no way of knowing what value to put in the action_input field:
fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
fix_parser.parse(bad_response)
Action(action='search', action_input='')
Instead, we can use a retry parser, which passes in the prompt (as well as the original output) and tries again to get a better response. The variant used below, RetryWithErrorOutputParser, also passes the parsing error along to the LLM:
from langchain.output_parsers import RetryWithErrorOutputParser
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))
retry_parser.parse_with_prompt(bad_response, prompt_value)
Action(action='search', action_input='who is leo di caprios gf?')
StructuredOutputParser
This parser can be used when we want the model to return multiple fields. Although the Pydantic/JSON parser is more powerful, the data structure we work with here contains only text fields.
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
Here we define the response schemas we want to receive:
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
We now get a string containing instructions for how the response should be formatted, and we insert that into our prompt:
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
We can now use this to format a prompt to send to the language model, and then parse the returned result:
model = OpenAI(temperature=0)
_input = prompt.format_prompt(question="what's the capital of france?")
output = model(_input.to_string())
output_parser.parse(output)
{'answer': 'Paris',
'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}
Here is an example of using this with a chat model:
chat_model = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("answer the users question as best as possible.\n{format_instructions}\n{question}")
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
_input = prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages())
output_parser.parse(output.content)
Output:
{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}