Large Models from Beginner to Application: LangChain Prompts - [Output Parsers]

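Language models output text, but we often want information that is more structured than plain text. Output parsers help with this. An output parser is a class that structures a language model's response, and it must implement two main methods:

  • get_format_instructions() -> str: returns a string containing instructions for how the language model's output should be formatted.
  • parse(str) -> Any: takes a string (assumed to be the language model's response) and parses it into some structure.

There is also one optional method:

  • parse_with_prompt(str, PromptValue) -> Any: takes a string (assumed to be the language model's response) and a prompt (assumed to be the prompt that generated that response) and parses them into some structure. The prompt is provided mainly for cases where the output parser wants to retry or fix the output and needs information from the prompt to do so.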
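
To make the interface concrete, here is a minimal sketch of a parser that exposes the three methods described above. The class YesNoOutputParser is hypothetical and deliberately does not subclass any LangChain base class, so it only illustrates the shape of the contract, not the library's own implementation.

```python
from typing import Any


class YesNoOutputParser:
    """Hypothetical parser (not part of LangChain): turns a yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text that gets appended to the prompt so the model knows the expected format.
        return "Answer with a single word: yes or no."

    def parse(self, text: str) -> bool:
        # Normalize whitespace, case and a trailing period before matching.
        cleaned = text.strip().lower().rstrip(".")
        if cleaned == "yes":
            return True
        if cleaned == "no":
            return False
        raise ValueError(f"Expected 'yes' or 'no', got: {text!r}")

    def parse_with_prompt(self, completion: str, prompt: Any) -> bool:
        # This simple parser does not need the prompt, so it just delegates to parse().
        return self.parse(completion)


parser = YesNoOutputParser()
prompt_text = f"Is Paris in France?\n{parser.get_format_instructions()}"
print(prompt_text)
print(parser.parse(" Yes.\n"))
```

The format instructions go into the prompt, and parse() is applied to whatever the model sends back.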

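Comma-separated list output parser: CommaSeparatedListOutputParser

This parser is less powerful than PydanticOutputParser: it returns the model's response as a list of comma-separated items.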

from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions}
)
model = OpenAI(temperature=0)
_input = prompt.format(subject="ice cream flavors")
output = model(_input)
output_parser.parse(output)

Output:

['Vanilla',
'Chocolate',
'Strawberry',
'Mint Chocolate Chip',
'Cookies and Cream']
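
The parsing step itself is simple. The following standalone sketch shows roughly what it amounts to; parse_comma_separated is a hypothetical helper written for illustration, not the library's code, and the raw string is an assumed model response.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split a model response on commas and trim whitespace around each item."""
    return [item.strip() for item in text.strip().split(",") if item.strip()]


# Assumed raw completion, including the leading newlines models often emit.
raw = "\n\nVanilla, Chocolate, Strawberry, Mint Chocolate Chip, Cookies and Cream"
flavors = parse_comma_separated(raw)
print(flavors)
```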

Datetime output parser: DatetimeOutputParser

This output parser shows how to parse LLM output into a Python datetime object:

from langchain.prompts import PromptTemplate
from langchain.output_parsers import DatetimeOutputParser
from langchain.chains import LLMChain
from langchain.llms import OpenAI

output_parser = DatetimeOutputParser()
template = """Answer the users question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(template, partial_variables={"format_instructions": output_parser.get_format_instructions()})
chain = LLMChain(prompt=prompt, llm=OpenAI())
output = chain.run("around when was bitcoin founded?")
output

Output:

'\n\n2008-01-03T18:15:05.000000Z'

Input:

output_parser.parse(output)

Output:

datetime.datetime(2008, 1, 3, 18, 15, 5)
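
The conversion can be reproduced with the standard library alone. This sketch assumes the timestamp pattern visible in the raw output above (ISO-like, with microseconds and a trailing Z); the constant ASSUMED_FORMAT and the helper parse_datetime are illustrative names, not LangChain identifiers.

```python
from datetime import datetime

# Assumption: the format matches the raw model output shown above.
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_datetime(text: str, fmt: str = ASSUMED_FORMAT) -> datetime:
    """Strip surrounding whitespace, then parse with the given strptime format."""
    return datetime.strptime(text.strip(), fmt)


parsed = parse_datetime("\n\n2008-01-03T18:15:05.000000Z")
print(repr(parsed))
```

If the model answers in any other format, strptime raises ValueError, which is why the format instructions in the prompt matter.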

Enum output parser: EnumOutputParser

from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum

class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
parser = EnumOutputParser(enum=Colors)
parser.parse("red")

Output:

<Colors.RED: 'red'>

Input:

# Can handle spaces
parser.parse(" green")

Output:

<Colors.GREEN: 'green'>

Input:

# And new lines
parser.parse("blue\n")

Output:

<Colors.BLUE: 'blue'>

Input:

# And raises errors when appropriate
parser.parse("yellow")

Output:

    ---------------------------------------------------------------------------

    ValueError                                Traceback (most recent call last)

    File ~/workplace/langchain/langchain/output_parsers/enum.py:25, in EnumOutputParser.parse(self, response)
         24 try:
    ---> 25     return self.enum(response.strip())
         26 except ValueError:


    File ~/.pyenv/versions/3.9.1/lib/python3.9/enum.py:315, in EnumMeta.__call__(cls, value, names, module, qualname, type, start)
        314 if names is None:  # simple value lookup
    --> 315     return cls.__new__(cls, value)
        316 # otherwise, functional API: we're creating a new Enum type


    File ~/.pyenv/versions/3.9.1/lib/python3.9/enum.py:611, in Enum.__new__(cls, value)
        610 if result is None and exc is None:
    --> 611     raise ve_exc
        612 elif exc is None:


    ValueError: 'yellow' is not a valid Colors

    
    During handling of the above exception, another exception occurred:


    OutputParserException                     Traceback (most recent call last)

    Cell In[8], line 2
          1 # And raises errors when appropriate
    ----> 2 parser.parse("yellow")


    File ~/workplace/langchain/langchain/output_parsers/enum.py:27, in EnumOutputParser.parse(self, response)
         25     return self.enum(response.strip())
         26 except ValueError:
    ---> 27     raise OutputParserException(
         28         f"Response '{response}' is not one of the "
         29         f"expected values: {self._valid_values}"
         30     )


    OutputParserException: Response 'yellow' is not one of the expected values: ['red', 'green', 'blue']
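
The traceback shows the core of the parser: it strips the response, looks the value up in the enum, and re-raises a failed lookup as a parser error. The sketch below mirrors that logic with the standard library only. ValueError stands in for OutputParserException, and parse_enum and parse_enum_or_default are hypothetical helpers for illustration.

```python
from enum import Enum
from typing import Optional, Type


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"


def parse_enum(enum_cls: Type[Enum], response: str) -> Enum:
    """Look up the stripped response by value; report the valid values on failure."""
    try:
        return enum_cls(response.strip())
    except ValueError as exc:
        valid = [member.value for member in enum_cls]
        raise ValueError(
            f"Response {response!r} is not one of the expected values: {valid}"
        ) from exc


def parse_enum_or_default(
    enum_cls: Type[Enum], response: str, default: Optional[Enum] = None
) -> Optional[Enum]:
    """Caller-side handling: fall back to a default instead of propagating the error."""
    try:
        return parse_enum(enum_cls, response)
    except ValueError:
        return default


print(parse_enum(Colors, " green"))
print(parse_enum_or_default(Colors, "yellow", default=Colors.RED))
```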

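Output-fixing parser: OutputFixingParser

This output parser wraps another output parser. If the wrapped parser fails, OutputFixingParser calls an LLM to fix the error. In other words, we can do more than just raise an error: we can pass the misformatted output, together with the format instructions, to a model and ask it to repair the output. This example uses PydanticOutputParser. Here is what happens when we pass it a result that does not conform to the schema: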

from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
parser.parse(misformatted)

Output:

---------------------------------------------------------------------------

JSONDecodeError                           Traceback (most recent call last)

File ~/workplace/langchain/langchain/output_parsers/pydantic.py:23, in PydanticOutputParser.parse(self, text)
     22     json_str = match.group()
---> 23 json_object = json.loads(json_str)
     24 return self.pydantic_object.parse_obj(json_object)

File ~/.pyenv/versions/3.9.1/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:


File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()


File ~/.pyenv/versions/3.9.1/lib/python3.9/json/decoder.py:353, in JSONDecoder.raw_decode(self, s, idx)
    352 try:
--> 353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:


JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)


During handling of the above exception, another exception occurred:


OutputParserException                     Traceback (most recent call last)

Cell In[6], line 1
----> 1 parser.parse(misformatted)


File ~/workplace/langchain/langchain/output_parsers/pydantic.py:29, in PydanticOutputParser.parse(self, text)
     27 name = self.pydantic_object.__name__
     28 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
---> 29 raise OutputParserException(msg)


OutputParserException: Failed to parse Actor from completion {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}. Got: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

Now we can construct and use an OutputFixingParser. It takes two arguments: another output parser, and an LLM used to try to correct any formatting mistakes:

from langchain.output_parsers import OutputFixingParser

new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
new_parser.parse(misformatted)

Output:

Actor(name='Tom Hanks', film_names=['Forrest Gump'])
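
The wrap-and-repair pattern can be sketched without any LLM. In the code below, FixingParser, strict_json_parse and fake_fixing_llm are all hypothetical names. In particular, fake_fixing_llm is a deterministic stand-in for a real model call and only swaps single quotes for double quotes, so this shows the control flow rather than how LangChain implements it.

```python
import json
from typing import Any, Callable


def strict_json_parse(text: str) -> Any:
    """The wrapped parser: fails on anything that is not valid JSON."""
    return json.loads(text)


def fake_fixing_llm(prompt: str) -> str:
    """Stand-in for an LLM call: repairs the last line of the prompt by fixing quotes."""
    bad_output = prompt.rsplit("\n", 1)[-1]
    return bad_output.replace("'", '"')


class FixingParser:
    """Try the wrapped parser first; on failure, ask the 'LLM' for a repaired completion."""

    def __init__(self, parse: Callable[[str], Any], llm: Callable[[str], str],
                 format_instructions: str) -> None:
        self._parse = parse
        self._llm = llm
        self._instructions = format_instructions

    def parse(self, completion: str) -> Any:
        try:
            return self._parse(completion)
        except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
            prompt = (
                f"Instructions:\n{self._instructions}\n"
                f"Error:\n{exc}\n"
                f"Fix this completion so that it satisfies the instructions:\n{completion}"
            )
            # One repair attempt; if the repaired text is still invalid, the error propagates.
            return self._parse(self._llm(prompt))


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
fixer = FixingParser(strict_json_parse, fake_fixing_llm,
                     "Return a JSON object with the keys name and film_names.")
result = fixer.parse(misformatted)
print(result)
```

Note that the repair prompt contains only the instructions, the error and the bad output, not the original question.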

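Pydantic output parser: PydanticOutputParser

PydanticOutputParser lets the user specify an arbitrary JSON schema and query an LLM for JSON output that conforms to it. Note that the LLM must have sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do this reliably, but Curie's ability already drops off sharply. We declare the data model with Pydantic; Pydantic's BaseModel is like a Python dataclass, but with actual type checking and coercion.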

from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

model_name = 'text-davinci-003'
temperature = 0.0
model = OpenAI(model_name=model_name, temperature=temperature)

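# Define the data structure you expect.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    
    # You can easily add custom validation logic with Pydantic.
    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        # endswith also handles an empty string, where field[-1] would raise IndexError.
        if not field.endswith('?'):
            raise ValueError("Badly formed question!")
        return field

# A query intended to prompt the language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser and inject its instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",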
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
_input = prompt.format_prompt(query=joke_query)

output = model(_input.to_string())

parser.parse(output)

Output:

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Here is another example, this time with a field of a compound type:

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")
        
actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

_input = prompt.format_prompt(query=actor_query)

output = model(_input.to_string())
parser.parse(output)

Output:

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])
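
The tracebacks in this article suggest that parsing happens in three steps: locate the JSON object in the completion, decode it, and build the model object. The sketch below walks through those steps with the standard library only. It uses a dataclass with hand-written checks in place of a Pydantic model, so it is an approximation of the idea rather than the library's behavior; parse_actor and the sample completion are assumptions.

```python
import json
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Actor:
    name: str
    film_names: List[str]

    def __post_init__(self) -> None:
        # Plain dataclasses do not validate types, so the checks are written by hand.
        if not isinstance(self.name, str):
            raise TypeError("name must be a string")
        if not (isinstance(self.film_names, list)
                and all(isinstance(film, str) for film in self.film_names)):
            raise TypeError("film_names must be a list of strings")


def parse_actor(text: str) -> Actor:
    # Step 1: find the JSON object, ignoring any text the model wrote around it.
    match = re.search(r"\{.*\}", text.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in completion: {text!r}")
    # Step 2: decode the JSON. Step 3: build (and validate) the data object.
    return Actor(**json.loads(match.group()))


completion = 'Here is the result:\n{"name": "Tom Hanks", "film_names": ["Forrest Gump", "Cast Away"]}'
actor = parse_actor(completion)
print(actor)
```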

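Retry output parser: RetryOutputParser

Sometimes a parsing error can be fixed by looking only at the output, but in other cases it cannot. One example is when the output is not just incorrectly formatted but also partially incomplete: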

from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser, RetryOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response:"""
class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")
        
parser = PydanticOutputParser(pydantic_object=Action)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")
bad_response = '{"action": "search"}'

If we try to parse this response as it is, we get an error:

parser.parse(bad_response)

Output:

    ---------------------------------------------------------------------------

    ValidationError                           Traceback (most recent call last)

    File ~/workplace/langchain/langchain/output_parsers/pydantic.py:24, in PydanticOutputParser.parse(self, text)
         23     json_object = json.loads(json_str)
    ---> 24     return self.pydantic_object.parse_obj(json_object)
         26 except (json.JSONDecodeError, ValidationError) as e:


    File ~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:527, in pydantic.main.BaseModel.parse_obj()


    File ~/.pyenv/versions/3.9.1/envs/langchain/lib/python3.9/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()


    ValidationError: 1 validation error for Action
    action_input
      field required (type=value_error.missing)

    
    During handling of the above exception, another exception occurred:


    OutputParserException                     Traceback (most recent call last)

    Cell In[6], line 1
    ----> 1 parser.parse(bad_response)


    File ~/workplace/langchain/langchain/output_parsers/pydantic.py:29, in PydanticOutputParser.parse(self, text)
         27 name = self.pydantic_object.__name__
         28 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
    ---> 29 raise OutputParserException(msg)


    OutputParserException: Failed to parse Action from completion {"action": "search"}. Got: 1 validation error for Action
    action_input
      field required (type=value_error.missing)

If we try to fix this error with OutputFixingParser, it cannot know what value to put in the action_input field:

fix_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())
fix_parser.parse(bad_response)

Output:

Action(action='search', action_input='')

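Instead, we can use RetryOutputParser, which passes in the prompt (as well as the original output) and tries again to get a better response. The example below uses the related class RetryWithErrorOutputParser, which additionally passes the parsing error to the model:

from langchain.output_parsers import RetryWithErrorOutputParser
retry_parser = RetryWithErrorOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))
retry_parser.parse_with_prompt(bad_response, prompt_value)

Output:

Action(action='search', action_input='who is leo di caprios gf?')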
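
The difference from a fixing parser is that the retry sees the original prompt, so the missing information can be recovered. Here is a minimal sketch of that loop, under loud assumptions: parse_action, fake_llm and this parse_with_prompt function are hypothetical, and fake_llm is a deterministic stand-in for a real model that simply copies the question out of the retry prompt.

```python
import json
from typing import Callable, Dict

REQUIRED_KEYS = ("action", "action_input")


def parse_action(completion: str) -> Dict[str, str]:
    """Decode JSON and require both fields, roughly like the Action model above."""
    data = json.loads(completion)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def fake_llm(retry_prompt: str) -> str:
    """Stand-in for a real model: reads the question from the prompt and answers with it."""
    question = retry_prompt.split("Question: ", 1)[1].split("\n", 1)[0]
    return json.dumps({"action": "search", "action_input": question})


def parse_with_prompt(completion: str, prompt: str, llm: Callable[[str], str],
                      max_retries: int = 1) -> Dict[str, str]:
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as exc:
            if attempt == max_retries:
                raise
            # The retry prompt carries the original prompt, the bad completion and the error.
            completion = llm(
                f"Prompt:\n{prompt}\nCompletion:\n{completion}\n"
                f"Error: {exc}\nPlease try again."
            )
    raise RuntimeError("unreachable: the loop always returns or raises")


prompt = "Question: who is leo di caprios gf?"
result = parse_with_prompt('{"action": "search"}', prompt, fake_llm)
print(result)
```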

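Structured output parser: StructuredOutputParser

Although the Pydantic JSON parser is more powerful, the structured output parser came first and handles data structures that contain only text fields.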

from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

Here we define the response schema we want to receive:

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

We now get a string containing the format instructions for the response, and insert it into our prompt:

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

We can now use the prompt template to format a prompt, send it to the language model, and parse the result:

model = OpenAI(temperature=0)
_input = prompt.format_prompt(question="what's the capital of france?")
output = model(_input.to_string())
output_parser.parse(output)

Output:

{'answer': 'Paris',
'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}

Here is an example of using it with a chat model:

chat_model = ChatOpenAI(temperature=0)
prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template("answer the users question as best as possible.\n{format_instructions}\n{question}")  
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
_input = prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages())
output_parser.parse(output.content)

Output:

{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
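
The parse step can be approximated as follows, assuming the format instructions ask the model to wrap its JSON in a markdown code snippet tagged json. That wrapping is an assumption of this sketch, and parse_json_markdown is a hypothetical helper, not the library function.

```python
import json
import re
from typing import Any, Dict, List

# Three backticks, built programmatically so this example does not nest code fences.
FENCE = "`" * 3


def parse_json_markdown(text: str, expected_keys: List[str]) -> Dict[str, Any]:
    """Extract JSON from a fenced snippet (if present) and check the expected keys."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole text when the model did not use a fence.
    json_str = match.group(1) if match else text
    data = json.loads(json_str.strip())
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"Response is missing expected keys: {missing}")
    return data


# Assumed chat model reply, shaped like the output shown above.
reply = f'{FENCE}json\n{{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}}\n{FENCE}'
parsed = parse_json_markdown(reply, ["answer", "source"])
print(parsed)
```

Checking the keys against the response schema names is what turns loosely structured text into a dictionary the rest of the program can rely on.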