
Transformers Study Notes 3: Hugging Face's Pipeline Function

I. Introduction

Hugging Face provides a pipeline function, `pipeline()`, that lets you stand up an NLP task with very little code.

A Pipeline bundles data preprocessing, model inference, and output post-processing: you can feed it raw input and get predictions back directly, which is very convenient.

Given a task, the pipeline automatically loads a suitable pretrained model and runs your input through three steps:

  1. Preprocess the input text so the model can read it
  2. Run the model
  3. Post-process the model output so the prediction is human-readable
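As a rough illustration, the three steps can be sketched as a self-contained toy pipeline. Everything here (the vocabulary, the fake "model", the logits) is made up for illustration; a real pipeline uses a pretrained tokenizer and Transformer model:

```python
import math

# Hypothetical toy vocabulary -- a real tokenizer is far more sophisticated.
VOCAB = {"i": 0, "am": 1, "happy": 2, ".": 3}

def preprocess(text):
    # Step 1: turn raw text into input IDs the model can read.
    return [VOCAB[tok] for tok in text.lower().replace(".", " .").split()]

def toy_model(input_ids):
    # Step 2: a stand-in "model" that returns fixed raw logits
    # for [negative, positive] -- not a real prediction.
    return [-1.2, 3.4]

def postprocess(logits):
    # Step 3: softmax the logits and attach a human-readable label.
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = probs.index(max(probs))
    return {"label": ["NEGATIVE", "POSITIVE"][best], "score": probs[best]}

print(postprocess(toy_model(preprocess("I am happy."))))
```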

Although Pipeline is easy to use, it offers limited flexibility for advanced users.

The currently available pipelines are listed at:
https://huggingface.co/docs/transformers/main_classes/pipelines

This article walks through some of them.

II. Pipeline Examples

1. Sentiment analysis

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I am happy.")

Output:

[{'label': 'POSITIVE', 'score': 0.9998760223388672}]

You can also pass a list of strings instead of a single string to classify a batch of inputs.
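Note the return shapes: a single string yields a list containing one dict, and a list of N strings yields N dicts. A toy stand-in (the hypothetical `fake_classifier` below, not the real model) mimics that calling convention:

```python
def fake_classifier(inputs):
    # Stand-in that mimics only the sentiment pipeline's return shapes;
    # the scores here are fixed dummies, not real predictions.
    if isinstance(inputs, str):
        inputs = [inputs]
    return [{"label": "POSITIVE", "score": 0.99} for _ in inputs]

print(fake_classifier("I am happy."))                 # one dict in a list
print(fake_classifier(["I am happy.", "I am sad."]))  # two dicts
```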

2. Zero-shot text classification

from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    ["This is a course about the Transformers library",
        "New policy mix to propel turnaround in China's economy"],
    candidate_labels=["education", "politics", "business"],
)

[{'sequence': 'This is a course about the Transformers library', 'labels': ['education', 'business', 'politics'], 'scores': [0.8445969820022583, 0.11197575181722641, 0.0434272475540638]}, 
{'sequence': "New policy mix to propel turnaround in China's economy", 'labels': ['business', 'politics', 'education'], 'scores': [0.6015452146530151, 0.348330557346344, 0.05012420192360878]}]


3. Named entity recognition

from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
print(ner("My name is Sylvain and I work at Hugging Face in Brooklyn."))

Output:

[{'entity_group': 'PER', 'score': 0.9981694, 'word': 'Sylvain', 'start': 11, 'end': 18}, 
{'entity_group': 'ORG', 'score': 0.9796019, 'word': 'Hugging Face', 'start': 33, 'end': 45}, 
{'entity_group': 'LOC', 'score': 0.9932106, 'word': 'Brooklyn', 'start': 49, 'end': 57}]

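The `start` and `end` fields in the grouped-entity output are character offsets into the input string, so each entity can be sliced straight out of the original text (offsets below copied from the output above):

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# (entity_group, start, end) triples taken from the pipeline output above.
spans = [("PER", 11, 18), ("ORG", 33, 45), ("LOC", 49, 57)]
for group, start, end in spans:
    print(group, "->", text[start:end])
```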

4. Summarization

from transformers import pipeline

# use bart in pytorch
summarizer = pipeline("summarization")
summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=8)



Output (for max_length=8, and for comparison max_length=12):

# max_length=8
[{'summary_text': ' Sam Shleifer writes'}]
# max_length=12
[{'summary_text': ' Sam Shleifer writes the best docstring'}]

5. Text generation

from transformers import pipeline

generator = pipeline('text-generation', model='liam168/chat-DialoGPT-small-zh')
print(generator('今天早上早点到公司,', max_length=100))

6. English text generation with GPT-2

from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
print(generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
))

Output:

[{'generated_text': 'In this course, we will teach you how to write a powerful and useful resource for your students.\n\n\n\nHow can you understand your own'}, 
{'generated_text': 'In this course, we will teach you how to program a \u202an–n\u202f\u202f\u202f and learn how to use it.'}]

7. Fill-mask (masked word prediction)

from transformers import pipeline

unmasker = pipeline('fill-mask')

print(unmasker('What the <mask>?', top_k=3))

Output:

[{'score': 0.378376841545105, 'token': 17835, 'token_str': ' heck', 'sequence': 'What the heck?'}, 
{'score': 0.32931089401245117, 'token': 7105, 'token_str': ' hell', 'sequence': 'What the hell?'}, 
{'score': 0.1464540809392929, 'token': 26536, 'token_str': ' fuck', 'sequence': 'What the fuck?'}]

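Each candidate's `token_str` is the decoded token, including its leading space; substituting it for the mask reproduces the `sequence` field:

```python
template = "What the <mask>?"

# token_str values copied from the output above (note the leading spaces).
for token_str in (" heck", " hell"):
    print(template.replace(" <mask>", token_str))
```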

8. Question answering

from transformers import pipeline

question_answerer = pipeline("question-answering")
print(question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
))

Output:

{'score': 0.6949763894081116, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}


9. Translation

A Mandarin-to-Cantonese translator:

from transformers import pipeline

translator = pipeline("translation", model="botisan-ai/mt5-translate-zh-yue")
print(translator("今天吃早饭没有?"))

Output:

[{'translation_text': '今日食早飯未?'}]