This section is the first half of the basics of the Transformers library, covering three topics: the task summary, the model summary, and data preprocessing. Since I am not familiar with many of the models, much of this was produced with machine translation; errors are inevitable, and the content is for reference only.
This part covers the most common use cases of the library. The available models allow for many different configurations and offer great versatility across real-world use cases.
These examples make use of auto-models, classes that instantiate a model from a given checkpoint, automatically selecting the correct model architecture.
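As a minimal sketch of that mechanism (using a checkpoint that appears later in this section), an auto class inspects the checkpoint's configuration and instantiates the matching architecture:

from transformers import AutoModelForSequenceClassification

# The auto class reads the checkpoint's configuration and instantiates the
# matching architecture; this BERT checkpoint yields a
# BertForSequenceClassification instance.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
print(type(model).__name__)  # BertForSequenceClassification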
For a model to perform well on a task, it must be loaded from a checkpoint corresponding to that task. These checkpoints are usually pre-trained on large amounts of data and then fine-tuned on a specific task.
This means the following:
- Not all models were fine-tuned on all tasks. If you want to fine-tune a model on a specific task, you can leverage one of the run_$TASK.py scripts in the examples directory.
- Fine-tuned models were fine-tuned on a specific dataset, which may or may not overlap with your use case and domain; you can use the example scripts to fine-tune your model, or write your own training script.
To run inference on a task, the library provides several mechanisms:
- pipelines: very easy-to-use abstractions that require as little as two lines of code;
- direct model use: less abstraction, but more flexibility and power, through direct access to a tokenizer and the full inference capacity.
Both approaches are demonstrated in the concrete applications below.
Sequence classification is the task of classifying sequences according to a given number of classes. An example of sequence classification is the GLUE dataset, which is entirely based on this task. If you would like to fine-tune a model on a GLUE sequence classification task, you may leverage the run_glue.py, run_tf_glue.py, run_tf_text_classification.py, or run_xnli.py scripts.
Here is an example of using a pipeline for sentiment analysis: identifying whether a sequence is positive or negative. It uses a model fine-tuned on SST-2, a GLUE task.
This returns a label (POSITIVE or NEGATIVE) alongside a score, as shown below.
from transformers import pipeline

# Load the sentiment-analysis pipeline with its default fine-tuned checkpoint
nlp = pipeline("sentiment-analysis")

result = nlp("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = nlp("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
The output is:
label: NEGATIVE, with score: 0.9991
label: POSITIVE, with score: 0.9999
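The pipeline can also be pointed at an explicit checkpoint instead of relying on its default; a minimal sketch, using the SST-2 checkpoint that the default sentiment-analysis pipeline loads:

from transformers import pipeline

# Select the checkpoint explicitly rather than using the pipeline default
nlp = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print(nlp("I love you")[0])  # {'label': 'POSITIVE', 'score': ...}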
Here is an example of sequence classification using a model directly, determining whether two sequences are paraphrases of each other.
The process is as follows:
1. Instantiate a tokenizer and a model from the checkpoint name (here a BERT model fine-tuned on MRPC).
2. Build a sequence from the two sentences by passing both to the tokenizer, which inserts the correct model-specific separators and token type ids.
3. Pass the sequence through the model to obtain classification logits over the two classes, "not paraphrase" and "is paraphrase".
4. Apply a softmax to the logits to get probabilities over the classes.
5. Print the results.
The code is as follows:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

classes = ["not paraphrase", "is paraphrase"]

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")

paraphrase_classification_logits = model(**paraphrase).logits
not_paraphrase_classification_logits = model(**not_paraphrase).logits

paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

print("---" * 24)

# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")
The output is:
not paraphrase: 10%
is paraphrase: 90%
------------------------------------------------------------------------
not paraphrase: 94%
is paraphrase: 6%
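Both sentence pairs can also be scored in one batched call; a minimal sketch, assuming the same checkpoint as above (padding aligns the two pairs to a common length):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

# Tokenize both pairs at once: (sequence_0, sequence_2) and (sequence_0, sequence_1)
batch = tokenizer(
    ["The company HuggingFace is based in New York City"] * 2,
    ["HuggingFace's headquarters are situated in Manhattan",
     "Apples are especially bad for your health"],
    padding=True,
    return_tensors="pt",
)
probs = torch.softmax(model(**batch).logits, dim=1)
print(probs)  # row 0: the paraphrase pair, row 1: the non-paraphrase pair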
Extractive question answering is the task of extracting an answer from a given text. An example of a QA dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune a model on the SQuAD dataset, you may leverage the run_qa.py and run_tf_squad.py scripts.
Here is an example of using a pipeline for question answering: extracting an answer from a text given a question. It uses a model fine-tuned on SQuAD.
from transformers import pipeline

# Load the question-answering pipeline with its default SQuAD-fine-tuned checkpoint
nlp = pipeline("question-answering")
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
"""
result = nlp(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
print("---"*24)
result = nlp(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
The output is:
Answer: 'the task of extracting an answer from a text given a question', score: 0.6226, start: 34, end: 95
------------------------------------------------------------------------
Answer: 'SQuAD dataset', score: 0.5053, start: 147, end: 160
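The pipeline also accepts a list of questions against the same context in a single call (worth verifying on your installed version); this sketch reuses the nlp pipeline and context from above:

results = nlp(
    question=[
        "What is extractive question answering?",
        "What is a good example of a question answering dataset?",
    ],
    context=context,
)
for r in results:
    print(f"Answer: '{r['answer']}', score: {round(r['score'], 4)}")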
Here is an example of question answering using a model and a tokenizer directly. The process is as follows:
1. Instantiate a tokenizer and a model from the checkpoint name (here a BERT model fine-tuned on SQuAD).
2. Define a text and a question.
3. Encode the question and the text together into model inputs.
4. Pass the inputs through the model, which returns a start score and an end score for every token.
5. Take the argmax of the start and end scores to locate the most likely answer span.
6. Decode the tokens in that span back into a string and print it.
The code is as follows:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
text = r"""