First, install the library: pip install transformers
Models that can be downloaded for offline use are listed by language at https://huggingface.co/languages
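The simplest way to get started is a ready-made pipeline. Behind the scenes, a pipeline preprocesses the input (tokenization, for text), runs the model, and postprocesses the raw output into a readable result.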
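As a rough illustration of that flow, the sketch below chains three stages: preprocessing, a model call, and postprocessing. Every name in it (`toy_preprocess`, `toy_model`, `toy_postprocess`, the word lists) is a hypothetical stand-in and not the transformers implementation; it only shows how the stages hand data to one another.

```python
# Hypothetical toy lexicon standing in for a trained model's knowledge.
POSITIVE_WORDS = {"happy", "good", "great"}
NEGATIVE_WORDS = {"sad", "bad", "terrible"}
LABELS = ["NEGATIVE", "POSITIVE"]

def toy_preprocess(text):
    """Stage 1: turn raw text into lowercase tokens."""
    return text.lower().replace(".", " ").split()

def toy_model(tokens):
    """Stage 2: produce one raw score per label (here, simple word counts)."""
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    return [neg, pos]

def toy_postprocess(scores):
    """Stage 3: turn raw scores into a readable result."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"label": LABELS[best], "scores": scores}

def toy_pipeline(text):
    """Chain the three stages, as a pipeline object does internally."""
    return toy_postprocess(toy_model(toy_preprocess(text)))

print(toy_pipeline("We are very happy today."))
```

A real pipeline swaps each stage for a tokenizer, a neural network, and a task-specific decoder, but the hand-off between stages has the same shape.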
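Models can be found on the Hugging Face Hub. Take sentiment analysis as an example: the default pipeline handles English, so what do we do when the text is Chinese?
First, look for a suitable model in the model hub (the tasks and language filters on the left narrow the results):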
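import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Chinese sentiment model from the Hugging Face Hub
model_name = 'IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

text = '今天心情不好'  # "I'm in a bad mood today"
# Encode the text into token ids and add a batch dimension
input_ids = torch.tensor([tokenizer.encode(text)])
with torch.no_grad():  # inference only, no gradients needed
    output = model(input_ids)
# Convert the logits into class probabilities
print(torch.nn.functional.softmax(output.logits, dim=-1))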
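The final line applies softmax to the raw logits so that they become probabilities summing to 1. The sketch below reproduces that step in plain Python and picks the most likely label. The logit values are made up, and the label order (index 0 = negative, index 1 = positive) is an assumption for illustration; the model card is the authority on the actual mapping.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order; check the model card before relying on it.
LABELS = ["negative", "positive"]

def predict_label(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Hypothetical logits, not real model output.
label, prob = predict_label([2.0, -1.0])
print(label, round(prob, 4))
```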
The tokenizer and model can be saved as follows:
pt_save_directory = "./pt_save_pretrained"
tokenizer.save_pretrained(pt_save_directory)
model.save_pretrained(pt_save_directory)  # `model` is the object loaded above
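A saved directory can later be passed to `from_pretrained` in place of a Hub model name. The following is a minimal sketch, assuming the directory was written by the `save_pretrained` calls above; the `load_saved` helper is a hypothetical name, and the import sits inside the function only so the heavy libraries load when the helper is actually called.

```python
def load_saved(directory="./pt_save_pretrained"):
    """Reload the tokenizer and model previously written with save_pretrained()."""
    # Imported lazily so that defining this helper does not load the libraries.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(directory)
    model = BertForSequenceClassification.from_pretrained(directory)
    return tokenizer, model
```

Calling `tokenizer, model = load_saved()` then works without network access, since both objects are read from local files.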
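The pipeline tasks that come with pretrained models are:
"audio-classification": audio classification
"automatic-speech-recognition": speech recognition
"conversational": dialogue
"feature-extraction": feature extraction
"fill-mask": filling in masked tokens
"image-classification": image classification
"question-answering": question answering
"table-question-answering": question answering over tables
"text2text-generation": text-to-text generation
"text-classification" (alias "sentiment-analysis"): text classification
"text-generation": text generation
"token-classification" (alias "ner"): token classification
"translation": translation
"translation_xx_to_yy": translation from language xx to language yy
"summarization": summarization
"zero-shot-classification": zero-shot classification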
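Two of the task names above are aliases for another task. The short sketch below resolves them to their canonical names; the `TASK_ALIASES` dictionary and the `canonical_task` helper are hypothetical names for illustration and cover only the two aliases listed above.

```python
# Aliases taken from the task list above; the helper itself is illustrative only.
TASK_ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
}

def canonical_task(task):
    """Return the canonical task name for an alias, or the task unchanged."""
    return TASK_ALIASES.get(task, task)

for name in ("sentiment-analysis", "ner", "translation"):
    print(name, "->", canonical_task(name))
```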
Examples of loading a pipeline for several of these tasks:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier('We are very happy to introduce pipeline to the transformers repository.')
from transformers import pipeline
question_answerer = pipeline('question-answering')
question_answerer({ 'question': 'What is the name of the repository ?', 'context': 'Pipeline has been included in the huggingface/transformers repository'})
from transformers import pipeline
from datasets import load_dataset, Audio
# Create the pipeline first: its feature extractor defines the expected sampling rate
speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
# Resample the audio to the rate the model expects
dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
result = speech_recognizer([a['array'] for a in dataset[:4]["audio"]])
generator = pipeline(task="text-generation")
generator("Eight people were kill at party in California.")
vision_classifier = pipeline(task="image-classification")
vision_classifier(images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")
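Classification pipelines such as the sentiment and image examples above return a list of dictionaries, each with a `label` and a `score`. The sketch below filters such results by confidence. The sample values are made up for illustration and the `confident` helper is a hypothetical name, not part of the library.

```python
def confident(results, threshold=0.9):
    """Keep only predictions whose score is at least `threshold`."""
    return [r for r in results if r["score"] >= threshold]

# Made-up values in the shape a classification pipeline returns.
sample = [
    {"label": "POSITIVE", "score": 0.9997},
    {"label": "NEGATIVE", "score": 0.6120},
]

print(confident(sample))
```

With real output, `confident(classifier(texts))` would apply the same filter to whatever the pipeline returns.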
Text inputs need a tokenizer, which converts text into a dictionary. For example:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
encoding = tokenizer("We are very happy to show you the
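For a BERT-style tokenizer, that dictionary holds `input_ids`, `token_type_ids`, and `attention_mask`. The toy encoder below mimics that shape so the role of each field is visible. The vocabulary, the ids, and the `toy_encode` helper are all hypothetical; a real tokenizer uses a much larger vocabulary and subword splitting.

```python
# Hypothetical tiny vocabulary; real ids differ.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "we": 4, "are": 5, "happy": 6}

def toy_encode(text, max_length=8):
    """Map words to ids, wrap them in [CLS]/[SEP], pad, and build the attention mask."""
    words = text.lower().split()
    ids = [VOCAB["[CLS]"]] + [VOCAB.get(w, VOCAB["[UNK]"]) for w in words] + [VOCAB["[SEP]"]]
    ids = ids[:max_length]  # naive truncation, enough for a sketch
    pad = max_length - len(ids)
    return {
        "input_ids": ids + [VOCAB["[PAD]"]] * pad,      # token ids, padded to max_length
        "token_type_ids": [0] * max_length,             # all zeros for a single sentence
        "attention_mask": [1] * len(ids) + [0] * pad,   # 1 = real token, 0 = padding
    }

print(toy_encode("We are very happy"))
```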