
Practical Guide | Some Usage Guidance for Hugging Face

huggingface

This article gives a brief introduction to Hugging Face and its common uses; it is a handy tool for model testing~

As shown in the figure below, the panel on the left lists the tasks, including Multimodal, Computer Vision, Natural Language Processing (NLP), and so on, and the panel on the right lists the models for each task.

The tests in this article mainly cover:

Contents

Example 1: Speech Recognition

1.1. Speech Recognition

1.2. Speech Emotion Recognition

1.3. Speech Synthesis

Example 2: Extracting Image Features with a ViT Model (Multimodal)

Example 3: Translation in NLP

Problems and Solutions


Example 1: Speech Recognition

Click the Audio Classification task in the task panel.

1.1. Speech Recognition

Open a model page; this one does speech-based emotion recognition. Some authors provide a model description, a task and dataset description, and usage examples, while others do not write these in much detail.

Although these are models from different users, they all implement speech emotion recognition with the same HuBERT architecture, just trained on different datasets~

Usage example

# clone the repo that provides src/models (Wav2Vec2ForSpeechClassification, HubertForSpeechClassification)
git clone https://github.com/m3hrdadfi/soxan.git

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
import librosa
import IPython.display as ipd
import numpy as np
import pandas as pd
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name_or_path = "harshit345/xlsr-wav2vec-speech-emotion-recognition"
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate
model = Wav2Vec2ForSpeechClassification.from_pretrained(model_name_or_path).to(device)

def speech_file_to_array_fn(path, sampling_rate):
    # load the audio and resample it to the rate expected by the feature extractor
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    speech = resampler(speech_array).squeeze().numpy()
    return speech

def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{round(score * 100, 3):.1f}%"} for i, score in enumerate(scores)]
    return outputs

# path for a sample
path = '/workspace/tts_dataset/studio/studio_audio/studio_a_00000.wav'
outputs = predict(path, sampling_rate)
print(outputs)

Change the .wav path in the code to your own file.

Before running, the Hugging Face libraries must be installed if you do not already have them (Python 3.8+ is recommended):

# requirement packages
pip install git+https://github.com/huggingface/datasets.git
pip install git+https://github.com/huggingface/transformers.git
# the following two libraries are needed for audio processing; different tasks require different libraries
pip install torchaudio
pip install librosa

1.2. Speech Emotion Recognition

 

The original post provides a prediction method, but it cannot be run as-is and needs some modification:

def predict_emotion_hubert(audio_file):
    """ inspired by an example from https://github.com/m3hrdadfi/soxan """
    from audio_models import HubertForSpeechClassification
    from transformers import Wav2Vec2FeatureExtractor, AutoConfig
    import torch.nn.functional as F
    import torch
    import numpy as np
    from pydub import AudioSegment

    model = HubertForSpeechClassification.from_pretrained("Rajaram1996/Hubert_emotion")  # Downloading: 362M
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
    sampling_rate = 16000  # defined by the model; must convert mp3 to this rate.
    config = AutoConfig.from_pretrained("Rajaram1996/Hubert_emotion")

    def speech_file_to_array(path, sampling_rate):
        # using torchaudio...
        # speech_array, _sampling_rate = torchaudio.load(path)
        # resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
        # speech = resampler(speech_array).squeeze().numpy()
        sound = AudioSegment.from_file(path)
        sound = sound.set_frame_rate(sampling_rate)
        sound_array = np.array(sound.get_array_of_samples())
        return sound_array

    sound_array = speech_file_to_array(audio_file, sampling_rate)
    inputs = feature_extractor(sound_array, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to("cpu").float() for key in inputs}

    with torch.no_grad():
        logits = model(**inputs).logits
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{
        "emo": config.id2label[i],
        "score": round(score * 100, 1)}
        for i, score in enumerate(scores)
    ]
    # scores are numbers here, so filter against 0.0 rather than the string '0.0%'
    return [row for row in sorted(outputs, key=lambda x: x["score"], reverse=True) if row['score'] != 0.0][:2]
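A minimal call then looks like this (the .wav path below is just a placeholder; use your own file):

results = predict_emotion_hubert("/path/to/your_audio.wav")
print(results)  # the two highest-scoring emotions, e.g. [{'emo': ..., 'score': ...}, {'emo': ..., 'score': ...}]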

Of course, you do not have to follow the author's version exactly:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchaudio
from transformers import AutoConfig, Wav2Vec2FeatureExtractor
from src.models import Wav2Vec2ForSpeechClassification, HubertForSpeechClassification

# dataset: AVDESS
model_name_or_path = "Rajaram1996/Hubert_emotion"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
config = AutoConfig.from_pretrained(model_name_or_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name_or_path)
sampling_rate = feature_extractor.sampling_rate

# for wav2vec
# model = Wav2Vec2ForSpeechClassification.from_pretrained(model_name_or_path).to(device)
# for hubert
model = HubertForSpeechClassification.from_pretrained(model_name_or_path).to(device)

def speech_file_to_array_fn(path, sampling_rate):
    speech_array, _sampling_rate = torchaudio.load(path)
    resampler = torchaudio.transforms.Resample(_sampling_rate, sampling_rate)
    speech = resampler(speech_array).squeeze().numpy()
    return speech

def predict(path, sampling_rate):
    speech = speech_file_to_array_fn(path, sampling_rate)
    inputs = feature_extractor(speech, sampling_rate=sampling_rate, return_tensors="pt", padding=True)
    inputs = {key: inputs[key].to(device) for key in inputs}
    with torch.no_grad():
        logits = model(**inputs).logits
    scores = F.softmax(logits, dim=1).detach().cpu().numpy()[0]
    outputs = [{"Emotion": config.id2label[i], "Score": f"{round(score * 100, 3):.1f}%"}
               for i, score in enumerate(scores)]
    return outputs

# 001.wav = hap
path = "/workspace/dataset/emo/audio/000-001.wav"
outputs = predict(path, sampling_rate)
print(outputs)

1.3. Speech Synthesis

Create a new Space on Hugging Face,

upload your files, and run it.

If an error occurs, refer to the Problems and Solutions section below.
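As a rough sketch, an app.py for a Gradio-based Space might look like the following. Everything here is an assumption for illustration: the text-to-speech pipeline task and the suno/bark-small model are placeholders, and gradio, transformers, and torch would need to be listed in the Space's requirements.txt.

# app.py - hypothetical minimal Gradio Space for text-to-speech
import numpy as np
import gradio as gr
from transformers import pipeline

tts = pipeline("text-to-speech", model="suno/bark-small")  # placeholder model

def synthesize(text):
    out = tts(text)  # returns a dict with "audio" and "sampling_rate"
    audio = np.squeeze(out["audio"])
    return out["sampling_rate"], audio  # Gradio's Audio output accepts (rate, ndarray)

demo = gr.Interface(fn=synthesize, inputs="text", outputs="audio")
demo.launch()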

Example 2: Extracting Image Features with a ViT Model (Multimodal)

After opening the model page, copy the code (if you want to see the output, you need to add print statements yourself).

from transformers import ViTFeatureExtractor, ViTModel
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
feature_extractor = ViTFeatureExtractor.from_pretrained('facebook/dino-vitb16')
model = ViTModel.from_pretrained('facebook/dino-vitb16')
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
print(outputs, last_hidden_states)
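For facebook/dino-vitb16 with the default 224x224 preprocessing, last_hidden_state should have shape [1, 197, 768]: one [CLS] token plus 196 patch tokens, each a 768-dimensional feature vector.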

Result

The code downloads the configuration files and the packaged model weights, i.e., the files shown in the figure below.
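If you prefer to fetch these files explicitly rather than letting from_pretrained() download them on first use, a minimal sketch using the huggingface_hub library (assuming it is installed, e.g. via pip install huggingface_hub) is:

# optional: pre-download the repository files into the local cache
from huggingface_hub import snapshot_download

local_path = snapshot_download("facebook/dino-vitb16")
print(local_path)  # cache directory containing config.json, the weights, etc.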

*Note: the following libraries are required; the install commands are:

# requirement packages
pip install datasets
pip install transformers
pip install pillow

Example 3: Translation in NLP

Pick the Translation task, then choose a Chinese-to-English translation model (model names are generally abbreviations of the model, the task, and so on). For example, the model on the lower right is provided by Helsinki-NLP: opus is the dataset, mt stands for machine translation, and zh-en means Chinese to English:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")

The code provided only shows how to download the model; if you want to run a simple test, you need to write a bit of code yourself:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-zh-en")
# Elena
translator = pipeline("translation", model=model, tokenizer=tokenizer)
results = translator("我是一名研究员")
print(results)

The result is shown in the figure.

*Note: the following libraries are required; the install commands are:

# required packages
pip install transformers
pip install sentencepiece
# recommended
pip install sacremoses

Problems and Solutions

[PS1] On Windows, in an Anaconda virtual environment, pip install fails:

 Could not build wheels for soxr which use PEP 517 and cannot be installed directly

python -m pip install --upgrade pip
python -m pip install --upgrade setuptools

After updating, as shown in the figure.

 ERROR: Could not build wheels for soxr, which is required to install pyproject.toml-based projects

Install the corresponding .whl file.

Download and install the corresponding .whl file; it can be obtained from the following address:
Unofficial Windows Binaries for Python Extension Packages

After obtaining the .whl file, simply run pip install [absolute path to the .whl file], and the installation should then succeed.
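For example (the file name below is hypothetical; pick the wheel matching your Python version and architecture):

pip install C:\Downloads\soxr-0.3.5-cp38-cp38-win_amd64.whl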

[PS2] After running app.py, a build error appears: Build failed with exit code: 1

See: Receiving this error even with a new space - Spaces - Hugging Face Forums

Modify README.md and change the Python version to 3.9.13 (in a Space this is the python_version field of the YAML header at the top of README.md).

If the following error still appears:

 ERROR: process "/bin/sh -c pip install --no-cache-dir -r requirements.txt" did not complete successfully: exit code: 1

then refer to: python - Getting "The command '/bin/sh -c pip install --no-cache-dir -r requirements.txt' returned a non-zero code: 1" while building an image from dockerfile - Stack Overflow

The following system packages need to be installed (a sketch of the corresponding install step follows the list):

  • libpq-dev
  • gcc
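In a Docker-based build this roughly corresponds to the following apt step (a sketch, assuming a Debian/Ubuntu-based image; in a Gradio or Streamlit Space the equivalent is listing these packages in a packages.txt file):

apt-get update && apt-get install -y libpq-dev gcc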