Installing omlutils on wsl-oracle

1. Install cmake and gcc-c++

sudo dnf install -y cmake gcc-c++

2. Install omlutils

pip install omlutils-0.10.0-cp312-cp312-linux_x86_64.whl
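
The wheel's filename encodes the interpreter it targets: cp312 means CPython 3.12, and linux_x86_64 is the platform. As a minimal sketch (the helper names wheel_tags and interpreter_tag are made up for illustration, and a CPython interpreter is assumed), the tags can be compared with the running Python before installing:

```python
import sys


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Layout is name-version[-build]-python-abi-platform,
    # so the last three fields are always the compatibility tags.
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3], parts[-2], parts[-1]


def interpreter_tag() -> str:
    """Tag of the running interpreter, assuming CPython (e.g. 'cp312')."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    python_tag, abi_tag, platform_tag = wheel_tags(
        "omlutils-0.10.0-cp312-cp312-linux_x86_64.whl"
    )
    print("wheel targets:", python_tag, platform_tag)
    print("interpreter:  ", interpreter_tag())
```

If the two tags differ, pip will refuse the wheel as unsupported on this platform, so the conda environment should be created with Python 3.12.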

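There is no need to install requirements.txt. In particular, it pins torch==2.2.0+cpu, which would uninstall a GPU-enabled torch and replace it with the CPU-only build. The relevant lines of requirements.txt are: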

--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.0+cpu
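
To check which kind of torch build is currently installed, a small sketch like the following may help. It assumes the CPU-only wheels from the PyTorch CPU index carry the "+cpu" local version label, as the pin above does; is_cpu_only_build is a hypothetical helper name.

```python
def is_cpu_only_build(version: str) -> bool:
    """True if a torch version string carries the '+cpu' local version label."""
    _, _, local = version.partition("+")
    return local == "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch version: ", torch.__version__)
        print("CPU-only wheel:", is_cpu_only_build(torch.__version__))
        # Reports whether a usable CUDA device is visible at runtime.
        print("CUDA available:", torch.cuda.is_available())
```

Running it before and after installing omlutils shows whether the GPU build survived.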

3. Create an ONNX model with omlutils

Install sentencepiece:

pip install sentencepiece

Patch parts of the omlutils code so that it supports models that are larger than 1 GB and whose tokenizer is XLMRobertaTokenizer.

vi /home/oracle/miniconda/envs/learn-oracle23c/lib/python3.12/site-packages/omlutils/_pipeline/steps.py

--- before
        size_threshold = quant_limit if is_quantized else 0.99e9
---

--- after
        size_threshold = quant_limit if is_quantized else 0.99e9 * 5
---
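
For a sense of scale, the patch raises the non-quantized limit from 0.99e9 to 4.95e9. The sketch below only does that arithmetic. How omlutils actually uses size_threshold is not shown here, so treating it as a byte count compared against the model size is an assumption, and within_limit is a hypothetical helper.

```python
OLD_LIMIT = 0.99e9      # value before the patch
NEW_LIMIT = 0.99e9 * 5  # value after the patch


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


def within_limit(size_bytes: int, limit: float) -> bool:
    """Hypothetical check: does a model of size_bytes fit under limit?"""
    return size_bytes <= limit


if __name__ == "__main__":
    print(f"old limit: {to_gb(OLD_LIMIT):.2f} GB")
    print(f"new limit: {to_gb(NEW_LIMIT):.2f} GB")
    # A 1.2 GB model is rejected by the old limit but accepted by the new one.
    print(within_limit(1_200_000_000, OLD_LIMIT), within_limit(1_200_000_000, NEW_LIMIT))
```
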
vi /home/oracle/miniconda/envs/learn-oracle23c/lib/python3.12/site-packages/omlutils/_pipeline/steps.py

--- before
    def validateBertTokenizer(self,tokenizer):
        supportedTokenizer=[transformers.models.bert.BertTokenizer, transformers.models.distilbert.DistilBertTokenizer,transformers.models.mpnet.MPNetTokenizer]
        cls=tokenizer.__class__
        if(cls not in supportedTokenizer):
            raise ValueError(f"Unsupported tokenizer {cls}")
---

--- after
    def validateBertTokenizer(self,tokenizer):
        supportedTokenizer=[transformers.models.bert.BertTokenizer, transformers.models.distilbert.DistilBertTokenizer,transformers.models.mpnet.MPNetTokenizer,transformers.models.xlm_roberta.tokenization_xlm_roberta.XLMRobertaTokenizer]
        cls=tokenizer.__class__
        if(cls not in supportedTokenizer):
            raise ValueError(f"Unsupported tokenizer {cls}")
---
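
The check in validateBertTokenizer compares tokenizer.__class__ against the list, which is an exact class match rather than an isinstance test. That is why a tokenizer class has to be listed explicitly before it is accepted. A minimal illustration with made-up stand-in classes (none of these names come from transformers or omlutils):

```python
class BaseTokenizer:
    """Stand-in for a tokenizer class that is already on the list."""


class DerivedTokenizer(BaseTokenizer):
    """Stand-in for a related class that is not on the list."""


SUPPORTED = [BaseTokenizer]


def is_supported_exact(tokenizer) -> bool:
    """Same style of check as the patched function: exact class membership."""
    return tokenizer.__class__ in SUPPORTED


def is_supported_isinstance(tokenizer) -> bool:
    """Looser alternative that would also accept subclasses."""
    return isinstance(tokenizer, tuple(SUPPORTED))


if __name__ == "__main__":
    print(is_supported_exact(BaseTokenizer()))          # listed class is accepted
    print(is_supported_exact(DerivedTokenizer()))       # subclass is rejected
    print(is_supported_isinstance(DerivedTokenizer()))  # isinstance accepts it
```
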
vi /home/oracle/miniconda/envs/learn-oracle23c/lib/python3.12/site-packages/omlutils/_onnx_export/tokenizer_export.py

--- before
TOKENIZER_MAPPING = {
    transformers.models.bert.BertTokenizer: SupportedTokenizers.BERT,
    transformers.models.clip.CLIPTokenizer: SupportedTokenizers.CLIP,
    transformers.models.distilbert.DistilBertTokenizer: SupportedTokenizers.BERT,
    transformers.models.gpt2.GPT2Tokenizer: SupportedTokenizers.GPT2,
    #transformers.models.llama.LlamaTokenizer: SupportedTokenizers.SENTENCEPIECE,
    # transformers.models.mluke.MLukeTokenizer: SupportedTokenizers.SENTENCEPIECE,
    transformers.models.mpnet.MPNetTokenizer: SupportedTokenizers.BERT,
    # transformers.models.roberta.tokenization_roberta.RobertaTokenizer: SupportedTokenizers.ROBERTA,
    # transformers.models.xlm_roberta.XLMRobertaTokenizer: SupportedTokenizers.SENTENCEPIECE,
}
---

--- after
TOKENIZER_MAPPING = {
    transformers.models.bert.BertTokenizer: SupportedTokenizers.BERT,
    transformers.models.clip.CLIPTokenizer: SupportedTokenizers.CLIP,
    transformers.models.distilbert.DistilBertTokenizer: SupportedTokenizers.BERT,
    transformers.models.gpt2.GPT2Tokenizer: SupportedTokenizers.GPT2,
    #transformers.models.llama.LlamaTokenizer: SupportedTokenizers.SENTENCEPIECE,
    # transformers.models.mluke.MLukeTokenizer: SupportedTokenizers.SENTENCEPIECE,
    transformers.models.mpnet.MPNetTokenizer: SupportedTokenizers.BERT,
    # transformers.models.roberta.tokenization_roberta.RobertaTokenizer: SupportedTokenizers.ROBERTA,
    transformers.models.xlm_roberta.XLMRobertaTokenizer: SupportedTokenizers.SENTENCEPIECE,
}
---
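
The manual vi edits above could also be scripted. The following is a sketch rather than a tested patcher for omlutils: replace_once and patch_file are hypothetical helpers that refuse to touch a file unless the "before" text occurs exactly once, and that keep a .bak copy of the original.

```python
from pathlib import Path


def replace_once(text: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of old with new, or raise ValueError."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected 1 occurrence, found {count}: {old!r}")
    return text.replace(old, new)


def patch_file(path: Path, old: str, new: str) -> None:
    """Patch path in place, saving the original next to it as <name>.bak."""
    original = path.read_text(encoding="utf-8")
    patched = replace_once(original, old, new)
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(patched, encoding="utf-8")


if __name__ == "__main__":
    # Dry run on a sample line instead of the installed package.
    sample = "size_threshold = quant_limit if is_quantized else 0.99e9\n"
    print(replace_once(sample, "else 0.99e9\n", "else 0.99e9 * 5\n"), end="")
```

Including the trailing newline in the search text keeps the script from matching a line that has already been patched.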

Create multilingual_e5_small.py with the following content:

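from omlutils import EmbeddingModel, EmbeddingModelConfig

print("start...")
# Start from the "text" template with a maximum sequence length of 512.
config = EmbeddingModelConfig.from_template("text", max_seq_length=512)
# Wrap the intfloat/multilingual-e5-small model for export.
em = EmbeddingModel(model_name="intfloat/multilingual-e5-small", config=config)
# Write multilingual_e5_small.onnx to the current directory.
em.export2file("multilingual_e5_small", output_dir=".")
print("complete...")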

Generate the ONNX model:

python multilingual_e5_small.py

When the script finishes, it creates a file named multilingual_e5_small.onnx.
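
A quick way to confirm that the export produced a file, and to see how large it is, is a standard-library check like this one. file_size_mb is a hypothetical helper. It only inspects the file on disk and does not validate the ONNX contents.

```python
from pathlib import Path


def file_size_mb(path: str) -> float:
    """Return the size of path in decimal megabytes, or raise if it is missing."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{path} was not created")
    return p.stat().st_size / 1e6


if __name__ == "__main__":
    try:
        print(f"multilingual_e5_small.onnx: {file_size_mb('multilingual_e5_small.onnx'):.1f} MB")
    except FileNotFoundError as exc:
        print(exc)
```
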

(Optional) Upgrade transformers:

pip install -U transformers

Done!
