当前位置:   article > 正文

从零开发短视频电商 在AWS上用SageMaker部署自定义模型_怎么用huggingface上的模型放在aws sagemaker上

怎么用huggingface上的模型放在aws sagemaker上

简介

部署的都是从huggingface上的model或者根据huaggingface上的model进行fine-tune后的

一般输入格式如下:

text-classification request body

{
    "inputs": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}
question-answering request body

{
    "inputs": {
        "question": "What is used for inference?",
        "context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
    }
}
zero-shot classification request body

{
    "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "parameters": {
        "candidate_labels": [
            "refund",
            "legal",
            "faq"
        ]
    }
}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25

所有官方示例

  • https://github.com/huggingface/notebooks/tree/main/sagemaker

推理工具

  • https://github.com/aws/sagemaker-huggingface-inference-toolkit

使用model.tar.gz

1.从huggingface上下载模型

由于模型文件比较大,需要先安装git-lfs

CentOS7安装Git LFS的方法如下:

# 安装必要的软件包:
sudo yum install curl-devel expat-devel gettext-devel openssl-devel zlib-devel
# 安装Git LFS:
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.rpm.sh | sudo bash
# 安装
sudo yum install git-lfs
# 配置Git LFS:
git lfs install
# 检测是否安装成功:
git lfs version
如果出现版本信息,说明安装成功。
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13

从huaggingface上clone你想使用的模型,以https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 为例子

git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
  • 1

在这里插入图片描述

2.自定义代码

允许用户覆盖 HuggingFaceHandlerService 的默认方法。您需要创建一个名为 code/文件夹,其中包含 inference.py 文件。

目录结构如下

model.tar.gz/
|- pytorch_model.bin
|- ....
|- code/
  |- inference.py
  |- requirements.txt 
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6

inference.py 文件包含自定义推理模块, requirements.txt 文件包含应添加的其他依赖项。自定义模块可以重写以下方法:

  • model_fn(model_dir) 覆盖加载模型的默认方法。返回值 model 将在 predict 中用于预测。 predict 接收参数 model_dir ,即解压后的 model.tar.gz 的路径。
  • transform_fn(model, data, content_type, accept_type) 使用您的自定义实现覆盖默认转换函数。您需要在 transform_fn 中实现您自己的 preprocesspredictpostprocess 步骤。此方法不能与下面提到的 input_fnpredict_fnoutput_fn 组合使用。
  • input_fn(input_data, content_type) 覆盖默认的预处理方法。返回值 data 将在 predict 中用于预测。输入是:
    • input_data 是您请求的原始正文。
    • content_type 是请求标头中的内容类型。
  • predict_fn(processed_data, model) 覆盖默认的预测方法。返回值 predictions 将在 postprocess 中使用。输入是 processed_data ,即 preprocess 的结果。
  • output_fn(prediction, accept) 覆盖后处理的默认方法。返回值 result 将是您请求的响应(例如 JSON )。输入是:
    • predictionspredict 的结果。
    • accept 是 HTTP 请求的返回接受类型,例如 application/json

以下是包含 model_fninput_fnpredict_fnoutput_fn 的自定义推理模块的示例:

from sagemaker_huggingface_inference_toolkit import decoder_encoder

def model_fn(model_dir):
    # implement custom code to load the model
    loaded_model = ...
    
    return loaded_model 

def input_fn(input_data, content_type):
    # decode the input data  (e.g. JSON string -> dict)
    data = decoder_encoder.decode(input_data, content_type)
    return data

def predict_fn(data, model):
    # call your custom model with the data
    outputs = model(data , ... )
    return predictions

def output_fn(prediction, accept):
    # convert the model output to the desired output format (e.g. dict -> JSON string)
    response = decoder_encoder.encode(prediction, accept)
    return response
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22

仅使用 model_fntransform_fn 自定义推理模块:

from sagemaker_huggingface_inference_toolkit import decoder_encoder

def model_fn(model_dir):
    # implement custom code to load the model
    loaded_model = ...
    
    return loaded_model 

def transform_fn(model, input_data, content_type, accept):
     # decode the input data (e.g. JSON string -> dict)
    data = decoder_encoder.decode(input_data, content_type)

    # call your custom model with the data
    outputs = model(data , ... ) 

    # convert the model output to the desired output format (e.g. dict -> JSON string)
    response = decoder_encoder.encode(output, accept)

    return response
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19

重点,这里的话我们 all-MiniLM-L6-v2的示例代码如下:

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

# Normalize embeddings
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

print("Sentence embeddings:")
print(sentence_embeddings)
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33

我们需要改造下,改为我们自己需要的自定义代码:

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# 这个方法直接同上
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# 覆盖 -- 模型加载 参考all-MiniLM-L6-v2给出的示例代码
def model_fn(model_dir):
  # Load model from HuggingFace Hub
  tokenizer = AutoTokenizer.from_pretrained(model_dir)
  model = AutoModel.from_pretrained(model_dir)
  return model, tokenizer
# 覆盖 -- 预测方法 参考all-MiniLM-L6-v2给出的示例代码
def predict_fn(data, model_and_tokenizer):
    # destruct model and tokenizer
    model, tokenizer = model_and_tokenizer
    
    # Tokenize sentences
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
    
    # return dictonary, which will be json serializable
    return {"vectors": sentence_embeddings[0].tolist()}
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37

3.打包为tar 文件

cd all-MiniLM-L6-v2
tar zcvf model.tar.gz *
  • 1
  • 2

4.上传model.tar.gz到S3

5.部署推理

这里有好几种方式可选。

第一种:在jupyterlab执行这个脚本,替换model等参数即可。

  • https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb

第二种:这个是吧上面所有步骤都包含了,但是这种无法处理我们在私有环境fine-tune后的模型。

  • https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb

第三种:可视化部署,我重点介绍下这个吧

入口如下:

注意下面的选项

  • 容器框架根据实际情况选择,这里我们就选择如图
  • S3 URI
  • IAM role:
    • 可以去IAM创建角色
      • AmazonS3FullAccess
      • AmazonSageMakerFullAccess
    • 也可以去JumpStart中的model去复制过来。

使用hub

原文:https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the–hub

这种方式没有上面的方式灵活度高,支持的model也没有上面的方式多。

1.在sagemaker上新建个jupyterlab

2.上传官方示例ipynb文件

3.指定HF_MODEL_ID和HF_TASK进行部署和推理

inference.py官方示例

  • https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/pytorch_bert/deploy_bert_outputs.html#Write-the-Inference-Script
import os
import json
from transformers import BertTokenizer, BertModel

def model_fn(model_dir):
    """
    Load the model for inference
    """

    model_path = os.path.join(model_dir, 'model/')

    # Load BERT tokenizer from disk.
    tokenizer = BertTokenizer.from_pretrained(model_path)

    # Load BERT model from disk.
    model = BertModel.from_pretrained(model_path)

    model_dict = {'model': model, 'tokenizer':tokenizer}

    return model_dict

def predict_fn(input_data, model):
    """
    Apply model to the incoming request
    """

    tokenizer = model['tokenizer']
    bert_model = model['model']

    encoded_input = tokenizer(input_data, return_tensors='pt')

    return bert_model(**encoded_input)

def input_fn(request_body, request_content_type):
    """
    Deserialize and prepare the prediction input
    """

    if request_content_type == "application/json":
        request = json.loads(request_body)
    else:
        request = request_body

    return request

def output_fn(prediction, response_content_type):
    """
    Serialize and prepare the prediction output
    """

    if response_content_type == "application/json":
        response = str(prediction)
    else:
        response = str(prediction)

    return response
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/小桥流水78/article/detail/844621
推荐阅读
相关标签
  

闽ICP备14008679号