Today, 小李哥 continues the series on cutting-edge AI solutions on the Amazon Web Services (AWS) cloud platform, helping you quickly get up to speed with AI software development best practices on AWS, one of the most popular international cloud platforms, and apply them in your daily work. This installment covers how to fine-tune the large language model dolly-v2-3b on Amazon SageMaker to serve different everyday scenarios, shares how to optimize model performance on SageMaker and save compute resources to control costs, and finally shows how to integrate the deployed model's URL into your own cloud application.
This solution hosts the front-end pages on Amazon CloudFront and S3, and integrates the application with the AI model through Amazon API Gateway and AWS Lambda, which invoke the large model for inference (a sketch of the Lambda integration appears at the end of this walkthrough). The solution architecture diagram is as follows:
Using this solution, 小李哥 built a Q&A chatbot assistant on top of the fine-tuned model that can generate code, summarize text, and answer questions.
Before diving into the case study, let's review the technical background of this solution to better understand the architecture.
Amazon SageMaker is a fully managed machine learning service (you can think of it as a serverless Jupyter Notebook) designed for application developers and data scientists to quickly build, train, and deploy machine learning models. With SageMaker, you don't need to manage the underlying infrastructure and can focus on developing and optimizing your models. It provides a complete set of tools and features, including data preparation, model training, hyperparameter tuning, model deployment, and monitoring, simplifying the entire machine learning workflow.
Now follow along as 小李哥 walks you step by step through fine-tuning a generative AI model (dolly-v2-3b) on AWS and integrating the deployed model with an application.
# Check the GPU available to the notebook instance
!nvidia-smi
%%capture

!pip3 install -r requirements.txt --quiet
!pip install sagemaker --quiet --upgrade --force-reinstall
%%capture

import os
import numpy as np
import pandas as pd
from typing import Any, Dict, List, Tuple, Union
from datasets import load_dataset, disable_caching
disable_caching()  ## disable the Hugging Face datasets cache

from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from transformers import TrainingArguments, Trainer

import torch
import accelerate
import bitsandbytes  # required for loading the model in 8-bit precision

from IPython.display import Markdown
# Load the Amazon SageMaker FAQ dataset from a local CSV file and inspect the first record
sagemaker_faqs_dataset = load_dataset("csv",
                                      data_files='data/amazon_sagemaker_faqs.csv')['train']
sagemaker_faqs_dataset
sagemaker_faqs_dataset[0]
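For reference, the preprocessing step further down removes the columns instruction, response, and text, so each CSV row is expected to carry the question, the answer, and a pre-formatted prompt string. A hypothetical record (the actual FAQ content in the dataset will differ) might look like:

# Hypothetical example of a record in data/amazon_sagemaker_faqs.csv
example_record = {
    "instruction": "What is Amazon SageMaker?",
    "response": "Amazon SageMaker is a fully managed machine learning service...",
    "text": "...the full prompt-formatted string that combines both fields...",
}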
from utils.helpers import INTRO_BLURB, INSTRUCTION_KEY, RESPONSE_KEY, END_KEY, RESPONSE_KEY_NL, DEFAULT_SEED, PROMPT

'''
PROMPT = """{intro}
{instruction_key}
{instruction}
{response_key}
{response}
{end_key}"""
'''

Markdown(PROMPT)
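To make the template concrete, here is what a fully assembled training prompt could look like. The key values below are assumptions based on the conventions of the original databricks/dolly training code; the actual values live in utils/helpers.py:

# Assumed key values following the databricks/dolly conventions; verify against utils/helpers.py
# e.g. INSTRUCTION_KEY = "### Instruction:", RESPONSE_KEY = "### Response:", END_KEY = "### End"
example_prompt = PROMPT.format(
    intro="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
    instruction_key="### Instruction:",
    instruction="What is Amazon SageMaker?",
    response_key="### Response:",
    response="Amazon SageMaker is a fully managed machine learning service.",
    end_key="### End",
)
print(example_prompt)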
# Load the tokenizer and register the special tokens used by the prompt template
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b",
                                          padding_side="left")

tokenizer.pad_token = tokenizer.eos_token
tokenizer.add_special_tokens({"additional_special_tokens":
                              [END_KEY, INSTRUCTION_KEY, RESPONSE_KEY_NL]})

# Load the base model in 8-bit precision (via bitsandbytes) to fit in GPU memory
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    # use_cache=False,
    device_map="auto",  # or "balanced"
    load_in_8bit=True,
)
# Resize the embedding matrix to account for the newly added special tokens
model.resize_token_embeddings(len(tokenizer))
from functools import partial
from utils.helpers import mlu_preprocess_batch

MAX_LENGTH = 256
_preprocessing_function = partial(mlu_preprocess_batch, max_length=MAX_LENGTH, tokenizer=tokenizer)

# Tokenize the dataset and drop the raw text columns
encoded_sagemaker_faqs_dataset = sagemaker_faqs_dataset.map(
    _preprocessing_function,
    batched=True,
    remove_columns=["instruction", "response", "text"],
)

# Keep only the examples that fit within the maximum sequence length
processed_dataset = encoded_sagemaker_faqs_dataset.filter(lambda rec: len(rec["input_ids"]) < MAX_LENGTH)

# Hold out 14 examples for evaluation
split_dataset = processed_dataset.train_test_split(test_size=14, seed=0)
split_dataset
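mlu_preprocess_batch comes from the workshop's utils/helpers.py, which is not reproduced in this post. As a rough, hypothetical sketch, a batched preprocessing function of this shape typically just tokenizes the pre-formatted text field; something along these lines (the real helper may differ):

# Hypothetical sketch of what mlu_preprocess_batch might do; the real helper may differ
def preprocess_batch_sketch(batch, tokenizer, max_length):
    # Tokenize the prompt-formatted strings in the "text" column,
    # truncating anything longer than max_length tokens
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )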
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, TaskType

MICRO_BATCH_SIZE = 8
BATCH_SIZE = 64
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
LORA_R = 256  # 512
LORA_ALPHA = 512  # 1024
LORA_DROPOUT = 0.05

# Define LoRA Config
lora_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# Prepare the 8-bit model for training (casts layer norms to fp32, enables
# gradient checkpointing); the unused import above suggests this step was intended
model = prepare_model_for_int8_training(model)
# Wrap the base model with the LoRA adapters
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
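As a sanity check on the number print_trainable_parameters reports, the LoRA parameter count can be worked out by hand: each adapted weight matrix of shape d x k gains two low-rank factors with r * (d + k) parameters in total. For dolly-v2-3b, peft's default target for GPT-NeoX models is the fused query_key_value projection; assuming a hidden size of 2560 and 32 layers (the pythia-2.8b configuration), a rough estimate is:

# Back-of-the-envelope LoRA parameter count (assumed model dims; verify against the model config)
hidden_size = 2560                     # dolly-v2-3b / pythia-2.8b hidden size
num_layers = 32                        # number of transformer layers
d, k = hidden_size, 3 * hidden_size    # query_key_value maps hidden -> 3 * hidden
lora_params = num_layers * LORA_R * (d + k)
print(f"~{lora_params / 1e6:.1f}M trainable LoRA parameters")  # roughly 84M at r=256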
from utils.helpers import MLUDataCollatorForCompletionOnlyLM

# Collator that pads each batch and masks the prompt tokens so the loss
# is computed only on the response portion of each example
data_collator = MLUDataCollatorForCompletionOnlyLM(
    tokenizer=tokenizer, mlm=False, return_tensors="pt", pad_to_multiple_of=8
)
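The collator is also a workshop helper. The idea behind completion-only collation (as in trl's DataCollatorForCompletionOnlyLM) is to set the label of every token up to and including the response key to -100 so the loss ignores the prompt; a minimal, hypothetical sketch of that masking step:

# Minimal sketch of completion-only label masking; the workshop's collator may differ
def mask_prompt_labels(input_ids, labels, response_token_ids):
    # Find where the response key starts and ignore everything before it in the loss
    for i in range(len(input_ids) - len(response_token_ids) + 1):
        if input_ids[i:i + len(response_token_ids)] == response_token_ids:
            labels[: i + len(response_token_ids)] = [-100] * (i + len(response_token_ids))
            break
    return labels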
EPOCHS = 10
LEARNING_RATE = 1e-4
MODEL_SAVE_FOLDER_NAME = "dolly-3b-lora"

training_args = TrainingArguments(
    output_dir=MODEL_SAVE_FOLDER_NAME,
    fp16=True,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=LEARNING_RATE,
    num_train_epochs=EPOCHS,
    logging_strategy="steps",
    logging_steps=100,
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=20000,
    save_total_limit=10,
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=split_dataset['train'],
    eval_dataset=split_dataset["test"],
    data_collator=data_collator,
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

# Save the LoRA adapter weights, model config, and tokenizer locally
trainer.model.save_pretrained(MODEL_SAVE_FOLDER_NAME)

trainer.save_model()

trainer.model.config.save_pretrained(MODEL_SAVE_FOLDER_NAME)

tokenizer.save_pretrained(MODEL_SAVE_FOLDER_NAME)
Define and initialize the parameters required for deployment
import boto3
import json
import sagemaker.djl_inference
from sagemaker.session import Session
from sagemaker import image_uris
from sagemaker import Model

sagemaker_session = Session()
print("sagemaker_session: ", sagemaker_session)

aws_role = sagemaker_session.get_caller_identity_arn()
print("aws_role: ", aws_role)

aws_region = boto3.Session().region_name
print("aws_region: ", aws_region)

# Retrieve the DJL DeepSpeed inference container image for this region
image_uri = image_uris.retrieve(framework="djl-deepspeed",
                                version="0.22.1",
                                region=aws_region)
print("image_uri: ", image_uri)
Deploy the model
model_data = "s3://{}/lora_model.tar.gz".format(mybucket)

model = Model(image_uri=image_uri,
              model_data=model_data,
              predictor_cls=sagemaker.djl_inference.DJLPredictor,
              role=aws_role)

# Deploy the model to a real-time endpoint; the instance type here is an assumption,
# pick a GPU instance that fits your budget and the 3B model's memory needs
predictor = model.deploy(initial_instance_count=1,
                         instance_type="ml.g5.2xlarge")

outputs = predictor.predict({"inputs": "What solutions come pre-built with Amazon SageMaker JumpStart?"})

from IPython.display import Markdown
Markdown(outputs)
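Finally, as described in the architecture at the top of this post, the endpoint can be wired into an application through Amazon API Gateway and AWS Lambda. Here is a minimal sketch of such a Lambda handler, assuming the endpoint name is passed in via an ENDPOINT_NAME environment variable and the request body carries a question field:

# Hypothetical Lambda handler that forwards an API Gateway request to the SageMaker endpoint
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Parse the question from the API Gateway request body
    body = json.loads(event["body"])
    # Invoke the deployed dolly-v2-3b endpoint for inference
    response = runtime.invoke_endpoint(
        EndpointName=os.environ["ENDPOINT_NAME"],
        ContentType="application/json",
        Body=json.dumps({"inputs": body["question"]}),
    )
    # Return the model's answer to the front end
    return {
        "statusCode": 200,
        "body": response["Body"].read().decode("utf-8"),
    }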