Rice is one of the world's three major food crops and originated in China, where archaeological research dates its cultivation back to around 6000 BC. Thanks to its high yield, efficiency, and broad adaptability, rice has become the staple crop of Asia and Africa; many even regard it as a source of life. The quality of rice varieties is therefore critical. Varieties are judged against different quality criteria, including appearance, cooking aroma, and taste. Traditional manual inspection classifies grain against physical criteria and is expensive, inefficient, and unreliable. In recent years, with the rapid development of machine vision and image processing, automatic inspection based on physical features such as color, texture, and size has become possible. Extracting good features, however, remains difficult, requiring considerable experience to choose features suitable for classification. Against this background, deep learning is a promising approach to rice variety classification. This article mainly uses the ResNet50_vd model to classify rice, covering environment setup, data processing, model configuration, model training and optimization, and model export and inference. In addition, to explore how Transformer models apply to classification problems, it walks through how to build and implement a Transformer model.
To classify rice varieties quickly and efficiently, we first use the PaddleClas suite provided by PaddlePaddle. PaddleClas not only makes it fast to build models, but also covers the complete model workflow, including data augmentation, data preprocessing, model training, and model evaluation. Users can flexibly configure parameters to their needs and quickly train and tune models.
PaddleClas is a powerful image classification suite, well suited to beginners and professionals alike, with the following strengths:
1. User-friendly, easy to get started with and to debug;
2. Efficient computation and flexible model configuration;
3. Support for common image classification algorithms and models, covering a wide range of application scenarios;
4. Strong extensibility, with support for custom models and metrics.
PaddleClas repository: github
PaddleClas documentation: user guide
At the same time, to better understand the Transformer model, we also use the PaddlePaddle API to build a classification pipeline, covering parameter configuration, data loading, model assembly, and training and optimization. The Transformer, proposed by Vaswani et al. in 2017, is a neural network model based on the self-attention mechanism; it allowed neural networks to decisively surpass traditional recurrent approaches on machine translation and other natural language processing tasks. It is one of the most important models in NLP today and the core algorithm behind well-known pretrained models such as GPT, BERT, and Transformer-XL.
The Transformer has two main parts: an encoder and a decoder. Both consist of multiple identical layers, each containing multi-head self-attention (Multi-Head Attention) and a feed-forward neural network (Feed-Forward Neural Network). The encoder and decoder architectures are similar, except that the decoder adds an extra attention sublayer that attends to the input context when generating the output.
Concretely, self-attention maps the input sequence to queries, keys, and values, computes weights from the similarity between queries and keys, and produces the output as a weighted combination of the values. The similarity is computed with a dot product and converted into weights via a scaling factor and the softmax function. In this way, self-attention learns the dependencies within the input sequence and the importance of each position in different contexts. The strength of the Transformer is that it fully models these internal dependencies and can capture the global structure of a sequence. Multi-head self-attention learns contextual information at different levels and from different aspects, increasing the model's expressive power.
Transformers are now used not only in natural language processing but also widely in image classification, object detection, and image segmentation; see PaddleViT for more. Here we consider only image classification. The overall structure of a Transformer classification model is described below.
The core of the Transformer lies in the attention module inside the encoder. The image is converted into tokens analogous to word vectors and fed into the network; multiple encoder layers extract the key information, and a softmax classifier at the end produces the desired classification result. The details are described one by one below.
Python >= 3.6
PaddlePaddle >= 2.1
PaddleClas
Note: the sections below use common Linux commands such as ls, cd, and clone; some of them are:
ls: list the files and folders in the current directory.
cd: change directory.
mkdir: create a directory.
cp: copy a file or directory.
mv: move or rename a file or directory.
rm: delete a file or directory.
unzip: extract a compressed archive.
Other commands are covered in this Linux tutorial (Runoob).
# Switch to the work directory
%cd /home/aistudio/work
# Clone PaddleClas locally
!git clone https://github.com/PaddlePaddle/PaddleClas.git
# Switch to the PaddleClas directory
%cd /home/aistudio/work/PaddleClas
# Install the project's additional dependencies
!pip install -r requirements.txt
!python setup.py install
The dataset used here is the Rice Image Dataset from Kaggle. The compressed file is about 220 MB and contains five kinds of rice images: Arborio, Basmati, Ipsala, Jasmine, Karacadag.
Each variety has 15,000 images, for 75,000 images in total. The dataset was already attached when this project was created, so it only needs to be unzipped.
## Check the currently mounted dataset directory and unzip.
%cd /home/aistudio/data/
!unzip data199018/Rice_Image_Dataset.zip
## Alternatively, download the dataset with wget and unzip.
#%cd /home/aistudio/data/
#!wget https://www.muratkoklu.com/datasets/Rice_Image_Dataset.zip
#!unzip Rice_Image_Dataset.zip
The extracted data is under /data/Rice_Image_Dataset and contains five folders, one per rice variety.
To show what the different varieties look like, one image of each variety is displayed below.
## Import modules
import cv2, os
import matplotlib.pyplot as plt
import warnings
## Warnings may occasionally appear; ignore them
warnings.filterwarnings("ignore")
## With inline plotting enabled, plt.show() can be omitted
%matplotlib inline

path = "/home/aistudio/data/Rice_Image_Dataset/"
data_list = os.listdir(path)
# print(data_list)
index = 1
plt.figure(figsize=(12, 3))
for cur_dir in data_list:
    if not cur_dir.endswith(".txt"):
        for data in os.listdir(os.path.join(path, cur_dir)):
            img = cv2.imread(os.path.join(path, cur_dir, data))
            # print(img.shape)
            plt.subplot(1, 5, index)
            index += 1
            plt.title(cur_dir)
            plt.imshow(img)
            break
With the data ready, it must be organized in the following structure, where train_list.txt and val_list.txt look like this:
# Each line separates the image path and the label with a space
Arborio(1000).jpg 0
The image name comes first and the label second; labels generally start from 0. We split the dataset into training and validation sets at a 7:3 ratio and generate train_list.txt and val_list.txt, together with the corresponding train and val image folders; the newly generated files are placed under /home/aistudio/data/.
The label file contains:
0 Arborio
1 Basmati
2 Ipsala
3 Jasmine
4 Karacadag
import os
import numpy as np
from PIL import Image
import io
import re
import shutil

root_path = "/home/aistudio/data/Rice_Image_Dataset/"
data_list = os.listdir(root_path)
save_path = "/home/aistudio/data/"

# Create the output folders
train = "/home/aistudio/data/train/"
val = "/home/aistudio/data/val/"
if not os.path.exists(train):
    os.makedirs(train)
if not os.path.exists(val):
    os.makedirs(val)

## Train/validation split ratio
train_ratio = 0.7

## Open the list files for writing
train_file = open(os.path.join(save_path, 'train_list.txt'), 'w', encoding='utf-8')
val_file = open(os.path.join(save_path, 'val_list.txt'), 'w', encoding='utf-8')
label_dict = os.path.join(save_path, 'label_list.txt')

## Sample counters
a = 0
b = 0
with open(label_dict, "w") as label_list:
    label_id = 0
    for i, path in enumerate(sorted(data_list)):
        if not path.endswith(".txt") and "_" not in path:
            label_list.write("{0} {1}\n".format(label_id, path))
            image_path = os.listdir(os.path.join(root_path, path))
            n = len(image_path)  # n = 15000
            for index, img in enumerate(image_path):
                try:
                    # img is a relative path here
                    img_f = os.path.join(root_path, path, img)
                    ## Verify the image is readable before keeping it
                    with open(img_f, "rb") as img_file:
                        save_img = Image.open(io.BytesIO(img_file.read()))
                    if index < int(n * train_ratio):
                        shutil.copyfile(os.path.join(root_path, path, img), os.path.join(train, img))
                        ## Strip spaces and parentheses from the file name, then rename the image
                        new_img = re.sub(r"[() ]", "", img)
                        os.rename(os.path.join(train, img), os.path.join(train, new_img))
                        train_file.write("{0} {1}\n".format(os.path.join(train, new_img), label_id))
                        a += 1
                    else:
                        shutil.copyfile(os.path.join(root_path, path, img), os.path.join(val, img))
                        new_img = re.sub(r"[() ]", "", img)
                        os.rename(os.path.join(val, img), os.path.join(val, new_img))
                        val_file.write("{0} {1}\n".format(os.path.join(val, new_img), label_id))
                        b += 1
                except:
                    continue
            label_id += 1
train_file.close()
val_file.close()
print(a)
print(b)
## Inspect the generated files
!tree -L 1 /home/aistudio/data
!head -20 /home/aistudio/data/train_list.txt
PaddleClas offers a rich model zoo with as many as 29 model families, along with training configurations and pretrained weights for 134 models on the ImageNet1k dataset. For preprocessing, it provides 8 data augmentation methods, making it easy to augment and expand the data and improve model robustness. To train, we only need to pick a model and edit the corresponding yaml configuration file in the configs folder. Here we use the ResNet50_vd model from the model zoo; its configuration file is located at …/configs/ImageNet/ResNet/ResNet50_vd.yaml. For detailed parameter descriptions, see the model configuration documentation.
Configuration of ResNet50_vd.yaml:
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu  # GPU by default; switch to cpu if needed
  save_interval: 1  # save a checkpoint every epoch
  eval_during_train: True  # evaluate while training
  eval_interval: 1
  epochs: 10  # number of training epochs
  print_batch_step: 100  # print info every 100 iterations
  use_visualdl: True
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

# model architecture
Arch:
  name: ResNet50_vd
  class_num: 5  # number of classes

# loss function config for train/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
        epsilon: 0.1
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum  # optimizer
  momentum: 0.9
  lr:
    name: CosineWarmup
    learning_rate: 0.1
    warmup_epoch: 10  # learning-rate warmup
  regularizer:
    name: 'L2'
    coeff: 0.00007

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./  # dataset root directory
      cls_label_path: /home/aistudio/data/train_list.txt  # path to the generated train list
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
      batch_transform_ops:
        - MixupOperator:
            alpha: 0.2
    sampler:
      name: DistributedBatchSampler
      batch_size: 32
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    dataset:
      name: ImageNetDataset
      image_root: ./
      cls_label_path: /home/aistudio/data/val_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            resize_short: 256
        - CropImage:
            size: 224
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
    sampler:
      name: DistributedBatchSampler
      batch_size: 32
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True

Infer:
  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
  batch_size: 1
  transforms:
    - DecodeImage:
        to_rgb: True
        channel_first: False
    - ResizeImage:
        resize_short: 256
    - CropImage:
        size: 224
    - NormalizeImage:
        scale: 1.0/255.0
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]
        order: ''
    - ToCHWImage:
  PostProcess:
    name: Topk
    topk: 1
    class_id_map_file: data/label_list.txt

Metric:
  Train:
  Eval:
    - TopkAcc:
        topk: [1, 5]
After editing the configuration file, run the following to train and evaluate at the same time. The final evaluation accuracy reaches 0.99853.
# Switch to the PaddleClas directory
%cd /home/aistudio/work/PaddleClas
!python3 tools/train.py --config /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml
This part covers parameter configuration, data loading, network definition, and model training and optimization for Transformer-based classification. The parameter configuration includes data loading and saving parameters, training parameters, model parameters, and optimizer parameters, all of which can be customized as needed. The log file and the statistics class record the model's runtime information, as detailed below.
import logging, time
import os

train_parameters = {
    ## Data settings
    "image_size": 224,
    "input_channels": 3,
    "resize_short": 256,
    "mean_rgb": [0.485, 0.456, 0.406],  # mean
    "std_rgb": [0.229, 0.224, 0.225],   # standard deviation
    "data_dir": r"/home/aistudio/data/",  # where the training data is stored
    "train_file_list": "train.txt",
    "val_file_list": "val.txt",
    "label_file": "label_list.txt",
    ## Training settings
    "batch_size": 16,
    "save_path": "./freeze-model",
    "save_freq": 10,   # checkpoint saving frequency (epochs)
    "last_epoch": 0,
    "pretrained": False,
    "pretrained_dir": r"/home/aistudio",
    "mode": "train",
    "use_gpu": True,
    "num_works": 1,
    "accum_iter": 1,   # gradient accumulation steps
    "debug_freq": 100,
    ## Model settings
    "num_epochs": 200,
    "num_classes": 5,  # number of classes
    "patch_size": 16,
    "embed_dim": 768,
    "depth": 12,
    "num_heads": 8,
    "attn_head_size": None,
    "mlp_ratio": 4.0,
    "qkv_bias": True,
    "dropout": 0.0,
    "attention_dropout": 0.0,
    "droppath": 0,
    ## Optimizer settings
    "base_lr": 0.002,
    "weight_decay": 0.01,
    "betas": [0.9, 0.999],
    "eps": 1e-8,
    "warmup_epochs": 40,
    "warmup_start_lr": 0.00001,
    "end_lr": 0.0001
}

## Initialize logging; call this once, then use the global `logger` everywhere
def init_log_config():
    """
    Initialize logging configuration.
    :return:
    """
    global logger
    logger = logging.getLogger()   # create the logger object
    logger.setLevel(logging.INFO)  # set the log level
    log_path = os.path.join(os.getcwd(), 'logs')
    if not os.path.exists(log_path):
        os.makedirs(log_path)
    log_name = os.path.join(log_path, 'train.log')
    sh = logging.StreamHandler()
    sh.setLevel(logging.INFO)      # console output level
    fh = logging.FileHandler(log_name, mode='w')
    fh.setLevel(logging.DEBUG)     # file output level
    # Output format
    formatter = logging.Formatter("%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s")
    fh.setFormatter(formatter)
    sh.setFormatter(formatter)
    # Attach the console and file handlers to the logger
    logger.addHandler(sh)
    logger.addHandler(fh)

## Running statistics
class AverageMeter():
    """Meter for monitoring losses"""
    def __init__(self):
        self.avg = 0
        self.sum = 0
        self.cnt = 0
        self.reset()

    def reset(self):
        """reset all values to zeros"""
        self.avg = 0
        self.sum = 0
        self.cnt = 0

    def update(self, val, n=1):
        """update avg by val and n, where val is the avg of n values"""
        self.sum += val * n
        self.cnt += n
        self.avg = self.sum / self.cnt
In PaddlePaddle, Dataset and DataLoader are two key data-handling interfaces, covering data reading, preprocessing, and batched loading. Dataset is the dataset interface for reading and processing samples; using it generally takes three steps: subclass paddle.io.Dataset, implement __getitem__ to return one sample and its label, and implement __len__ to return the dataset size.
Using DataLoader is simpler: construct it from a Dataset (together with the batch size, shuffling, and worker settings) and iterate over it; it splits the data into batches of batch_size and returns them. A minimal sketch of both follows; the full RiceImageDataset used in this project comes after it.
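As a minimal sketch of these steps (a hypothetical toy dataset held in memory, purely for illustration; the real RiceImageDataset below reads the list files generated earlier):

import paddle
from paddle.io import Dataset, DataLoader

# Toy Dataset holding random in-memory tensors, batched by DataLoader.
class ToyDataset(Dataset):
    def __init__(self, n=8):
        super().__init__()
        self.data = paddle.randn([n, 3, 224, 224])
        self.labels = paddle.randint(0, 5, [n])

    def __getitem__(self, idx):   # step 2: return one (sample, label) pair
        return self.data[idx], self.labels[idx]

    def __len__(self):            # step 3: report the dataset size
        return len(self.labels)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
for images, labels in loader:
    print(images.shape, labels.shape)  # [4, 3, 224, 224] [4]
    break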
import os
import math
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle.io import DistributedBatchSampler
from paddle.vision import transforms
from paddle.vision import image_load

class RiceImageDataset(Dataset):
    def __init__(self, file_folder, is_train=True, transform_ops=None):
        super().__init__()
        self.file_folder = file_folder
        self.transforms = transform_ops
        self.img_path_list = []
        self.label_list = []
        list_name = 'train_list.txt' if is_train else 'val_list.txt'
        self.list_file = os.path.join(self.file_folder, list_name)
        assert os.path.isfile(self.list_file), f'{self.list_file} not exist!'
        # Read the list file
        with open(self.list_file, 'r') as infile:
            for line in infile:
                img_path = line.strip().split()[0]
                img_label = int(line.strip().split()[1])
                self.img_path_list.append(os.path.join(self.file_folder, img_path))
                self.label_list.append(img_label)

    def __len__(self):
        return len(self.label_list)

    # Return one sample
    def __getitem__(self, index):
        data = image_load(self.img_path_list[index]).convert('RGB')
        data = self.transforms(data)
        label = self.label_list[index]
        return data, label

## Preprocessing ops: you can implement your own or call the built-in ones; the built-ins are used here
def train_transforms_vit(is_train):
    if is_train:  # True for training
        transform = transforms.Compose([
            transforms.RandomResizedCrop(size=(train_parameters["image_size"], train_parameters["image_size"]),
                                         interpolation='bicubic'),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean=train_parameters["mean_rgb"],
                                 std=train_parameters["std_rgb"])])
    else:  # False for validation
        transform = transforms.Compose([
            transforms.Resize(size=train_parameters["resize_short"]),
            transforms.CenterCrop(size=(train_parameters["image_size"], train_parameters["image_size"])),
            transforms.ToTensor(),
            transforms.Normalize(mean=train_parameters["mean_rgb"],
                                 std=train_parameters["std_rgb"])])
    return transform

## Print one sample
trans = train_transforms_vit(True)
rice_data = RiceImageDataset(train_parameters["data_dir"], True, trans)
print(rice_data[0])

## Batch loading
def get_dataloader(dataset, is_train=True, use_dist_sampler=False):
    batch_size = train_parameters["batch_size"]
    if use_dist_sampler is True:  # enable multi-GPU training
        sampler = DistributedBatchSampler(dataset=dataset,
                                          batch_size=batch_size,
                                          shuffle=is_train,
                                          drop_last=is_train)
        dataloader = DataLoader(dataset=dataset,
                                batch_sampler=sampler,
                                num_workers=train_parameters["num_works"])
    else:
        dataloader = DataLoader(dataset=dataset,
                                batch_size=batch_size,
                                num_workers=train_parameters["num_works"],
                                shuffle=is_train,
                                drop_last=is_train)
    return dataloader

data_loader = get_dataloader(rice_data, True, False)
print(len(data_loader))
The Transformer was originally designed for natural language processing, where the input is a sequence of embeddings and the network predicts the probability of the next token. We therefore need an analogous treatment of images so the model can be trained. Concretely: split the image into equal-sized patches and treat each patch as one input element, forming an input sequence. Through multiple Transformer layers, the model learns a global vector representation that captures the semantics of the whole image. The ViT model contains no recurrent or convolutional structure (aside from the convolution used for patch embedding), so for the model to exploit the order of the sequence we must inject information about each token's relative or absolute position. Since the position encoding has the same dimensionality as the embeddings, the two can simply be added together. Position encodings come in fixed and learnable forms; a learnable encoding is used here.
Since this project is a classification task, a cls_token is also prepended to the input sequence and used to compute the classification output. Because every position in a Transformer can attend to every other position, prepending a cls_token lets the model "see" the information of the whole sequence at that position. In ViT, the cls_token is usually a trainable parameter sharing the embedding dimension with the position-encoding matrix. Adding it helps the model capture the semantics of the entire sequence, improving classification performance and feature expressiveness. After these steps, the processed tokens are fed into the Encoder; the number of encoder layers is configurable, and every layer has the same structure: Multi-Head Attention, LayerNorm, and a Feed-Forward Network. A minimal shape-level sketch of this tokenization follows.
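The sketch below only checks shapes, assuming the same defaults as the implementation later in this article (224×224 images, 16×16 patches, embed_dim 768):

import paddle

# Sketch of the ViT input pipeline: image -> patch tokens -> [cls] + tokens + positions.
x = paddle.randn([2, 3, 224, 224])                      # a batch of 2 RGB images
to_patches = paddle.nn.Conv2D(3, 768, kernel_size=16, stride=16)
tokens = to_patches(x).flatten(2).transpose([0, 2, 1])  # [2, 196, 768]; 14*14 = 196 patches

cls_token = paddle.create_parameter([1, 1, 768], 'float32')    # learnable [cls] token
pos_embed = paddle.create_parameter([1, 197, 768], 'float32')  # learnable position encoding
seq = paddle.concat([cls_token.expand([2, -1, -1]), tokens], axis=1) + pos_embed
print(seq.shape)  # [2, 197, 768] -> ready for the encoder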
In NLP, the attention mechanism weights the elements at different positions in a sequence, emphasizing the information most relevant to the current prediction. In image classification, attention serves to extract local and global image features. After training, ViT learns feature representations at different levels, from shallow to high-level, and finally obtains a global representation of the image. Compared with traditional convolutional neural networks, this way of extracting features is more flexible and extensible.
The attention function maps a Query and a set of Key-Value pairs to an output, where Query and Key have the same dimension, Value may differ, and all three are vectors; this particular form of attention is called "Scaled Dot-Product Attention". We first compute the similarity between the Query and each Key, scale each similarity matrix, and apply the softmax function to obtain the weights over the Keys. Finally, a weighted sum of the weights and the Values gives the output. The computation is:
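$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $d_k$ is the dimension of the Key (and Query) vectors and $\sqrt{d_k}$ is the scaling factor.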
The scaling factor here acts like a normalization, preventing large swings during training. Compared with single-head attention, Multi-Head Attention lets the model attend to information at different positions, with each head representing different kinds of relationships in a different subspace; a single head generally cannot achieve this. A minimal single-head sketch follows.
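This sketch shows the computation for one head only (self-attention, so Q, K, and V all come from the same input); the multi-head version in the full implementation below simply runs this in parallel over several subspaces:

import paddle
import paddle.nn.functional as F

# Minimal scaled dot-product self-attention over a [B, N, C] token sequence.
def attention(q, k, v):
    scale = q.shape[-1] ** -0.5                           # 1 / sqrt(d_k)
    scores = paddle.matmul(q, k, transpose_y=True) * scale
    weights = F.softmax(scores, axis=-1)                  # per-query weights over keys
    return paddle.matmul(weights, v)                      # weighted sum of values

x = paddle.randn([2, 197, 768])   # e.g. the token sequence from the previous sketch
out = attention(x, x, x)          # self-attention: q = k = v
print(out.shape)                  # [2, 197, 768]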
The Encoder block also uses Layer Normalization, which in essence regularizes the optimization landscape and speeds up convergence. After the attention sublayer, each layer also has a Feed-Forward Network, whose role is a position-wise spatial transformation. It consists of two linear layers with a nonlinearity between them (a ReLU in the original Transformer; the implementation below uses GELU), which increases the model's expressiveness. (If you are curious, try removing the feed-forward network; accuracy will suffer.) A minimal sketch follows.
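A minimal sketch of such a feed-forward block, assuming the same sizes as the Mlp module below (GELU activation, mlp_ratio = 4 expanding 768 to 3072):

import paddle
import paddle.nn as nn

# Position-wise feed-forward network: expand -> nonlinearity -> project back.
ffn = nn.Sequential(
    nn.Linear(768, 768 * 4),   # expand to the hidden dim (mlp_ratio = 4)
    nn.GELU(),                 # the nonlinearity that gives the block its expressiveness
    nn.Linear(768 * 4, 768),   # project back to embed_dim
)
x = paddle.randn([2, 197, 768])
print(ffn(x).shape)  # [2, 197, 768]; applied independently at every token position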
At the end of the network, the softmax classifier commonly used for classification tasks produces the output. The model structure is now complete, and training and optimization can follow.
The details are in the code below. The code runs correctly, but for lack of time the full training run has not been completed; feel free to run it yourself.
import paddle
import paddle.nn as nn

class Identity(nn.Layer):
    ## Pass the input through unchanged
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

## Embedding part
class PatchEmbedding(nn.Layer):
    """Patch Embedding

    Apply patch embedding (which is implemented using Conv2D) on input data.

    Attributes:
        image_size: image size
        patch_size: patch size
        num_patches: num of patches
        patch_embddings: patch embed operation (Conv2D)
    """
    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.image_size = image_size
        self.patch_size = patch_size  # patch size
        self.num_patches = (image_size // patch_size) * (image_size // patch_size)  # number of patches
        self.patch_embedding = nn.Conv2D(in_channels=in_channels,
                                         out_channels=embed_dim,
                                         kernel_size=patch_size,
                                         stride=patch_size)

    def forward(self, x):
        x = self.patch_embedding(x)
        x = x.flatten(2)            # [B, C, H, W] -> [B, C, h*w]
        x = x.transpose([0, 2, 1])  # [B, C, h*w] -> [B, h*w, C] = [B, N, C]
        # N: total number of patches, C: embedding dimension
        return x

## Attention part
class Attention(nn.Layer):
    """Attention module

    Attention module for ViT, here q, k, v are assumed the same.
    The qkv mappings are stored as one single param.

    Attributes:
        num_heads: number of heads
        attn_head_size: feature dim of single head
        all_head_size: feature dim of all heads
        qkv: a nn.Linear for q, k, v mapping
        scales: 1 / sqrt(single_head_feature_dim)
        out: projection of multi-head attention
        attn_dropout: dropout for attention
        proj_dropout: final dropout before output
        softmax: softmax op for attention
    """
    def __init__(self, embed_dim, num_heads, attn_head_size=None,
                 qkv_bias=True, dropout=0., attention_dropout=0.):
        super().__init__()
        self.embed_dim = embed_dim
        self.num_heads = num_heads
        if attn_head_size is not None:
            self.attn_head_size = attn_head_size
        else:
            assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
            self.attn_head_size = embed_dim // num_heads  # dim of each head
        self.all_head_size = self.attn_head_size * num_heads  # dim of all heads (embed_dim)
        ## Weight initialization
        w_attr_1, b_attr_1 = self._init_weights()
        ## One linear layer producing q, k and v together
        self.qkv = nn.Linear(embed_dim,
                             self.all_head_size * 3,  # weights for q, k, and v
                             weight_attr=w_attr_1,
                             bias_attr=b_attr_1 if qkv_bias else False)
        ## Scaling factor
        self.scales = self.attn_head_size ** -0.5
        w_attr_2, b_attr_2 = self._init_weights()
        ## Output projection
        self.out = nn.Linear(self.all_head_size,
                             embed_dim,
                             weight_attr=w_attr_2,
                             bias_attr=b_attr_2)
        self.attn_dropout = nn.Dropout(attention_dropout)
        self.proj_dropout = nn.Dropout(dropout)
        self.softmax = nn.Softmax(axis=-1)

    def _init_weights(self):
        weight_attr = paddle.ParamAttr(initializer=nn.initializer.TruncatedNormal(std=.02))
        bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
        return weight_attr, bias_attr

    ## Reshape so attention is computed per head over all patches
    def transpose_multihead(self, x):
        """[B, N, C] -> [B, N, n_heads, head_dim] -> [B, n_heads, N, head_dim]"""
        new_shape = x.shape[:-1] + [self.num_heads, self.attn_head_size]
        x = x.reshape(new_shape)       # [B, N, C] -> [B, N, n_heads, head_dim]
        x = x.transpose([0, 2, 1, 3])  # [B, N, n_heads, head_dim] -> [B, n_heads, N, head_dim]
        return x

    def forward(self, x):
        qkv = self.qkv(x).chunk(3, axis=-1)
        ## Apply the same reshaping to q, k and v via map; three separate linear
        ## layers would also work, with care taken over the dimensions
        q, k, v = map(self.transpose_multihead, qkv)
        q = q * self.scales
        attn = paddle.matmul(q, k, transpose_y=True)  # [B, n_heads, N, N]
        attn = self.softmax(attn)
        attn = self.attn_dropout(attn)
        z = paddle.matmul(attn, v)     # [B, n_heads, N, head_dim]
        z = z.transpose([0, 2, 1, 3])  # [B, N, n_heads, head_dim]
        new_shape = z.shape[:-2] + [self.all_head_size]
        z = z.reshape(new_shape)       # [B, N, all_head_size]
        z = self.out(z)
        z = self.proj_dropout(z)
        return z

## Feed-forward network part
class Mlp(nn.Layer):
    """MLP module

    Impl using nn.Linear and activation is GELU, dropout is applied.
    Ops: fc -> act -> dropout -> fc -> dropout

    Attributes:
        fc1: nn.Linear
        fc2: nn.Linear
        act: GELU
        dropout: dropout after fc
    """
    def __init__(self, embed_dim, mlp_ratio, dropout=0.):
        super().__init__()
        w_attr_1, b_attr_1 = self._init_weights()
        self.fc1 = nn.Linear(embed_dim,
                             int(embed_dim * mlp_ratio),
                             weight_attr=w_attr_1,
                             bias_attr=b_attr_1)
        w_attr_2, b_attr_2 = self._init_weights()
        self.fc2 = nn.Linear(int(embed_dim * mlp_ratio),
                             embed_dim,
                             weight_attr=w_attr_2,
                             bias_attr=b_attr_2)
        self.act = nn.GELU()
        self.dropout = nn.Dropout(dropout)

    def _init_weights(self):
        weight_attr = paddle.ParamAttr(
            initializer=paddle.nn.initializer.TruncatedNormal(std=0.2))
        bias_attr = paddle.ParamAttr(
            initializer=paddle.nn.initializer.Constant(0.0))
        return weight_attr, bias_attr

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x

## Building block of the Encoder
class TransformerLayer(nn.Layer):
    """Transformer Layer

    Transformer layer contains attention, norm, mlp and residual

    Attributes:
        embed_dim: transformer feature dim
        attn_norm: nn.LayerNorm before attention
        mlp_norm: nn.LayerNorm before mlp
        mlp: mlp modual
        attn: attention modual
    """
    def __init__(self, embed_dim, num_heads, attn_head_size=None, qkv_bias=True,
                 mlp_ratio=4., dropout=0., attention_dropout=0., droppath=0.):
        super().__init__()
        w_attr_1, b_attr_1 = self._init_weights()
        self.attn_norm = nn.LayerNorm(embed_dim,
                                      weight_attr=w_attr_1,
                                      bias_attr=b_attr_1,
                                      epsilon=1e-6)
        self.attn = Attention(embed_dim, num_heads, attn_head_size,
                              qkv_bias, dropout, attention_dropout)
        #self.drop_path = DropPath(droppath) if droppath > 0. else Identity()
        w_attr_2, b_attr_2 = self._init_weights()
        self.mlp_norm = nn.LayerNorm(embed_dim,
                                     weight_attr=w_attr_2,
                                     bias_attr=b_attr_2,
                                     epsilon=1e-6)
        self.mlp = Mlp(embed_dim, mlp_ratio, dropout)

    def _init_weights(self):
        weight_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(1.0))
        bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
        return weight_attr, bias_attr

    def forward(self, x):
        h = x
        x = self.attn_norm(x)
        x = self.attn(x)
        #x = self.drop_path(x)
        x = x + h
        h = x
        x = self.mlp_norm(x)
        x = self.mlp(x)
        #x = self.drop_path(x)
        x = x + h
        return x

## The Encoder is a stack of TransformerLayers
class Encoder(nn.Layer):
    """Transformer encoder

    Encoder contains a list of TransformerLayer, and a LayerNorm.

    Attributes:
        layers: nn.LayerList contains multiple EncoderLayers
        encoder_norm: nn.LayerNorm which is applied after last encoder layer
    """
    def __init__(self, embed_dim, num_heads, depth, attn_head_size=None,
                 qkv_bias=True, mlp_ratio=4.0, dropout=0.,
                 attention_dropout=0., droppath=0.):
        super().__init__()
        # stochastic depth decay
        depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
        layer_list = []
        for i in range(depth):
            layer_list.append(TransformerLayer(embed_dim, num_heads, attn_head_size,
                                               qkv_bias, mlp_ratio, dropout,
                                               attention_dropout, depth_decay[i]))
        self.layers = nn.LayerList(layer_list)
        w_attr_1, b_attr_1 = self._init_weights()
        self.encoder_norm = nn.LayerNorm(embed_dim,
                                         weight_attr=w_attr_1,
                                         bias_attr=b_attr_1,
                                         epsilon=1e-6)

    def _init_weights(self):
        weight_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(1.0))
        bias_attr = paddle.ParamAttr(initializer=nn.initializer.Constant(0.0))
        return weight_attr, bias_attr

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        x = self.encoder_norm(x)
        return x

## Full model
class VisionTransformer(nn.Layer):
    """ViT transformer

    ViT Transformer, classifier is a single Linear layer for finetune,
    For training from scratch, two layer mlp should be used.
    Classification is done using cls_token.

    Args:
        image_size: int, input image size, default: 224
        patch_size: int, patch size, default: 16
        in_channels: int, input image channels, default: 3
        num_classes: int, number of classes for classification, default: 1000
        embed_dim: int, embedding dimension (patch embed out dim), default: 768
        depth: int, number ot transformer blocks, default: 12
        num_heads: int, number of attention heads, default: 12
        attn_head_size: int, dim of head, if none, set to embed_dim // num_heads, default: None
        mlp_ratio: float, ratio of mlp hidden dim to embed dim(mlp in dim), default: 4.0
        qkv_bias: bool, If True, enable qkv(nn.Linear) layer with bias, default: True
        dropout: float, dropout rate for linear layers, default: 0.
        attention_dropout: float, dropout rate for attention layers default: 0.
        droppath: float, droppath rate for droppath layers, default: 0.
        representation_size: int, set representation layer (pre-logits) if set, default: None
    """
    def __init__(self,
                 image_size=224,
                 patch_size=16,
                 in_channels=3,
                 num_classes=1000,
                 embed_dim=768,
                 depth=12,
                 num_heads=12,
                 attn_head_size=None,
                 mlp_ratio=4,
                 qkv_bias=True,
                 dropout=0.,
                 attention_dropout=0.,
                 droppath=0.,
                 representation_size=None):
        super().__init__()
        # Patch embedding first
        self.patch_embedding = PatchEmbedding(image_size, patch_size, in_channels, embed_dim)
        # Position encoding (learnable)
        self.position_embedding = paddle.create_parameter(
            shape=[1, 1 + self.patch_embedding.num_patches, embed_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
        # Class token
        self.cls_token = paddle.create_parameter(
            shape=[1, 1, embed_dim],
            dtype='float32',
            default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
        self.pos_dropout = nn.Dropout(dropout)
        # Multi-head attention encoder
        self.encoder = Encoder(embed_dim, num_heads, depth, attn_head_size,
                               qkv_bias, mlp_ratio, dropout, attention_dropout, droppath)
        # Optional pre-logits layer
        if representation_size is not None:
            self.num_features = representation_size
            w_attr_1, b_attr_1 = self._init_weights()
            self.pre_logits = nn.Sequential(
                nn.Linear(embed_dim,
                          representation_size,
                          weight_attr=w_attr_1,
                          bias_attr=b_attr_1),
                nn.ReLU())
        else:
            self.pre_logits = Identity()
        # Classifier
        w_attr_2, b_attr_2 = self._init_weights()
        self.classifier = nn.Linear(embed_dim,
                                    num_classes,
                                    weight_attr=w_attr_2,
                                    bias_attr=b_attr_2)

    def _init_weights(self):
        weight_attr = paddle.ParamAttr(
            initializer=paddle.nn.initializer.Constant(1.0))
        bias_attr = paddle.ParamAttr(
            initializer=paddle.nn.initializer.Constant(0.0))
        return weight_attr, bias_attr

    def forward_features(self, x):
        x = self.patch_embedding(x)
        cls_tokens = self.cls_token.expand((x.shape[0], -1, -1))
        ## Prepend cls_tokens; note the order
        x = paddle.concat((cls_tokens, x), axis=1)
        # Add the position encoding to the input sequence
        x = x + self.position_embedding
        x = self.pos_dropout(x)
        # Encode
        x = self.encoder(x)
        x = self.pre_logits(x[:, 0])  # cls_token only
        return x

    def forward(self, x):
        x = self.forward_features(x)
        logits = self.classifier(x)
        return logits

## Build the model from the configured parameters
def build_vit():
    model = VisionTransformer(image_size=train_parameters["image_size"],
                              patch_size=train_parameters["patch_size"],
                              in_channels=train_parameters["input_channels"],
                              num_classes=train_parameters["num_classes"],
                              embed_dim=train_parameters["embed_dim"],
                              depth=train_parameters["depth"],
                              num_heads=train_parameters["num_heads"],
                              attn_head_size=train_parameters["attn_head_size"],
                              mlp_ratio=train_parameters["mlp_ratio"],
                              qkv_bias=train_parameters["qkv_bias"],
                              dropout=train_parameters["dropout"],
                              attention_dropout=train_parameters["attention_dropout"],
                              droppath=train_parameters["droppath"],
                              representation_size=None)
    return model
Define the training and validation functions.
## Aggregate results across multiple GPUs
import logging
import sys
import paddle.distributed as dist

def all_reduce_mean(x):
    """perform all_reduce on Tensor for gathering results from multi-gpus"""
    """Enable when using multiple GPUs; this path still has minor issues
    world_size = dist.get_world_size()
    if world_size > 1:
        x_reduce = paddle.to_tensor(x)
        dist.all_reduce(x_reduce)
        x_reduce = x_reduce / world_size
        return x_reduce.item()
    """
    return x

## Training function
def train(dataloader, model, optimizer, criterion, epoch, total_epochs,
          total_batches, debug_steps=100, accum_iter=1):
    time_st = time.time()
    train_loss_meter = AverageMeter()
    train_acc_meter = AverageMeter()
    master_loss_meter = AverageMeter()
    master_acc_meter = AverageMeter()
    model.train()
    optimizer.clear_grad()
    for batch_id, data in enumerate(dataloader):
        # get data
        images = data[0]
        label = data[1]
        batch_size = images.shape[0]
        # forward
        output = model(images)
        loss = criterion(output, label)
        loss_value = loss.item()
        if not math.isfinite(loss_value):
            print("Loss is {}, stopping training".format(loss_value))
            sys.exit(1)
        loss = loss / accum_iter
        # backward and step
        loss.backward()
        if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
            optimizer.step()
            optimizer.clear_grad()
        pred = paddle.nn.functional.softmax(output)
        acc = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
        # sync from other gpus for overall loss and acc
        """
        master_loss = all_reduce_mean(loss_value)
        master_acc = all_reduce_mean(acc)
        master_batch_size = all_reduce_mean(batch_size)
        master_loss_meter.update(master_loss, master_batch_size)
        master_acc_meter.update(master_acc, master_batch_size)
        """
        train_loss_meter.update(loss_value, batch_size)
        train_acc_meter.update(acc, batch_size)
        if batch_id % debug_steps == 0 or batch_id + 1 == len(dataloader):
            logger.info(f"Epoch[{epoch:03d}/{total_epochs:03d}], "
                        f"Step[{batch_id:04d}/{total_batches:04d}], "
                        f"Lr: {optimizer.get_lr():04f}")
            logger.info(f"Loss: {loss_value:.4f} ({train_loss_meter.avg:.4f}), "
                        f"Avg Acc: {train_acc_meter.avg:.4f}")
    #paddle.distributed.barrier()
    train_time = time.time() - time_st
    return (train_loss_meter.avg, train_acc_meter.avg,
            master_loss_meter.avg, master_acc_meter.avg, train_time)

## Validation function
def validate(dataloader, model, criterion, total_batches, debug_steps=100):
    model.eval()
    val_loss_meter = AverageMeter()
    val_acc1_meter = AverageMeter()
    val_acc5_meter = AverageMeter()
    master_loss_meter = AverageMeter()
    master_acc1_meter = AverageMeter()
    master_acc5_meter = AverageMeter()
    time_st = time.time()
    for batch_id, data in enumerate(dataloader):
        # get data
        images = data[0]
        label = data[1]
        batch_size = images.shape[0]
        output = model(images)
        loss = criterion(output, label)
        loss_value = loss.item()
        pred = paddle.nn.functional.softmax(output)
        acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
        acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5).item()
        # sync from other gpus for overall loss and acc
        """
        master_loss = all_reduce_mean(loss_value)
        master_acc1 = all_reduce_mean(acc1)
        master_acc5 = all_reduce_mean(acc5)
        master_batch_size = all_reduce_mean(batch_size)
        master_loss_meter.update(master_loss, master_batch_size)
        master_acc1_meter.update(master_acc1, master_batch_size)
        master_acc5_meter.update(master_acc5, master_batch_size)
        """
        val_loss_meter.update(loss_value, batch_size)
        val_acc1_meter.update(acc1, batch_size)
        val_acc5_meter.update(acc5, batch_size)
        if batch_id % debug_steps == 0:
            local_message = (f"Step[{batch_id:04d}/{total_batches:04d}], "
                             f"Avg Loss: {val_loss_meter.avg:.4f}, "
                             f"Avg Acc@1: {val_acc1_meter.avg:.4f}, "
                             f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
            """
            master_message = (f"Step[{batch_id:04d}/{total_batches:04d}], "
                              f"Avg Loss: {master_loss_meter.avg:.4f}, "
                              f"Avg Acc@1: {master_acc1_meter.avg:.4f}, "
                              f"Avg Acc@5: {master_acc5_meter.avg:.4f}")
            """
            logger.info(local_message)
    #paddle.distributed.barrier()
    val_time = time.time() - time_st
    return (val_loss_meter.avg, val_acc1_meter.avg, val_acc5_meter.avg,
            master_loss_meter.avg, master_acc1_meter.avg, master_acc5_meter.avg, val_time)
The main entry point: run the main function to train or validate. For lack of time the full run was not completed; this is provided for reference and learning.
# Main function
def main():
    ## Initial setup
    paddle.device.set_device('gpu')
    #paddle.distributed.init_parallel_env()
    #world_size = paddle.distributed.get_world_size()
    paddle.seed(0)
    init_log_config()
    ## Build the model
    model = build_vit()
    if train_parameters["mode"] == "train":
        dataloader_train = get_dataloader(
            RiceImageDataset(train_parameters["data_dir"], True, train_transforms_vit(True)),
            True, False)
        total_batch_train = len(dataloader_train)
        logger.info(f"----- Total # of train batch (single gpu): {total_batch_train}")
    dataloader_val = get_dataloader(
        RiceImageDataset(train_parameters["data_dir"], False, train_transforms_vit(False)),
        True, False)
    total_batch_val = len(dataloader_val)
    logger.info(f"----- Total # of val batch (single gpu): {total_batch_val}")
    ## Loss function and optimizer
    criterion = paddle.nn.CrossEntropyLoss()
    if train_parameters["mode"] == "train":  # the optimizer is only needed for training
        cosine_lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
            learning_rate=train_parameters["base_lr"],
            T_max=train_parameters["num_epochs"] - train_parameters["warmup_epochs"],
            eta_min=train_parameters["end_lr"],
            last_epoch=-1)  # start from the initial learning rate
        lr_scheduler = paddle.optimizer.lr.LinearWarmup(
            learning_rate=cosine_lr_scheduler,
            warmup_steps=train_parameters["warmup_epochs"],
            start_lr=train_parameters["warmup_start_lr"],
            end_lr=train_parameters["base_lr"],
            last_epoch=-1)
    else:
        lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
            learning_rate=train_parameters["base_lr"],
            T_max=train_parameters["num_epochs"],
            eta_min=train_parameters["end_lr"],
            last_epoch=-1)  # start from the initial learning rate
    optimizer = paddle.optimizer.AdamW(
        parameters=model.parameters(),
        learning_rate=lr_scheduler,  # set to scheduler
        beta1=train_parameters["betas"][0],
        beta2=train_parameters["betas"][1],
        weight_decay=train_parameters["weight_decay"],
        epsilon=train_parameters["eps"])
    if train_parameters["pretrained"]:
        model_state = paddle.load(train_parameters["pretrained_dir"])
        model.set_state_dict(model_state)
        logger.info(f'----- Pretrained: Load model state from {train_parameters["pretrained_dir"]}')
    ## Distributed training
    #model = paddle.DataParallel(model)
    ## Validation only, no training
    if train_parameters["mode"] != "train":
        logger.info("----- Start Validation")
        val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
            dataloader=dataloader_val,
            model=model,
            criterion=criterion,
            total_batches=total_batch_val,
            debug_steps=100)
        local_message = ("----- Validation: " +
                         f"Validation Loss: {val_loss:.4f}, " +
                         f"Validation Acc@1: {val_acc1:.4f}, " +
                         f"Validation Acc@5: {val_acc5:.4f}, " +
                         f"time: {val_time:.2f}")
        logger.info(local_message)
        return
    ## Train with periodic validation
    logger.info("----- Start Train")
    for epoch in range(train_parameters["num_epochs"]):
        # Train one epoch
        train_loss, train_acc, avg_loss, avg_acc, train_time = train(
            dataloader=dataloader_train,
            model=model,
            optimizer=optimizer,
            criterion=criterion,
            epoch=epoch,
            total_epochs=train_parameters["num_epochs"],
            total_batches=total_batch_train,
            debug_steps=100,
            accum_iter=train_parameters["accum_iter"])
        # update lr
        lr_scheduler.step()
        general_message = (f"----- Epoch[{epoch:03d}/{train_parameters['num_epochs']:03d}], "
                           f"Lr: {optimizer.get_lr():.4f}, "
                           f"time: {train_time:.2f}, "
                           f"Train Loss: {train_loss:.4f}, "
                           f"Train Acc: {train_acc:.4f}")
        logger.info(general_message)
        if epoch % train_parameters["debug_freq"] == 0 or epoch == train_parameters["num_epochs"]:
            logger.info(f'----- Validation after Epoch: {epoch}')
            val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
                dataloader=dataloader_val,
                model=model,
                criterion=criterion,
                total_batches=total_batch_val,
                debug_steps=100)
            local_message = (f'----- Epoch[{epoch:03d}/{(train_parameters["num_epochs"]):03d}], ' +
                             f"Validation Loss: {val_loss:.4f}, " +
                             f"Validation Acc@1: {val_acc1:.4f}, " +
                             f"Validation Acc@5: {val_acc5:.4f}, " +
                             f"time: {val_time:.2f}")
            master_message = (f"----- Epoch[{epoch:03d}/{train_parameters['num_epochs']:03d}], " +
                              f"Validation Loss: {avg_loss:.4f}, " +
                              f"Validation Acc@1: {avg_acc1:.4f}, " +
                              f"Validation Acc@5: {avg_acc5:.4f}, " +
                              f"time: {val_time:.2f}")
            logger.info(local_message)
            logger.info(master_message)
        if epoch % train_parameters["save_freq"] == 0 or epoch == train_parameters["num_epochs"]:
            model_path = os.path.join(
                train_parameters["save_path"],
                f"Epoch-{epoch}-Loss-{avg_loss}.pdparams")
            state_dict = dict()
            state_dict['model'] = model.state_dict()
            state_dict['optimizer'] = optimizer.state_dict()
            paddle.save(state_dict, model_path)
            logger.info(f"----- Save model: {model_path}")

main()
In the training above, evaluation was enabled during training, so a separate evaluation is not strictly necessary. To evaluate on its own, run the following.
%cd /home/aistudio/work/PaddleClas
!python3 tools/eval.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model
After evaluation, use the tools/infer.py script for prediction.
!python tools/infer.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model \
-o Infer.infer_imgs=/home/aistudio/data/Arborio14953.jpg
To deploy the model in a real setting, we first need to export an inference model, which the prediction engine then loads for inference.
PaddleClas exports models via tools/export_model.py. The export generates the following three files:
inference.pdmodel: stores the network structure;
inference.pdiparams: stores the network parameters;
inference.pdiparams.info: stores other model information and can generally be ignored.
%cd /home/aistudio/work/PaddleClas/
!python tools/export_model.py \
-c /home/aistudio/work/PaddleClas/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=/home/aistudio/work/PaddleClas/output/ResNet50_vd/best_model \
-o Global.save_inference_dir=deploy/models/ResNet50_vd
Once the model is exported, run inference with the scripts in the deploy/python directory.
%cd /home/aistudio/work/PaddleClas/
!python deploy/python/predict_cls.py -c deploy/configs/inference_cls.yaml
Rice classification with PaddleClas reaches very high accuracy.
Further validation and optimization could use data augmentation, pretraining, or other models.
A brief analysis and implementation of Transformer-based classification is given for reference.
Next up: learning how to deploy the model.
Thanks to PaddlePaddle mentor Li Wenbo for his help and guidance.