
SOTA Models for Everyone: Hands-On BEiT Training on AIStudio


Reposted from AI Studio. Original article: SOTA Models for Everyone: Hands-On BEiT Training on AIStudio - 飞桨AI Studio

I. Why This Project

As everyone knows, Transformer models deliver high accuracy but are costly to train: without several days on 8 or 16 GPUs, you won't reproduce the numbers in the paper. Worse, many SOTA releases never even consider shipping single-GPU code; they jump straight to multi-GPU parallel training, which sets a high bar for hands-on practice.

The best way to learn is by doing: code that actually runs is easier to read and sinks in deeper. This project is devoted to **hands-on BEiT training on the AIStudio platform, bringing SOTA models within everyone's reach.** SOTA models, no longer out of reach.

1. Acknowledgements

Thanks to PaddleViT, an algorithm development and experimentation platform providing SOTA Visual Transformer (ViT) models and related tools.

Thanks to PASSL, PaddlePaddle's self-supervised library: a vision library for state-of-the-art self-supervised learning research with PaddlePaddle, designed to accelerate the research cycle from designing a new self-supervised task to evaluating the learned representations.

Thanks to the AIStudio platform for providing V100 compute.

Thanks to the paper BEiT: BERT Pre-Training of Image Transformers (arXiv), the original authors' code, and the original README (included in this project).

Parts of this project draw on the article "BeiT: When BERT Meets Image Tasks — A New Paradigm Beyond ViT" (BeiT:当BERT用于图像任务——超越ViT新范式); many thanks to its author!

See also: a section-by-section breakdown of the Paddle BEiT code, and the original BEiT README document.

2. Introduction: PaddleViT — PaddlePaddle Vision Transformers

State-of-the-art visual Transformer and MLP models and toolkits built on PaddlePaddle.

Our GitHub home page: https://github.com/BR-IDL/PaddleViT

PaddlePaddle Vision Transformers (PaddleViT, or PPViT) is a collection of vision models and tools built on the latest deep learning techniques. We provide cutting-edge algorithms and models based on visual Transformers, visual attention mechanisms, and MLP techniques. PaddleViT also integrates related layers, utilities, optimizers, schedulers, data augmentations, and training/validation scripts based on PaddlePaddle 2.1+.

The starting point of the PaddleViT project is to provide complete training/validation pipelines that reproduce a wide range of state-of-the-art ViT and MLP models. We are passionate about delivering the state of the art to everyone in the simplest, most accessible form.

PaddleViT offers models and tools for several vision tasks, such as image classification, object detection, semantic segmentation, and GANs. Each model architecture is defined in a self-contained Python module, making it easy to modify for quick experimentation and research. We also provide downloadable pre-trained weights that you can fine-tune on your own datasets. PaddleViT further integrates popular tools and modules such as custom datasets, data preprocessing, performance metrics, and DDP.

II. A Brief Study of the BEiT Technique

1. Overview

Cutting straight to the chase: BEiT's accuracy baseline on ImageNet with 224×224 training images is Acc@1 85.2%, essentially the most accurate model out there right now!

| Model     | Model Size | Image Size | ImageNet Acc@1 |
|-----------|------------|------------|----------------|
| BEiT-B    | 86M        | 224²       | 82.8           |
| BEiT384-B | 86M        | 384²       | 84.6           |
| BEiT-L    | 307M       | 224²       | 85.2           |
| BEiT384-L | 307M       | 384²       | 86.3           |

2. How It Works

See the architecture diagram in the BEiT paper (figure not reproduced here).

BEiT is BERT for images. It is similar to ViT, except that during training random masking is applied to the image patches; through this masking, the model learns to correctly predict the visual tokens corresponding to the original image even when fed a corrupted input. BERT's key innovation, self-supervised learning via masked prediction, is carried over directly by BEiT.
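To make the masking idea concrete, here is a minimal sketch of randomly masking ~40% of the 196 patch embeddings of a 224×224 image and replacing them with a learnable mask token. This is an illustration only: the actual BEiT implementation uses blockwise (not uniform random) masking and predicts DALL-E visual tokens at the masked positions.

import paddle

num_patches = 196   # 14 x 14 grid for a 224x224 image with 16x16 patches
embed_dim = 768
mask_ratio = 0.4    # roughly the fraction of patches BEiT corrupts

patch_embeddings = paddle.randn([1, num_patches, embed_dim])  # stand-in for PatchEmbed output
mask_token = paddle.create_parameter([1, 1, embed_dim], dtype='float32')

# pick random patch positions to mask (BEiT itself masks contiguous blocks)
num_masked = int(num_patches * mask_ratio)
masked_idx = paddle.randperm(num_patches)[:num_masked]
bool_mask = paddle.scatter(paddle.zeros([num_patches]), masked_idx,
                           paddle.ones([num_masked]))
bool_mask = bool_mask.reshape([1, num_patches, 1])

# replace masked patch embeddings with the mask token; pre-training then
# asks the encoder to predict the visual tokens of exactly these positions
corrupted = patch_embeddings * (1 - bool_mask) + mask_token * bool_mask
print(corrupted.shape)  # [1, 196, 768]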

For the details of the architecture, let's dive into the hands-on code in Part III.

3. Training and Fine-Tuning

The original authors trained for 5 days on 16 GPUs with a 2k batch size for 800 epochs: "The pre-training runs for about 500k steps (i.e., 800 epochs) with 2k batch size. Adam (Kingma and Ba, 2015) with β1 = 0.9, β2 = 0.999 is employed for optimization. The learning rate is set to 1.5e-3, with a warmup of 10 epochs, and cosine learning rate decay. The weight decay is 0.05. We employ stochastic depth (Huang et al., 2016) with a 0.1 rate, and disable dropout. The 500k training steps take about five days using 16 Nvidia Tesla V100 32GB GPU cards."
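As a rough Paddle sketch of that recipe (for illustration only; the actual training script in this project uses PaddleNLP's AdamWDL with layer-wise learning-rate decay, see Part V), the peak LR, warmup, cosine decay, and weight decay above could be wired up like this:

import paddle

model = paddle.nn.Linear(8, 8)        # stand-in for the BEiT model

# hypothetical step counts: 800 epochs at ~625 steps/epoch ≈ 500k steps
steps_per_epoch = 625
warmup_steps = 10 * steps_per_epoch   # 10 warmup epochs
total_steps = 800 * steps_per_epoch

# linear warmup to the 1.5e-3 peak, then cosine decay
cosine = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=1.5e-3, T_max=total_steps - warmup_steps)
scheduler = paddle.optimizer.lr.LinearWarmup(
    learning_rate=cosine, warmup_steps=warmup_steps,
    start_lr=0.0, end_lr=1.5e-3)

optimizer = paddle.optimizer.AdamW(
    learning_rate=scheduler,
    beta1=0.9, beta2=0.999,
    weight_decay=0.05,
    parameters=model.parameters())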

Our goal is to get it running on AIStudio (on a single GPU), which means solving two problems:

1) Modify the code so it runs on a single machine with a single GPU.

Paddle's multi-GPU BEiT code should in principle run unmodified on a single card (a signature Paddle feature: multi-GPU programs can usually execute on one card without changes), but this particular BEiT code errors out on a single card, so I rewrote it as a single-machine, single-GPU program.

2) Choose a smaller dataset, which compresses the training time dramatically!

If everyone ran the full dataset, it would cost roughly 576 hours on a single V100. That is too much compute: even though AIStudio now offers a 4×V100 environment, it would still take about 6 days and strain the platform. For learning purposes we scale the data down accordingly.

Two datasets are used: the official Cifar100 dataset, which trains in about 24 hours on a single card, and a 10-class food dataset, which finishes 100 epochs in only about 2 hours.

III. Hands-On BEiT Training

1. Preparation

First, install the required packages.

We mainly need the yacs library; if you also want to generate the train/val file lists (txt), you need the jikuai package as well.

In [ ]

!pip install pip -Uq
!pip install yacs
!pip install jikuai

Next, prepare the dataset.

We have prepared the 10-class food dataset here; you can also test with your own dataset.

The usual Paddle convention for image classification is to split the dataset into two parts and create a training file list train_list.txt and a validation file list val_list.txt. This project already provides the split file lists.
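For reference, each line in these list files typically pairs an image path with an integer class label, separated by a space (this is the common PaddleClas-style convention; the paths below are placeholders, so check the lists shipped with this project for the exact layout):

images/class_a/0001.jpg 0
images/class_b/0002.jpg 1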

If you use your own dataset, you can split it with the jikuai package. Install it with pip install jikuai, then run the commands below from whichever directory you want the list files written to.

from jikuai.dataset import Dataset
dataset = Dataset("/home/aistudio/BEiT/aifood/images")  # dataset location: the parent directory of the per-class folders
dataset.paddleclasout(0.8)  # generate train and eval lists; the argument is the train/eval split ratio

The generated files are named train.txt and eval.txt by default; rename them to the train_list.txt and val_list.txt that the BEiT model expects.
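If you prefer to do the rename in code rather than by hand, a couple of os.rename calls from the directory containing the lists will do:

import os

# rename the jikuai outputs to the names the BEiT scripts expect
os.rename('train.txt', 'train_list.txt')
os.rename('eval.txt', 'val_list.txt')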

In [ ]

  1. print("开始解包数据集...")
  2. !cd ~/BEiT && tar -xzf /home/aistudio/data/data21994/aifood.tar.gz
  3. print("解包数据集完成")
  4. %cd ~/BEiT/aifood
  5. from jikuai.dataset import Dataset
  6. dataset = Dataset("/home/aistudio/BEiT/aifood/images") # 参数为数据集所在的位置,是分类目录的上一级目录
  7. dataset.paddleclastxt(0.8) # 生成训练集和测试集列表,参数为两者划分的比例值
  8. %cd ~/
  9. print("数据集列表生成完成")

Modifying the configuration file

Change the checkpoint-saving interval from 10 to 15 to save fewer checkpoints, keeping disk usage under 10 GB so a background run can still be imported back into the notebook:

_C.SAVE_FREQ = 15

The dataset has 10 classes, and the config was changed accordingly. One code change is also required, because one spot hard-codes the default 1000 classes and errors out otherwise. Both changes have already been made in this project, so you can use it as-is.

To use your own dataset and class count, just change _C.MODEL.NUM_CLASSES = 10 in config.py to your number of classes. The dataset location can be set on the command line, e.g. -data_path='/home/aistudio/BEiT/aifood/'; the directory only needs to contain train_list.txt and val_list.txt.

2. Training from Scratch

With the food dataset, 100 epochs take 2.15 hours in total.

As quoted above, the original authors trained on ImageNet for 800 epochs at 2k batch size on 16 cards, taking about five days.

Here we are only getting a feel for training: Paddle does not ship a pre-training program for BEiT, so we run the fine-tuning program, just without loading a pre-trained model.

In [ ]

  1. print("开始训练,预计时间2.2小时...")
  2. !cd ~/BEiT/ && sh run_train.sh

3. Fine-Tuning from a Pre-Trained Model

When fine-tuning from a pre-trained model, about 10 additional epochs are usually enough. On this food dataset, accuracy already reaches Avg Acc@1: 0.9531 after 5 epochs, and 0.9860 after 20 epochs! BEiT really is a formidable weapon for competitions!

2022-05-11 09:04:45,478 MASTER_LOG Step[0000/0016], Avg Loss: 0.3924, Avg Acc@1: 0.9531, Avg Acc@5: 1.0000

2022-05-11 09:54:03,719 MASTER_LOG ----- Epoch[020/020], Validation Loss: 0.2302, Validation Acc@1: 0.9860, Validation Acc@5: 1.0000, time: 9.06

In [ ]

!cd ~/BEiT/ && python main_gpu_finetune.py \
-cfg='./configs/finetunebeit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=64 \
-data_path='/home/aistudio/BEiT/aifood/' \
-pretrained="/home/aistudio/data/data144298/beit_base_patch16_224_ft22kto1k.pdparams" \
-amp

4. Validation

Loading the model we trained ourselves for 100 epochs and evaluating gives: Validation Acc@1: 0.5330, Validation Acc@5: 0.9350.

Evaluating the official pre-trained model gives: Validation Acc@1: 0.0690, Validation Acc@5: 0.3630.

Evaluating our own fine-tuned model gives: Validation Acc@1: 0.1130, Validation Acc@5: 0.5470.

Accuracy this low after fine-tuning is hard to believe; the cause is still under investigation.

The checkpoint files are too large to include here, so you will need to generate them by running the training yourself!

In [ ]

# validate the model we trained ourselves for 100 epochs
!cd ~/BEiT/ && python main_gpu_finetune.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=256 \
-data_path='/home/aistudio/BEiT/aifood/' \
-eval \
-pretrained='/home/aistudio/BEiT/output/train-20220511-00-46/Epoch-100-Loss-0.9632747001647949.pdparams' \
-amp

In [ ]

# validate the official pre-trained model
!cd ~/BEiT/ && python main_gpu_finetune.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=256 \
-data_path='/home/aistudio/BEiT/aifood/' \
-eval \
-pretrained='/home/aistudio/data/data144298/beit_base_patch16_224_ft22kto1k.pdparams' \
-amp

In [ ]

# validate the fine-tuned model
!cd ~/BEiT/ && python main_gpu_finetune.py \
-cfg='./configs/beit_base_patch16_224.yaml' \
-dataset='imagenet2012' \
-batch_size=256 \
-data_path='/home/aistudio/BEiT/aifood/' \
-eval \
-pretrained='/home/aistudio/BEiT/output/train-20220511-09-34/Epoch-15-Loss-0.2563522930145264.pdparams' \
-amp

IV. The BEiT Code, Section by Section

If you've read this far, you must truly love BEiT!

Whether in PaddleViT or in the PASSL self-supervised library, running BEiT training from a terminal leaves the model feeling like a black box: what is it really, and how are the paper's ideas implemented in Paddle code? We can neither see nor touch any of it.

To make the code easy to browse and study, the BEiT code is split into notebook cells, each with a small verification snippet. Observing the output shapes deepens our understanding of the code.

1. DropPath and MLP

In [ ]

import numpy as np
np.random.seed(42)

In [ ]

# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Droppath, reimplement from https://github.com/yueatsprograms/Stochastic_Depth
"""
import paddle
import paddle.nn as nn


class DropPath(nn.Layer):
    """DropPath class"""
    def __init__(self, drop_prob=None):
        super().__init__()
        self.drop_prob = drop_prob

    def drop_path(self, inputs):
        """drop path op
        Args:
            input: tensor with arbitrary shape
            drop_prob: float number of drop path probability, default: 0.0
            training: bool, if current mode is training, default: False
        Returns:
            output: output tensor after drop path
        """
        # if prob is 0 or eval mode, return original input
        if self.drop_prob == 0. or not self.training:
            return inputs
        keep_prob = 1 - self.drop_prob
        keep_prob = paddle.to_tensor(keep_prob, dtype='float32')
        shape = (inputs.shape[0], ) + (1, ) * (inputs.ndim - 1)  # shape=(N, 1, 1, 1)
        random_tensor = keep_prob + paddle.rand(shape, dtype=inputs.dtype)
        random_tensor = random_tensor.floor()  # binary mask
        output = inputs.divide(keep_prob) * random_tensor  # divide to keep same output expectation
        return output

    def forward(self, inputs):
        return self.drop_path(inputs)


def main():
    tmp = paddle.to_tensor(np.random.rand(8, 16, 8, 8), dtype='float32')
    dp = DropPath(0.5)
    out = dp(tmp)
    print(out.shape)


if __name__ == "__main__":
    main()

In [ ]

# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
BEiT in Paddle
A Paddle Implementation of BEiT as described in:
"BEiT: BERT Pre-Training of Image Transformers"
    - Paper Link: https://arxiv.org/abs/2106.08254
"""
import math
import copy
from functools import partial
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
# from droppath import DropPath

trunc_normal_ = nn.initializer.TruncatedNormal(std=0.02)
zeros_ = nn.initializer.Constant(value=0.0)
ones_ = nn.initializer.Constant(value=1.0)


class Mlp(nn.Layer):
    """MLP module
    MLP using nn.Linear and activation is GELU, dropout is applied.
    Ops: fc1 -> act -> dropout -> fc2 -> dropout
    """
    def __init__(self,
                 in_features,
                 hidden_features=None,
                 out_features=None,
                 act_layer=nn.GELU,
                 drop=0.0):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


def main():
    tmp = paddle.to_tensor(np.random.rand(8, 16), dtype='float32')
    mlp = Mlp(16, 32, 512)
    out = mlp(tmp)
    print(out.shape)


if __name__ == "__main__":
    main()

2. PatchEmbed

In [ ]

class PatchEmbed(nn.Layer):
    """2D Image to Patch Embedding
    Apply patch embeddings on input images. Embeddings is implemented using a Conv2D op.
    """
    def __init__(self,
                 img_size=224,
                 patch_size=16,
                 in_chans=3,
                 embed_dim=768,
                 norm_layer=None,
                 flatten=True):
        super().__init__()
        img_size = (img_size, img_size)
        patch_size = (patch_size, patch_size)
        self.img_size = img_size
        self.patch_size = patch_size
        self.grid_size = (img_size[0] // patch_size[0], img_size[1] // patch_size[1])
        self.num_patches = self.grid_size[0] * self.grid_size[1]
        self.flatten = flatten
        self.proj = nn.Conv2D(
            in_chans, embed_dim, kernel_size=patch_size, stride=patch_size
        )
        self.norm = norm_layer(embed_dim) if norm_layer else Identity()

    def forward(self, x):
        B, C, H, W = x.shape
        assert (
            H == self.img_size[0] and W == self.img_size[1]
        ), f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})"
        x = self.proj(x)
        if self.flatten:
            x = x.flatten(2).transpose((0, 2, 1))  # BCHW -> BNC
        x = self.norm(x)
        return x


class Identity(nn.Layer):
    """Identity layer
    The output of this layer is the input without any change.
    Use this layer to avoid if condition in some forward methods
    """
    def forward(self, inputs):
        return inputs


def main():
    import numpy as np
    tmp = paddle.to_tensor(np.random.rand(16, 3, 224, 224), dtype=paddle.float32)
    patchembed = PatchEmbed(flatten=True)
    out = patchembed(tmp)
    print(out.shape)


if __name__ == "__main__":
    main()

3. The Attention Module

In [ ]

class Attention(nn.Layer):
    """Attention Layer"""
    def __init__(self,
                 dim,
                 num_heads=8,
                 qkv_bias=False,
                 attn_drop=0.0,
                 proj_drop=0.0,
                 window_size=None,
                 attn_head_dim=None):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        if attn_head_dim is not None:
            head_dim = attn_head_dim
        all_head_dim = head_dim * self.num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, all_head_dim * 3, bias_attr=False)
        if qkv_bias:
            self.q_bias = paddle.create_parameter(
                shape=[all_head_dim], dtype="float32", default_initializer=zeros_
            )
            self.v_bias = paddle.create_parameter(
                shape=[all_head_dim], dtype="float32", default_initializer=zeros_
            )
        else:
            self.q_bias = None
            self.v_bias = None
        if window_size:
            self.window_size = window_size
            self.num_relative_distance = (2 * window_size[0] - 1) * (
                2 * window_size[1] - 1
            ) + 3
            self.relative_position_bias_table = paddle.create_parameter(
                shape=[self.num_relative_distance, num_heads],
                dtype="float32",
                default_initializer=zeros_,
            )  # 2*Wh-1 * 2*Ww-1, nH
            # cls to token & token to cls & cls to cls
            # get pair-wise relative position index for each token inside the window
            coords_h = paddle.arange(window_size[0])
            coords_w = paddle.arange(window_size[1])
            coords = paddle.stack(paddle.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
            coords_flatten = paddle.flatten(coords, 1)  # 2, Wh*Ww
            relative_coords = coords_flatten.unsqueeze(
                axis=2
            ) - coords_flatten.unsqueeze(
                axis=1
            )  # 2, Wh*Ww, Wh*Ww
            relative_coords = relative_coords.transpose([1, 2, 0])  # Wh*Ww, Wh*Ww, 2
            relative_coords[:, :, 0] += window_size[0] - 1  # shift to start from 0
            relative_coords[:, :, 1] += window_size[1] - 1
            relative_coords[:, :, 0] *= 2 * window_size[1] - 1
            relative_position_index = paddle.zeros(
                [
                    window_size[0] * window_size[1] + 1,
                    window_size[0] * window_size[1] + 1,
                ],
                dtype=relative_coords.dtype,
            )
            # Wh*Ww, Wh*Ww
            relative_position_index[1:, 1:] = relative_coords.sum(-1)
            relative_position_index[0, 0:] = self.num_relative_distance - 3
            relative_position_index[0:, 0] = self.num_relative_distance - 2
            relative_position_index[0, 0] = self.num_relative_distance - 1
            self.register_buffer("relative_position_index", relative_position_index)
        else:
            self.window_size = None
            self.relative_position_bias_table = None
            self.relative_position_index = None
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(all_head_dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x, rel_pos_bias):
        B, N, C = x.shape
        qkv_bias = None
        if self.q_bias is not None:
            # k gets no bias: concat q_bias, zeros, v_bias
            qkv_bias = paddle.concat(
                (self.q_bias, paddle.zeros_like(self.v_bias), self.v_bias)
            )
        qkv = F.linear(x=x, weight=self.qkv.weight, bias=qkv_bias)
        qkv = qkv.reshape([paddle.shape(x)[0], paddle.shape(x)[1], 3, self.num_heads, -1]).transpose([2, 0, 3, 1, 4])
        # make torchscript happy (cannot use tensor as tuple)
        q, k, v = qkv[0], qkv[1], qkv[2]
        q = q * self.scale
        attn = q @ k.transpose([0, 1, 3, 2])
        if self.relative_position_bias_table is not None:
            relative_position_bias = self.relative_position_bias_table[
                self.relative_position_index.reshape([-1])
            ].reshape(
                [
                    self.window_size[0] * self.window_size[1] + 1,
                    self.window_size[0] * self.window_size[1] + 1,
                    -1,
                ]
            )  # Wh*Ww,Wh*Ww,nH
            relative_position_bias = relative_position_bias.transpose(
                [2, 0, 1]
            )  # nH, Wh*Ww, Wh*Ww
            attn = attn + relative_position_bias.unsqueeze(axis=0)
        if rel_pos_bias is not None:
            attn = attn + rel_pos_bias
        attn = F.softmax(attn, axis=-1)
        attn = self.attn_drop(attn)
        x = (attn @ v).transpose([0, 2, 1, 3]).reshape([paddle.shape(x)[0], paddle.shape(x)[1], -1])
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


def main():
    import numpy as np
    tmp = paddle.to_tensor(np.random.rand(196, 16, 768), dtype=paddle.float32)
    attention = Attention(dim=768)
    out = attention(tmp, rel_pos_bias=0.1)
    print(out.shape)


if __name__ == "__main__":
    main()

4. The Block Class

In [ ]

class Block(nn.Layer):
    def __init__(self,
                 dim,
                 num_heads,
                 mlp_ratio=4.0,
                 qkv_bias=False,
                 drop=0.0,
                 attn_drop=0.0,
                 drop_path=0.0,
                 init_values=None,
                 act_layer=nn.GELU,
                 norm_layer=nn.LayerNorm,
                 window_size=None,
                 attn_head_dim=None):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.attn = Attention(
            dim,
            num_heads=num_heads,
            qkv_bias=qkv_bias,
            attn_drop=attn_drop,
            proj_drop=drop,
            window_size=window_size,
            attn_head_dim=attn_head_dim,
        )
        self.drop_path = DropPath(drop_path) if drop_path > 0.0 else Identity()
        self.norm2 = norm_layer(dim)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(
            in_features=dim,
            hidden_features=mlp_hidden_dim,
            act_layer=act_layer,
            drop=drop,
        )
        if init_values:
            self.gamma_1 = paddle.create_parameter(
                shape=[dim],
                dtype="float32",
                default_initializer=nn.initializer.Constant(value=init_values),
            )
            self.gamma_2 = paddle.create_parameter(
                shape=[dim],
                dtype="float32",
                default_initializer=nn.initializer.Constant(value=init_values),
            )
        else:
            self.gamma_1, self.gamma_2 = None, None

    def forward(self, x, rel_pos_bias):
        if self.gamma_1 is None:
            x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias))
            x = x + self.drop_path(self.mlp(self.norm2(x)))
        else:
            x = x + self.drop_path(
                self.gamma_1 * self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias)
            )
            x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
        return x


def main():
    import numpy as np
    tmp = paddle.to_tensor(np.random.rand(196, 16, 768), dtype=paddle.float32)
    block = Block(dim=768, num_heads=12)
    out = block(tmp, rel_pos_bias=0.1)
    print(out.shape)


if __name__ == "__main__":
    main()

5. RelativePositionBias

This class is never invoked in this project's configuration (it is only active when use_shared_rel_pos_bias is enabled in the Beit class below).

In [ ]

class RelativePositionBias(nn.Layer):
    def __init__(self, window_size, num_heads):
        super().__init__()
        self.window_size = window_size
        self.num_relative_distance = (2 * window_size[0] - 1) * (
            2 * window_size[1] - 1
        ) + 3
        self.relative_position_bias_table = paddle.create_parameter(
            shape=[self.num_relative_distance, num_heads],
            dtype="float32",
            default_initializer=zeros_,
        )  # 2*Wh-1 * 2*Ww-1, nH
        # cls to token & token to cls & cls to cls
        # get pair-wise relative position index for each token inside the window
        coords_h = paddle.arange(window_size[0])
        coords_w = paddle.arange(window_size[1])
        coords = paddle.stack(paddle.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
        coords_flatten = paddle.flatten(coords, 1)  # 2, Wh*Ww
        relative_coords = coords_flatten.unsqueeze(axis=2) - coords_flatten.unsqueeze(
            axis=1
        )  # 2, Wh*Ww, Wh*Ww
        relative_coords = relative_coords.transpose([1, 2, 0])  # Wh*Ww, Wh*Ww, 2
        relative_coords[:, :, 0] += window_size[0] - 1  # shift to start from 0
        relative_coords[:, :, 1] += window_size[1] - 1
        relative_coords[:, :, 0] *= 2 * window_size[1] - 1
        # int64 (same dtype as relative_coords) so it can be used as a gather index
        relative_position_index = paddle.zeros(
            [window_size[0] * window_size[1] + 1, window_size[0] * window_size[1] + 1],
            dtype=relative_coords.dtype,
        )
        relative_position_index[1:, 1:] = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww
        relative_position_index[0, 0:] = self.num_relative_distance - 3
        relative_position_index[0:, 0] = self.num_relative_distance - 2
        relative_position_index[0, 0] = self.num_relative_distance - 1
        self.register_buffer("relative_position_index", relative_position_index)
        # trunc_normal_(self.relative_position_bias_table, std=.02)

    def forward(self):
        relative_position_bias = self.relative_position_bias_table[
            self.relative_position_index.reshape([-1])
        ].reshape([
            self.window_size[0] * self.window_size[1] + 1,
            self.window_size[0] * self.window_size[1] + 1,
            -1,
        ])  # Wh*Ww,Wh*Ww,nH
        return relative_position_bias.transpose([2, 0, 1])  # nH, Wh*Ww, Wh*Ww
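Although unused here, a quick shape check shows what the shared bias looks like (a sketch; it assumes the 14×14 patch grid of a 224×224 input and 12 heads):

rpb = RelativePositionBias(window_size=(14, 14), num_heads=12)
bias = rpb()
print(bias.shape)  # [12, 197, 197]: one bias map per head over all patches plus the cls token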

6. The Beit Class

In [ ]

class Beit(nn.Layer):
    """Beit Layer"""
    def __init__(self,
                 img_size=224,
                 patch_size=16,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=768,
                 depth=12,
                 num_heads=12,
                 mlp_ratio=4.0,
                 qkv_bias=True,
                 drop_rate=0.0,
                 attn_drop_rate=0.0,
                 drop_path_rate=0.0,
                 norm_layer=partial(nn.LayerNorm, epsilon=1e-6),
                 init_values=None,
                 use_abs_pos_emb=True,
                 use_rel_pos_bias=False,
                 use_shared_rel_pos_bias=False,
                 use_mean_pooling=True,
                 init_scale=0.001):
        super().__init__()
        self.num_classes = num_classes
        # num_features for consistency with other models
        self.num_features = self.embed_dim = embed_dim
        self.patch_embed = PatchEmbed(
            img_size=img_size,
            patch_size=patch_size,
            in_chans=in_chans,
            embed_dim=embed_dim,
        )
        num_patches = self.patch_embed.num_patches
        self.cls_token = paddle.create_parameter(
            shape=[1, 1, embed_dim],
            dtype="float32",
            default_initializer=trunc_normal_,
        )
        if use_abs_pos_emb:
            self.pos_embed = paddle.create_parameter(
                shape=[1, num_patches + 1, embed_dim],
                dtype="float32",
                default_initializer=trunc_normal_,
            )
        else:
            self.pos_embed = None
        self.pos_drop = nn.Dropout(p=drop_rate)
        if use_shared_rel_pos_bias:
            self.rel_pos_bias = RelativePositionBias(
                window_size=self.patch_embed.grid_size, num_heads=num_heads
            )
        else:
            self.rel_pos_bias = None
        # stochastic depth decay rule
        dpr = [x.item() for x in paddle.linspace(0, drop_path_rate, depth)]
        self.use_rel_pos_bias = use_rel_pos_bias
        self.blocks = nn.LayerList(
            [
                Block(
                    dim=embed_dim,
                    num_heads=num_heads,
                    mlp_ratio=mlp_ratio,
                    qkv_bias=qkv_bias,
                    drop=drop_rate,
                    attn_drop=attn_drop_rate,
                    drop_path=dpr[i],
                    norm_layer=norm_layer,
                    init_values=init_values,
                    window_size=self.patch_embed.grid_size if use_rel_pos_bias else None,
                )
                for i in range(depth)
            ]
        )
        self.norm = Identity() if use_mean_pooling else norm_layer(embed_dim)
        self.fc_norm = norm_layer(embed_dim) if use_mean_pooling else None
        self.head = nn.Linear(embed_dim, num_classes) if num_classes > 0 else Identity()
        self.apply(self._init_weights)
        self.fix_init_weight()
        if isinstance(self.head, nn.Linear):
            trunc_normal_(self.head.weight)
            self.head.weight.set_value(
                self.head.weight.multiply(paddle.to_tensor(init_scale))
            )
            self.head.bias.set_value(
                self.head.bias.multiply(paddle.to_tensor(init_scale))
            )

    def fix_init_weight(self):
        def rescale(param, layer_id):
            param.set_value(param.divide(paddle.to_tensor(math.sqrt(2.0 * layer_id))))

        for layer_id, layer in enumerate(self.blocks):
            rescale(layer.attn.proj.weight, layer_id + 1)
            rescale(layer.mlp.fc2.weight, layer_id + 1)

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight)
            if isinstance(m, nn.Linear) and m.bias is not None:
                zeros_(m.bias)
        elif isinstance(m, nn.LayerNorm):
            zeros_(m.bias)
            ones_(m.weight)

    def get_num_layers(self):
        return len(self.blocks)

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes):
        self.num_classes = num_classes
        self.head = (
            nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else Identity()
        )

    def forward_features(self, x):
        x = self.patch_embed(x)
        batch_size, seq_len, _ = x.shape
        cls_tokens = self.cls_token.expand([paddle.shape(x)[0], 1, self.embed_dim])
        x = paddle.concat((cls_tokens, x), axis=1)
        if self.pos_embed is not None:
            x = x + self.pos_embed
        x = self.pos_drop(x)
        rel_pos_bias = self.rel_pos_bias() if self.rel_pos_bias is not None else None
        for blk in self.blocks:
            x = blk(x, rel_pos_bias=rel_pos_bias)
        x = self.norm(x)
        if self.fc_norm is not None:
            t = x[:, 1:, :]
            return self.fc_norm(t.mean(1))
        return x[:, 0]

    def forward(self, x):
        x = self.forward_features(x)
        x = self.head(x)
        return x


def build_beit(config):
    """build beit from config"""
    model = Beit(
        img_size=config.DATA.IMAGE_SIZE,
        num_classes=config.MODEL.NUM_CLASSES,
        patch_size=config.MODEL.PATCH_SIZE,
        embed_dim=config.MODEL.EMBED_DIM,
        depth=config.MODEL.DEPTH,
        num_heads=config.MODEL.NUM_HEADS,
        mlp_ratio=config.MODEL.MLP_RATIO,
        use_abs_pos_emb=config.MODEL.USE_ABS_POS_EMB,
        use_rel_pos_bias=config.MODEL.USE_REL_POS_BIAS,
        init_values=config.MODEL.INIT_VALUES,
        qkv_bias=config.MODEL.QKV_BIAS,
    )
    return model

7. Reading the Configuration File

In [ ]

!pip install yacs -q

In [ ]

# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Configuration
Configurations for (1) data processing, (2) model architecture, and (3) training settings, etc.
Config can be set by .yaml file or by argparser
"""
import os
from yacs.config import CfgNode as CN
import yaml

_C = CN()
_C.BASE = ['']

# data settings
_C.DATA = CN()
_C.DATA.BATCH_SIZE = 256  # train batch_size on single GPU
_C.DATA.BATCH_SIZE_EVAL = None  # (disabled in update_config) val batch_size on single GPU
_C.DATA.DATA_PATH = '/dataset/imagenet/'  # path to dataset
_C.DATA.DATASET = 'imagenet2012'  # dataset name, currently only support imagenet2012
_C.DATA.IMAGE_SIZE = 224  # input image size e.g., 224
_C.DATA.SECOND_IMAGE_SIZE = 112  # 2nd input image size e.g., 112
_C.DATA.IMAGE_CHANNELS = 3  # input image channels: e.g., 3
_C.DATA.CROP_PCT = 0.875  # input image scale ratio, scale is applied before centercrop in eval mode
_C.DATA.NUM_WORKERS = 1  # number of data loading threads
_C.DATA.IMAGENET_MEAN = [0.5, 0.5, 0.5]  # [0.485, 0.456, 0.406] # imagenet mean values
_C.DATA.IMAGENET_STD = [0.5, 0.5, 0.5]  # [0.229, 0.224, 0.225] # imagenet std values

# model general settings
_C.MODEL = CN()
_C.MODEL.TYPE = 'beit'
_C.MODEL.VAE_TYPE = 'dall-e'
_C.MODEL.NAME = 'beit'
_C.MODEL.RESUME = None  # full model path for resume training
_C.MODEL.PRETRAINED = None  # full model path for finetuning
_C.MODEL.NUM_CLASSES = 10  # num of classes for classifier # 1000
_C.MODEL.DROPOUT = 0.0
_C.MODEL.ATTENTION_DROPOUT = 0.0
_C.MODEL.DROPPATH = 0.1

# model transformer settings
_C.MODEL.PATCH_SIZE = 16
_C.MODEL.EMBED_DIM = 768
_C.MODEL.NUM_HEADS = 12
_C.MODEL.ATTN_HEAD_SIZE = None  # if None, use embed_dim // num_heads as head dim
_C.MODEL.DEPTH = 12
_C.MODEL.QK_SCALE = None
_C.MODEL.QKV_BIAS = True
_C.MODEL.MLP_RATIO = 4.0  # for cait class_token ratio also set to MLP_RATIO
_C.MODEL.USE_ABS_POS_EMB = False
_C.MODEL.USE_REL_POS_BIAS = True
_C.MODEL.INIT_VALUES = 1e-4

# training settings
_C.TRAIN = CN()
_C.TRAIN.LAST_EPOCH = 0
_C.TRAIN.NUM_EPOCHS = 100
_C.TRAIN.WARMUP_EPOCHS = 20
_C.TRAIN.WEIGHT_DECAY = 0.05
_C.TRAIN.LAYER_DECAY = 0.65
_C.TRAIN.BASE_LR = 4e-3
_C.TRAIN.WARMUP_START_LR = 0.0
_C.TRAIN.END_LR = 1e-6
_C.TRAIN.GRAD_CLIP = None
_C.TRAIN.ACCUM_ITER = 1
_C.TRAIN.LINEAR_SCALED_LR = 512

# optimizer
_C.TRAIN.OPTIMIZER = CN()
_C.TRAIN.OPTIMIZER.NAME = 'AdamWDL'
_C.TRAIN.OPTIMIZER.EPS = 1e-8
_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.999)

# model ema
_C.TRAIN.MODEL_EMA = True
_C.TRAIN.MODEL_EMA_DECAY = 0.9999
_C.TRAIN.MODEL_EMA_FORCE_CPU = False

# data augmentation (optional, check datasets.py)
_C.TRAIN.SMOOTHING = 0.1
_C.TRAIN.COLOR_JITTER = 0.4  # if both auto augment and rand augment are False, use color jitter
_C.TRAIN.AUTO_AUGMENT = False  # rand augment is used if both rand and auto augment are set True
_C.TRAIN.RAND_AUGMENT = True
_C.TRAIN.RAND_AUGMENT_LAYERS = 2
_C.TRAIN.RAND_AUGMENT_MAGNITUDE = 9  # scale from 0 to 9

# mixup params (optional, check datasets.py)
_C.TRAIN.MIXUP_ALPHA = 0.8
_C.TRAIN.MIXUP_PROB = 1.0
_C.TRAIN.MIXUP_SWITCH_PROB = 0.5
_C.TRAIN.MIXUP_MODE = 'batch'
_C.TRAIN.CUTMIX_ALPHA = 1.0
_C.TRAIN.CUTMIX_MINMAX = None

# random erase params (optional, check datasets.py)
_C.TRAIN.RANDOM_ERASE_PROB = 0.25
_C.TRAIN.RANDOM_ERASE_MODE = 'pixel'
_C.TRAIN.RANDOM_ERASE_COUNT = 1
_C.TRAIN.RANDOM_ERASE_SPLIT = False

# misc
_C.SAVE = "./output"  # output folder, saves logs and weights
_C.SAVE_FREQ = 15  # freq to save chpt
_C.REPORT_FREQ = 20  # freq to logging info
_C.VALIDATE_FREQ = 1  # freq to do validation
_C.SEED = 0  # random seed
_C.EVAL = False  # run evaluation only
_C.AMP = False  # auto mix precision training


def _update_config_from_file(config, cfg_file):
    """Load cfg file (.yaml) and update config object
    Args:
        config: config object
        cfg_file: config file (.yaml)
    Return:
        None
    """
    config.defrost()
    with open(cfg_file, 'r') as infile:
        yaml_cfg = yaml.load(infile, Loader=yaml.FullLoader)
    for cfg in yaml_cfg.setdefault('BASE', ['']):
        if cfg:
            _update_config_from_file(
                config, os.path.join(os.path.dirname(cfg_file), cfg)
            )
    config.merge_from_file(cfg_file)
    config.freeze()


def update_config(config, args):
    """Update config by ArgumentParser
    Configs that are often used can be updated from arguments
    Args:
        args: ArgumentParser contains options
    Return:
        config: updated config
    """
    if args.cfg:
        _update_config_from_file(config, args.cfg)
    config.defrost()
    if args.dataset:
        config.DATA.DATASET = args.dataset
    if args.batch_size:
        config.DATA.BATCH_SIZE = args.batch_size
        config.DATA.BATCH_SIZE_EVAL = args.batch_size
    if args.batch_size_eval:
        config.DATA.BATCH_SIZE_EVAL = args.batch_size_eval
    if args.image_size:
        config.DATA.IMAGE_SIZE = args.image_size
    if args.accum_iter:
        config.TRAIN.ACCUM_ITER = args.accum_iter
    if args.data_path:
        config.DATA.DATA_PATH = args.data_path
    if args.output:
        config.SAVE = args.output
    if args.eval:
        config.EVAL = True
    if args.pretrained:
        config.MODEL.PRETRAINED = args.pretrained
    if args.resume:
        config.MODEL.RESUME = args.resume
    if args.last_epoch:
        config.TRAIN.LAST_EPOCH = args.last_epoch
    if args.amp:  # only for training
        config.AMP = not config.EVAL
    # config.freeze()
    return config


def get_config(cfg_file=None):
    """Return a clone of config and optionally overwrite it from yaml file"""
    config = _C.clone()
    if cfg_file:
        _update_config_from_file(config, cfg_file)
    return config

8. Building the Model

The model is built from the args; the argparse code is adapted so it runs inside a notebook.

The change is small: pass at least one argument explicitly when calling parse_args, e.g. arguments = parser.parse_args(['-cfg', "beit_base_patch16_224.yaml"]).

In [ ]

import argparse


def get_arguments():
    """return arguments; these will overwrite the config by (1) yaml file (2) argument values"""
    parser = argparse.ArgumentParser('BEiT finetune')
    parser.add_argument('-cfg', type=str, default=None)
    parser.add_argument('-dataset', type=str, default=None)
    parser.add_argument('-data_path', type=str, default=None)
    parser.add_argument('-output', type=str, default=None)
    parser.add_argument('-batch_size', type=int, default=None)
    parser.add_argument('-batch_size_eval', type=int, default=None)
    parser.add_argument('-image_size', type=int, default=None)
    parser.add_argument('-accum_iter', type=int, default=None)
    parser.add_argument('-pretrained', type=str, default=None)
    parser.add_argument('-resume', type=str, default=None)
    parser.add_argument('-last_epoch', type=int, default=None)
    parser.add_argument('-eval', action='store_true')
    parser.add_argument('-amp', action='store_true')
    # pass the arguments explicitly so parse_args works inside a notebook
    arguments = parser.parse_args(['-cfg', "BEiT/beit_base_patch16_224.yaml"])
    return arguments


config = update_config(get_config(), get_arguments())
build_model = build_beit
model = build_model(config)

9. Final Step: Test the Model's Forward Pass

Feed a random tensor into the model; the output shape is [8, 1000], where 8 is the batch_size and 1000 is the number of class logits.

With that, our walkthrough of the code is complete!

In [ ]

images = paddle.randn([8, 3, 224, 224])
label = 2
output = model(images)
print(output.shape)

And that wraps up our study of the BEiT code!

Thanks for sticking with it!

10. A Few Handy Paddle Functions

1) paddle.linspace

This op returns a tensor of num evenly spaced values over the interval [start, stop]; the length of the output tensor is num.

In [ ]

drop_path_rate = 0.5
depth = 8
tmp = paddle.linspace(0, drop_path_rate, depth)
print(tmp)

2) The linear function

The linear function is defined as paddle.matmul(x, weight) + bias.

The code below shows that the two forms produce identical results.

In [ ]

import paddle

# x: a 3x2 tensor, every element 2.0
x = paddle.ones([3, 2]) * 2
# weight: a 2x4 tensor, every element 0.5 * 4 = 2.0
weight = paddle.full(shape=[2, 4], fill_value="0.5", dtype="float32", name="weight")
weight = weight * 4
# bias: a length-4 tensor, every element 1.88
bias = paddle.ones(shape=[4], dtype="float32", name="bias")
bias = bias + 0.88

y = paddle.nn.functional.linear(x, weight, bias)
# every element of y is 2*2 + 2*2 + 1.88 = 9.88
print(x.shape, y.shape)
print(y == paddle.matmul(x, weight) + bias)

3) Building coordinate tensors

Use meshgrid and stack to generate coordinate grids of the window size from two index ranges, stack them, then use flatten to collapse the last two dimensions.

In [ ]

window_size = [3, 4]
coords_h = paddle.arange(window_size[0])
coords_w = paddle.arange(window_size[1])
coords = paddle.stack(paddle.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww
print(coords)
coords_flatten = paddle.flatten(coords, 1)  # 2, Wh*Ww
print(coords_flatten)

Unsqueeze the flattened coordinates at axis=2 and axis=1 respectively, then subtract; broadcasting produces a 3-D tensor of pairwise relative coordinates.

In [ ]

relative_coords = coords_flatten.unsqueeze(axis=2) - coords_flatten.unsqueeze(axis=1)
relative_coords

4) Paddle weight initialization

In [ ]

import paddle
import paddle.nn as nn

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))

def init_weights(layer):
    if type(layer) == nn.Linear:
        print('before init weight:', layer.weight.numpy())
        new_weight = paddle.full(shape=layer.weight.shape, dtype=layer.weight.dtype, fill_value=0.9)
        layer.weight.set_value(new_weight)
        print('after init weight:', layer.weight.numpy())

net.apply(init_weights)
print(net.state_dict())

5) paddle.expand

Expands the tensor x to the shape specified by shape; after expansion, x's shape matches shape.

In [ ]

import paddle

data = paddle.to_tensor([1, 2, 3], dtype='int32')
out = paddle.expand(data, shape=[2, 3])
print(out)
# [[1, 2, 3], [1, 2, 3]]

V. Debugging and Troubleshooting

Error: module 'paddlenlp.ops.optimizer' has no attribute 'AdamWDL'

My first reaction was to upgrade PaddleNLP to the latest version. The new version does have AdamWDL, but then the error below appears.

Error: cannot import name 'load_dataset' from 'datasets'

[2022-05-05 22:35:44,247] [ WARNING] - Detected that datasets module was imported before paddlenlp. This may cause PaddleNLP datasets to be unavalible in intranetPlease import paddlenlp before datasets module to avoid download issues
...
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlenlp/datasets/dataset.py", line 48, in <module>
    from datasets import load_dataset as origin_load_dataset
ImportError: cannot import name 'load_dataset' from 'datasets' (/home/aistudio/BEiT/datasets.py)

I couldn't crack it, so I stopped importing paddlenlp entirely: I copied the functions I needed into a tmpadam directory, import them with import tmpadam, and during training use optimizer = tmpadam.AdamWDL.
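One way to make that workaround explicit is a guarded import (a sketch; it assumes the local tmpadam module exposes an AdamWDL class with the same interface as PaddleNLP's):

try:
    # newer PaddleNLP versions provide AdamWDL here, but importing paddlenlp
    # fails in this project because of the datasets.py name clash
    from paddlenlp.ops.optimizer import AdamWDL
except (ImportError, AttributeError):
    # fall back to the local copy extracted from PaddleNLP
    from tmpadam import AdamWDL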

Background-task logs stop updating after the tar unpack step

It wasn't clear whether this was just a display issue or an actual hang. Replacing tar with unzip also stalled. So I gave up on background-task mode and ran everything directly in the notebook; since the whole run only takes about 2 hours, skipping background mode isn't a big loss.

Unable to set 10 classes

Training failed with a shape-mismatch error even though the config checked out fine. The culprit turned out to be the Mixup function's default num_classes=1000; passing num_classes=config.TRAIN.NUM_CLASSES fixed it, as shown below.

if (config.TRAIN.MIXUP_PROB > 0 or config.TRAIN.CUTMIX_ALPHA > 0 or
        config.TRAIN.CUTMIX_MINMAX is not None):
    mixup_fn = Mixup(mixup_alpha=config.TRAIN.MIXUP_ALPHA,
                     cutmix_alpha=config.TRAIN.CUTMIX_ALPHA,
                     cutmix_minmax=config.TRAIN.CUTMIX_MINMAX,
                     prob=config.TRAIN.MIXUP_PROB,
                     switch_prob=config.TRAIN.MIXUP_SWITCH_PROB,
                     mode=config.TRAIN.MIXUP_MODE,
                     label_smoothing=config.TRAIN.SMOOTHING,
                     num_classes=config.TRAIN.NUM_CLASSES)

Reference

@article{beit,
    title={{BEiT}: {BERT} Pre-Training of Image Transformers},
    author={Hangbo Bao and Li Dong and Furu Wei},
    year={2021},
    eprint={2106.08254},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Closing Words

Use PaddlePaddle and usher in a new era! Let's paddle out and ride the wind and waves across the ocean of AI!

PaddlePaddle official site: https://www.paddlepaddle.org.cn

My skills are limited, so there are bound to be shortcomings; your help is much appreciated.

Author: Duan Chunhua, known online as skywalk or 天马行空, AI architect at Jining Jikuai Software Technology Co., Ltd. and a Baidu PaddlePaddle PPDE.

I hold the highest rank on AI Studio with 11 badges lit; come follow me: https://aistudio.baidu.com/aistudio/personalcenter/thirdview/141218
