
MobileNet v1/v2 Explained, Building MobileNetV2 with PyTorch, and Training via Transfer Learning (Very Detailed, with Training Code)


Contents

Preface

Learning Resources

I. MobileNetV1

II. MobileNetV2

Inverted residual structure

What is the ReLU6 activation function?

Linear Bottlenecks

III. MobileNetV3

SE module

Updated activation functions

Redesigned time-consuming layers

Building the MobileNetV2 network with PyTorch

3.1 model.py

3.2 train.py

3.3 predict.py

3.4 class_indices.json

Building the MobileNetV3 network with PyTorch

4.1 model_v3

4.2 class_indices.json


Preface

While working through my school's summer assignment recently, I found an excellent computer vision + PyTorch hands-on tutorial among the recommended Bilibili videos. I wish I had found it sooner; it can save beginners a lot of detours. So I decided to follow the route the uploader lays out (image classification → object detection → …), learn step by step how to apply deep learning to CV with PyTorch, and organize and summarize my notes along the way.

The uploader provides both PyTorch and TensorFlow implementations; for now I am only keeping notes on the PyTorch version.

References:

Bilibili channel: 霹雳吧啦Wz的个人空间-霹雳吧啦Wz个人主页-哔哩哔哩视频
Code and slides on GitHub: GitHub - WZMIAOMIAO/deep-learning-for-image-processing: deep learning for image processing including classification and object-detection etc.
CSDN blog: 深度学习在图像处理中的应用(tensorflow2.4以及pytorch1.10实现)_深度学习图像处理_太阳花的小绿豆的博客-CSDN博客

Dataset: 补充LeNet,resnet,mobilenet的出处_后来后来啊的博客-CSDN博客


Learning Resources

7.1 MobileNet网络详解_哔哩哔哩_bilibili

7.1.2 MobileNetv3网络详解_哔哩哔哩_bilibili

7.2 使用pytorch搭建MobileNetV2并基于迁移学习训练_哔哩哔哩_bilibili


The MobileNet family is widely deployed as the mainstream choice of lightweight on-device models, and many algorithms use a MobileNet as the backbone for feature extraction. This chapter gives a summary of the MobileNet series.

I. MobileNetV1

Highlights:

  • Depthwise separable convolution (greatly reduces computation and parameter count)
  • Two extra hyperparameters: the width multiplier α and the resolution multiplier β

Drawback:

  • The depthwise kernels tend to "die" during training, i.e., most of their weights end up as zeros

MobileNetV1 introduced the depthwise separable convolution, which factorizes a standard convolution into a depthwise convolution followed by a 1x1 pointwise convolution, greatly reducing both computation and parameter count. Compare an ordinary convolution with a depthwise separable convolution:

Ordinary convolution:

Depthwise separable convolution:

DW convolution: applies a single filter to each input channel (input depth).

PW convolution: an ordinary convolution with a 1×1 kernel; it builds the output depth as linear combinations of the depthwise outputs across channels.

Comparing the computational cost of the two:

An ordinary convolution layer performs feature extraction and cross-channel feature combination in a single step, whereas a depthwise separable convolution first applies a 3×3 kernel of depth 1 per channel (the depthwise convolution), then a 1×1 kernel (the pointwise convolution) to adjust the channel count, so feature extraction and feature combination happen in separate steps.
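In the notation of the MobileNetV1 paper (kernel size D_K, input channels M, output channels N, output feature map size D_F × D_F), the per-layer cost of the two options is:

    standard convolution:       D_K · D_K · M · N · D_F · D_F
    depthwise separable conv:   D_K · D_K · M · D_F · D_F + M · N · D_F · D_F

    reduction ratio = 1/N + 1/(D_K · D_K)

With the usual 3×3 kernels (D_K = 3), this works out to roughly 8 to 9 times less computation than a standard convolution.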

The MobileNetV1 architecture is shown below. Apart from the first layer, which is a standard convolution, every convolution layer is a depthwise separable convolution (Conv dw + Conv/s1). The convolutions are followed by a 7×7 average-pooling layer and a fully connected layer; finally, Softmax normalizes the fully connected outputs into probabilities between 0 and 1, and the class with the highest probability is the prediction.
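To make the building block concrete, here is a minimal PyTorch sketch of one depthwise separable unit as described above (my own illustration of the idea, not the reference implementation; MobileNetV1 applies BN + ReLU after both the DW and the PW convolution):

import torch
from torch import nn

class DepthwiseSeparable(nn.Module):
    # one MobileNetV1 unit: 3x3 depthwise conv + 1x1 pointwise conv,
    # each followed by BatchNorm and ReLU
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),  # groups=in_ch -> depthwise
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
        )
        self.pw = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),  # 1x1 pointwise: recombine channels
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.pw(self.dw(x))

x = torch.randn(1, 32, 112, 112)
print(DepthwiseSeparable(32, 64)(x).shape)  # torch.Size([1, 64, 112, 112])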


II. MobileNetV2

The main improvements of MobileNetV2 over V1 are:

  • Inverted residual structure (Inverted Residuals)
  • Linear Bottlenecks

Inverted residual structure:

The key to understanding the inverted residual block is how the channel count (the dimensionality) changes. In an ordinary residual block, a 1x1 convolution first reduces the dimension, a 3x3 convolution then extracts features, and a final 1×1 convolution restores the dimension: an hourglass shape that is wide at both ends and narrow in the middle. The inverted residual block swaps the order: a 1x1 convolution first expands the dimension, a 3x3 DW (depthwise) convolution extracts features, and a final 1×1 convolution reduces the dimension again. Besides swapping reduction and expansion, the 3×3 standard convolution is replaced by a DW convolution, giving a spindle shape that is narrow at both ends and wide in the middle. The two are compared in the figure below:

Whereas the traditional residual block uses the ReLU activation, this network uses ReLU6.

So what is the ReLU6 activation function?

It exists mainly so that the network still has good numerical resolution when running at low precision (float16) on mobile devices. If the ReLU output is left unbounded, its range is 0 to +∞; low-precision float16 cannot represent such values accurately, which causes precision loss.
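ReLU6 is simply ReLU with its output capped at 6:

    ReLU6(x) = min(max(x, 0), 6)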

  • ReLU and ReLU6 compared in a plot:


  • Residual block
    (1) The pipeline is "compress - convolve - expand", an hourglass shape;
    (2) The convolutions are: 1×1 conv to reduce dimension - 3×3 standard conv to extract features - 1×1 conv to restore dimension;
    (3) ReLU activation throughout;
  • Inverted residual block
    (1) The pipeline is "expand - convolve - compress", a spindle shape;
    (2) The convolutions are: 1×1 conv to expand dimension - 3×3 DW conv to extract features - 1×1 conv to reduce dimension;
    (3) ReLU6 activation, plus a linear activation on the last layer.

 Linear Bottlenecks

A linear bottleneck is a bottleneck block whose final convolution uses a linear activation (the ReLU is replaced by the identity function). The paper's explanation is shown in the figure below.

 

Note: the shortcut connection exists only when stride = 1 AND the input feature map has the same shape as the output feature map.

Now let's look at the MobileNetV2 model structure:

In the table above, t is the expansion factor, c is the depth (channel count) of the output feature map, n is how many times the bottleneck (inverted residual block) is repeated, and s is the stride (applied only to the first repetition; the rest use stride 1). A block consists of a series of bottlenecks. For example, the row t=6, c=24, n=2, s=2 stacks two bottlenecks that expand their input channels 6×, the first with stride 2 and the second with stride 1, each producing 24 output channels.

For a clearer walkthrough, the video is recommended: 7.1 MobileNet网络详解_哔哩哔哩_bilibili


III. MobileNetV3

The main highlights of MobileNetV3 are:

  • An updated block (bneck): an SE module is added and the activation functions are updated
  • NAS (Neural Architecture Search) is used to search the network parameters
  • The time-consuming layers are redesigned: the number of kernels in the first conv layer is reduced (32 -> 16) and the last stage is updated

The MobileNetV3 bneck is shown below:

(NL denotes a non-linear activation function, without specifying which one)

SE module:

An SE block is added inside the bottleneck, placed after the depthwise filter, as shown below. Because the SE block costs some extra time, the authors set the channel count inside it to 1/4 of the expansion layer's channels; they found this improves accuracy without a measurable increase in latency. In essence it introduces a channel-level attention mechanism, with the following details:
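A minimal sketch of this channel-attention block (the same structure as the SqueezeExcitation class in model_v3 below, which additionally rounds the squeezed width with _make_divisible):

import torch
from torch import nn
from torch.nn import functional as F

class SE(nn.Module):
    def __init__(self, channels, squeeze_factor=4):
        super().__init__()
        squeeze = channels // squeeze_factor        # squeeze to 1/4 of the expansion channels
        self.fc1 = nn.Conv2d(channels, squeeze, 1)
        self.fc2 = nn.Conv2d(squeeze, channels, 1)

    def forward(self, x):
        s = F.adaptive_avg_pool2d(x, 1)             # global average pool: one value per channel
        s = F.relu(self.fc1(s))                     # squeeze
        s = F.hardsigmoid(self.fc2(s))              # excite: per-channel weight in [0, 1]
        return x * s                                # rescale each channel of the input

x = torch.randn(1, 64, 28, 28)
print(SE(64)(x).shape)  # torch.Size([1, 64, 28, 28])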

Updated activation functions:

h-swish replaces swish. Swish is Google's own earlier result (so there is a bit of self-promotion here), and this work optimizes it for speed. The swish and h-swish formulas are given below. Because sigmoid is expensive to compute, especially on mobile devices, the authors approximate it with ReLU6(x+3)/6; as the figure shows, the two curves are very close. The ReLU-based form has two advantages: (1) it can be computed on any software/hardware platform; (2) it removes the potential precision loss under quantization. Replacing swish with h-swish improves efficiency by roughly 15% in quantized mode, and the benefit of h-swish is more pronounced in the deeper layers of the network.
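The formulas (σ denotes the sigmoid function):

    swish(x)   = x · σ(x)
    h-swish(x) = x · ReLU6(x + 3) / 6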

Redesigned time-consuming layers:

(1) Reduce the number of kernels in the first convolution layer (32 -> 16). With fewer kernels the accuracy stays the same while the computation drops, so inference is faster.

(2) Streamline the Last Stage.

This cuts latency by 7 ms (11% of the runtime) and removes 30 million MAdds, with almost no loss in accuracy.

MobileNetV3-Large architecture:

NBN means no BatchNorm layer; a check mark in the SE column means the attention mechanism is used; exp size is the output depth of the first 1×1 convolution in the inverted residual block; out is the block's final output depth.

The MobileNetV3-Small architecture:

Finally, the experimental results from the original paper:


Building the MobileNetV2 network with PyTorch

References:

vision/torchvision/models/mobilenetv2.py at main · pytorch/vision (github.com)

MobileNetV2 解读 - 高峰OUC - 博客园 (cnblogs.com)


3.1 model.py

  • Define the inverted residual block, InvertedResidual
  • Define the MobileNetV2 network
from torch import nn
import torch


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor  # minimum channel count; if None it defaults to the divisor, i.e. 8
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)  # round to the nearest multiple of 8
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNReLU(nn.Sequential):
    # groups=1 -> ordinary convolution; groups=in_channel -> DW (depthwise) convolution
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2  # keep the spatial size unchanged (for stride 1)
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):  # the inverted residual block
    def __init__(self, in_channel, out_channel, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        hidden_channel = in_channel * expand_ratio  # expansion factor t
        # shortcut only when stride == 1 and the input/output feature maps have the same shape
        self.use_shortcut = stride == 1 and in_channel == out_channel

        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise conv
            layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1))
        layers.extend([
            # 3x3 depthwise conv
            ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel),
            # 1x1 pointwise conv (linear activation: y = x)
            nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channel),
        ])

        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_shortcut:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32 * alpha, round_nearest)  # round to a multiple of 8
        last_channel = _make_divisible(1280 * alpha, round_nearest)

        inverted_residual_setting = [
            # t, c, n, s
            # t: expansion factor, c: output channel depth, n: number of bottleneck repeats,
            # s: stride of the first repeat (the rest use stride 1)
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]  # the parameters from the table above

        features = []
        # conv1 layer
        features.append(ConvBNReLU(3, input_channel, stride=2))
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * alpha, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, last_channel, 1))
        # combine feature layers
        self.features = nn.Sequential(*features)  # the feature extractor is now defined

        # building classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel, num_classes)
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)     # feature extraction
        x = self.avgpool(x)      # average-pooling downsampling
        x = torch.flatten(x, 1)  # flatten
        x = self.classifier(x)   # classifier head
        return x
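A quick sanity check of the model (a usage sketch, assuming the file above is saved as model_v2.py, which is what the training script below imports):

import torch
from model_v2 import MobileNetV2

net = MobileNetV2(num_classes=5)
x = torch.randn(1, 3, 224, 224)  # dummy batch: [N, C, H, W]
print(net(x).shape)              # torch.Size([1, 5])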

3.2 train.py

Because MobileNetV2 is fairly deep, training it from scratch would be very time-consuming, so we use transfer learning and load pretrained parameters. In PyCharm, type import torchvision.models.mobilenetv2 and Ctrl+click mobilenetv2 to jump into PyTorch's official MobileNetV2 source, where you can find the download URLs for the pretrained weights:

model_urls = {
    "mobilenet_v3_large": "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth",
    "mobilenet_v3_small": "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth",
    "mobilenet_v2": "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth",
}

Then load the pretrained parameters when instantiating the network. The complete code follows. (Readers who went through the VGG post will know to delete "study" from the data-root path; to save you the trouble I have already deleted it here. See the VGG post for the reason.)

import os
import sys
import json

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets
from tqdm import tqdm

from model_v2 import MobileNetV2
import torchvision.models.mobilenetv2  # only here so you can Ctrl+click into the official source (see above)


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    batch_size = 16
    epochs = 5

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num, val_num))

    # create model
    net = MobileNetV2(num_classes=5)

    # load pretrain weights
    # download url: https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
    model_weight_path = "./mobilenet_v2.pth"
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    pre_weights = torch.load(model_weight_path, map_location='cpu')

    # delete classifier weights: keep only tensors whose element counts match our model
    pre_dict = {k: v for k, v in pre_weights.items() if net.state_dict()[k].numel() == v.numel()}
    missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)

    # freeze features weights
    for param in net.features.parameters():
        param.requires_grad = False

    net.to(device)

    # define loss function
    loss_function = nn.CrossEntropyLoss()

    # construct an optimizer (only the unfrozen classifier parameters)
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)

    best_acc = 0.0
    save_path = './MobileNetV2.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()
                val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1, epochs)

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()

3.3 predict.py

The prediction script is much the same as in the previous chapters, so I won't go through it in detail.

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model_v2 import MobileNetV2


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = MobileNetV2(num_classes=5).to(device)
    # load model weights
    model_weight_path = "./MobileNetV2.pth"
    model.load_state_dict(torch.load(model_weight_path, map_location=device))
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()

3.4 class_indices.json

This file is written automatically by train.py from train_dataset.class_to_idx, and predict.py reads it back to map predicted indices to class names:

{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}


Building the MobileNetV3 network with PyTorch

This works much the same way as above; I'll just copy the weight URLs down here again, and everything else can be found in the previous section.

model_urls = {
    "mobilenet_v3_large": "https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth",
    "mobilenet_v3_small": "https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth",
    "mobilenet_v2": "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth",
}

4.1 model_v3

from typing import Callable, List, Optional
from functools import partial

import torch
from torch import nn, Tensor
from torch.nn import functional as F


def _make_divisible(ch, divisor=8, min_ch=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                               norm_layer(out_planes),
                                               activation_layer(inplace=True))


class SqueezeExcitation(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitation, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x


class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,
                 kernel: int,
                 expanded_c: int,
                 out_c: int,
                 use_se: bool,
                 activation: str,
                 stride: int,
                 width_multi: float):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"  # whether using h-swish activation
        self.stride = stride

    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)


class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Callable[..., nn.Module]):
        super(InvertedResidual, self).__init__()

        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        # expand
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))

        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))

        # project
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
        self.is_strided = cnf.stride > 1

    def forward(self, x: Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x
        return result


class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # building last several layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a small MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)

train.py and predict.py are the same as above; the only thing to watch is that the v2 references in both scripts become v3. For example, in train.py, import the model from model_v3 instead of model_v2, point the pretrained-weight path at the MobileNetV3 weights, and change the save path to './MobileNetV3.pth', as sketched below.
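A sketch of the changed lines in train.py (the local weight-file name is whatever you saved the download as; mobilenet_v3_large.pth here is just an example):

from model_v3 import mobilenet_v3_large   # instead of: from model_v2 import MobileNetV2

# create model
net = mobilenet_v3_large(num_classes=5)

# load pretrain weights
# download url: https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth
model_weight_path = "./mobilenet_v3_large.pth"

# save path for the fine-tuned weights
save_path = './MobileNetV3.pth'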

4.2 class_indices.json

{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}

That's all of it. If anything is still unclear, take another look at the official documentation and the videos recommended above. If you've read this far, you must be a diligent and motivated reader; a like and a bookmark would be much appreciated!


Reference: MobilenetV1、V2、V3系列详解_mobilenetv1和v2区别_Turned_MZ的博客-CSDN博客
