Google
ICML 2021
An upgrade built on EfficientNet v1 (【EfficientNet】《EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks》), aiming at faster training speed and better parameter efficiency.
The authors observe several training bottlenecks in EfficientNet (listed below) and propose EfficientNetV2, obtained with training-aware neural architecture search and scaling. Combined with progressive learning (jointly ramping up image size and regularization), it further improves both speed and accuracy, training faster and scoring better on public datasets.
EfficientNet
(1)Training with very large image sizes is slow
(2)Depthwise convolutions are slow in early layers but effective in later stages
Although depthwise convolutions have fewer parameters and less computation, they cannot fully utilize modern accelerators.
Fused-MBConv runs efficiently and achieves high accuracy, but its parameter count and FLOPs grow.
How should MBConv and Fused-MBConv be combined in the network for the best efficiency? The authors leverage neural architecture search to automatically search for the best combination (a sketch of the two block types follows this list).
(3)Equally scaling up every stage is sub-optimal
Scaling every stage identically when increasing network depth or width is not optimal.
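To make the MBConv vs Fused-MBConv difference from observation (2) concrete, here is a minimal PyTorch sketch of the two blocks. Simplifying assumptions on my part: stride 1, equal input/output channels, and the SE module omitted; `expand_ratio` and the channel counts are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """MBConv block: 1x1 expansion -> 3x3 depthwise conv -> 1x1 projection."""
    def __init__(self, channels, expand_ratio=4):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),   # 1x1 expand
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),  # 3x3 depthwise
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, channels, 1, bias=False),   # 1x1 project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection (stride 1, same channels)

class FusedMBConv(nn.Module):
    """Fused-MBConv block: the expansion + depthwise pair is fused into one regular 3x3 conv."""
    def __init__(self, channels, expand_ratio=4):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1, bias=False),  # single fused 3x3 conv
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, channels, 1, bias=False),             # 1x1 project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)

x = torch.randn(1, 24, 56, 56)
print(MBConv(24)(x).shape, FusedMBConv(24)(x).shape)  # both: torch.Size([1, 24, 56, 56])
```

The fused version trades more parameters and FLOPs (a dense 3x3 conv over all expanded channels) for better accelerator utilization, which is why it pays off in early stages where feature maps are large.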
EfficientNetV2-S -> EfficientNetV2-M -> EfficientNetV2-L
The NAS is done on top of v1; the searched architecture is shown below.
When scaling EfficientNetV2-S up to M and L, the authors gradually add more layers to later stages (e.g., stage 5 and 6).
Training Speed Comparison:
Note that EffNet (reprod) was trained with an about 30% smaller image size.
Quoting the paper: "in the early training epochs, we train the network with small image size and weak regularization (e.g., dropout and data augmentation), then we gradually increase image size and add stronger regularization."
The novelty is that not only the image size is progressive, but the regularization is as well.
The larger the input size, the stronger the data augmentation should correspondingly be, which gives better results.
The progressive learning strategy is as follows.
Not only does the image size grow as training proceeds, the regularization strength also increases; the algorithm flow is as follows:
$S_i$: image size at stage $i$, starting from $S_0$ and ending at $S_e$.
$\phi_i^k$: regularization magnitude of technique $k$ (see 【Randaugment】《Randaugment: Practical automated data augmentation with a reduced search space》 and 【AutoAugment for OD】《Learning Data Augmentation Strategies for Object Detection》). The regularization techniques used are Dropout, RandAugment, and Mixup, ranging from the minimum magnitude $\phi_0^k$ to the maximum magnitude $\phi_e^k$.
$M$: the number of stages the training process is divided into; the authors divide training into four stages with about 87 epochs per stage (note: not to be confused with the backbone's stages).
$N$: total training steps, which can be understood as epochs or as per-mini-batch iterations.
Progressive learning adopts the simplest form, linear growth; more detailed parameter configuration ranges are given in the table below.
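Concretely, linear growth means that at training stage $i$ (with $0 \le i \le M-1$) the values are $S_i = S_0 + (S_e - S_0) \cdot \frac{i}{M-1}$ and $\phi_i^k = \phi_0^k + (\phi_e^k - \phi_0^k) \cdot \frac{i}{M-1}$. A minimal Python sketch of this schedule follows; the function name and the concrete range values in the example are illustrative, not the paper's exact configuration.

```python
def progressive_schedule(stage, num_stages, s0, se, phi0, phie):
    """Linearly interpolate image size and regularization magnitude for one training stage.

    stage:       current stage index i, in 0 .. num_stages - 1
    num_stages:  M, the number of training stages
    s0, se:      initial and final image size (S_0, S_e)
    phi0, phie:  initial and final regularization magnitude (phi_0^k, phi_e^k)
    """
    t = stage / max(num_stages - 1, 1)     # linear progress in [0, 1]
    size = int(s0 + (se - s0) * t)         # S_i
    magnitude = phi0 + (phie - phi0) * t   # phi_i^k
    return size, magnitude

# example: M = 4 stages, image size 128 -> 300, dropout rate 0.1 -> 0.3 (illustrative values)
for i in range(4):
    print(progressive_schedule(i, 4, 128, 300, 0.1, 0.3))
# image size and magnitude grow linearly from (128, 0.1) to (300, 0.3)
```

At each stage the sampled size is fed to the data pipeline and the magnitude to each regularizer (Dropout rate, RandAugment magnitude, Mixup alpha), so both curricula advance in lockstep.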
Results
Fast and accurate.
From this figure, though, the speed advantage does not look particularly pronounced, while the accuracy advantage is quite clear.
The authors' experimental findings after using ImageNet21k:
On CIFAR-10 the gains are modest, while on CIFAR-100 and Cars the lead is more noticeable.
Training speed (reduced from 139 h to 54 h) and accuracy (improved from 84.7% to 85.0%) both improve on the original paper.
Starting from EfficientNetV2-S, the authors also scale down to obtain some smaller models and evaluate their performance.
The main selling point is speed.
The speedup is significant.
This is one of the authors' key innovations.