In recent years, channel attention mechanisms have shown great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods pursue better accuracy by building ever more sophisticated attention modules, which inevitably increases model complexity. To overcome this performance-complexity trade-off, the paper proposes an Efficient Channel Attention (ECA) module that involves only a handful of parameters yet brings a clear performance gain. By analyzing the channel attention module in SENet, the authors show empirically that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly reducing model complexity. They therefore propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented with a 1D convolution, and further develop a method to adaptively select the 1D convolution's kernel size, which determines the coverage of the local cross-channel interaction. The ECA module is both efficient and effective: for example, against a ResNet50 backbone, its parameters and computation are 80 vs. 24.37M and 4.7e-4 GFLOPs vs. 3.86 GFLOPs, while the Top-1 accuracy gain exceeds 2%. The authors extensively evaluate ECA on image classification, object detection, and instance segmentation with ResNets and MobileNetV2 as backbones; the experiments show the module is more efficient than its counterparts while performing favorably against them.
Although dimensionality reduction lowers model complexity, it destroys the direct correspondence between channels and their weights. For example, a single FC layer predicts each channel's weight from a linear combination of all channel descriptors; but first projecting the channel features into a low-dimensional space and then mapping them back makes the correspondence between a channel and its weight indirect.
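To make the contrast concrete, here is a minimal sketch (not from the original post; the channel count and reduction ratio are assumed example values) of the two mapping styles discussed above:

import paddle.nn as nn

C, r = 64, 16  # assumed example values: channel count and SE reduction ratio

# SE-style bottleneck: C -> C/r -> C. Each channel's weight is recovered only
# indirectly, through the compressed C/r-dimensional representation.
se_fc = nn.Sequential(nn.Linear(C, C // r), nn.ReLU(),
                      nn.Linear(C // r, C), nn.Sigmoid())

# Reduction-free mapping: each weight is predicted directly from all C channel
# descriptors, preserving the channel-to-weight correspondence, but at O(C^2)
# cost -- which is what motivates ECA's cheaper local 1D convolution below.
direct_fc = nn.Sequential(nn.Linear(C, C), nn.Sigmoid())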
The paper gives a detailed analysis of local cross-channel feature interaction; here we go straight to the resulting formulation, which uses a 1D convolution:
$$\omega = \sigma\big(\mathrm{C1D}_k(y)\big)$$

$$\omega_i = \sigma\Big(\sum_{j=1}^{k} w^j\, y_i^j\Big), \qquad y_i^j \in \Omega_i^k$$
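Read literally, the second equation says that each channel weight $\omega_i$ is computed from only the $k$ channel descriptors $\Omega_i^k$ neighbouring $y_i$, with a single set of $k$ weights shared across all channels. A small NumPy sketch (illustrative only; the zero padding at the borders and the kernel values are assumptions) makes the correspondence to a 1D convolution explicit:

import numpy as np

def eca_weights(y, w):
    # omega_i = sigmoid( sum_j w^j * y_i^j ), where y_i^j runs over the k
    # channel descriptors centred on channel i (zero-padded at the borders).
    k = len(w)
    pad = (k - 1) // 2
    y_pad = np.pad(y, pad)  # zero padding, matching Conv1D with padding=(k-1)//2
    s = np.array([np.dot(w, y_pad[i:i + k]) for i in range(len(y))])
    return 1.0 / (1.0 + np.exp(-s))  # sigmoid

y = np.random.rand(8)                               # e.g. 8 GAP channel descriptors
print(eca_weights(y, w=np.array([0.2, 0.5, 0.3])))  # k = 3, weights shared by all channels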
The paper further proposes a method for adaptively selecting the kernel size:
$$C = \phi(k) = 2^{(\gamma \ast k - b)}$$

$$k = \psi(C) = \left|\frac{\log_2(C)}{\gamma} + \frac{b}{\gamma}\right|_{odd}$$
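In the paper $\gamma = 2$ and $b = 1$, and $|t|_{odd}$ denotes the nearest odd number, so e.g. $C = 64$ gives $k = 3$ and $C = 256$ gives $k = 5$. Below is a small helper in the style of the authors' reference implementation (the truncate-then-round-up-to-odd convention is an assumption; the paper itself only specifies "nearest odd number"):

import math

def adaptive_kernel_size(C, gamma=2, b=1):
    # k = psi(C) = | log2(C)/gamma + b/gamma |_odd
    t = int(abs((math.log2(C) + b) / gamma))
    return t if t % 2 else t + 1  # truncate, then bump even values up to odd

for C in (64, 128, 256, 512):
    print(C, adaptive_kernel_size(C))  # -> 3, 5, 5, 5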
!pip install paddlex
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import paddle
import paddle.nn.functional as F
import paddle.vision.transforms as transforms
import paddlex
from paddle import nn
from paddle.io import DataLoader
from paddle.vision.datasets import Cifar10
train_tfm = transforms.Compose([
    transforms.Resize((130, 130)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomResizedCrop(128, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.RandomRotation(20),
    paddlex.transforms.MixupImage(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
paddle.vision.set_image_backend('cv2')
# Use the CIFAR-10 dataset
train_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='train', transform = train_tfm)
val_dataset = Cifar10(data_file='data/data152754/cifar-10-python.tar.gz', mode='test',transform = test_tfm)
print("train_dataset: %d" % len(train_dataset))
print("val_dataset: %d" % len(val_dataset))
train_dataset: 50000
val_dataset: 10000
batch_size=128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=False, num_workers=4)
class LabelSmoothingCrossEntropy(nn.Layer):
    def __init__(self, smoothing=0.1):
        super().__init__()
        self.smoothing = smoothing

    def forward(self, pred, target):
        confidence = 1. - self.smoothing
        log_probs = F.log_softmax(pred, axis=-1)
        idx = paddle.stack([paddle.arange(log_probs.shape[0]), target], axis=1)
        nll_loss = paddle.gather_nd(-log_probs, index=idx)
        smooth_loss = paddle.mean(-log_probs, axis=-1)
        loss = confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
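A quick sanity check with toy values (hypothetical, not from the original post): with smoothing = 0.1, even a very confident correct prediction keeps a small residual loss, because 0.1 of the target probability mass is spread uniformly over the other classes.

pred = paddle.to_tensor([[10.0, 0.0, 0.0]])  # confident logits for the correct class
target = paddle.to_tensor([0])
print(LabelSmoothingCrossEntropy(smoothing=0.1)(pred, target))  # small but non-zero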
class ECA(nn.Layer):
    """Efficient Channel Attention: GAP followed by a size-k 1D conv across channels."""
    def __init__(self, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)  # squeeze: per-channel global average
        self.conv = nn.Conv1D(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias_attr=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)  # [N, C, 1, 1]
        # treat the C channel descriptors as a length-C sequence for the 1D conv
        y = self.conv(y.squeeze(-1).transpose([0, 2, 1])).transpose([0, 2, 1]).unsqueeze(-1)
        y = self.sigmoid(y)   # channel weights in (0, 1)
        return x * y          # reweight the input feature map
model = ECA(3)
paddle.summary(model, (1, 64, 224, 224))
W0811 09:59:14.083604   337 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0811 09:59:14.088572   337 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
-------------------------------------------------------------------------------
    Layer (type)         Input Shape          Output Shape        Param #
===============================================================================
AdaptiveAvgPool2D-1  [[1, 64, 224, 224]]     [1, 64, 1, 1]           0
     Conv1D-1           [[1, 1, 64]]          [1, 1, 64]             3
    Sigmoid-2          [[1, 64, 1, 1]]       [1, 64, 1, 1]           0
===============================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
-------------------------------------------------------------------------------
Input size (MB): 12.25
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 12.25
-------------------------------------------------------------------------------
{'total_params': 3, 'trainable_params': 3}
class AlexNet_ECA(nn.Layer):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2D(3, 48, kernel_size=11, stride=4, padding=11 // 2),
            ECA(3),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
            nn.Conv2D(48, 128, kernel_size=5, padding=2),
            ECA(3),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
            nn.Conv2D(128, 192, kernel_size=3, stride=1, padding=1),
            ECA(5),
            nn.ReLU(),
            nn.Conv2D(192, 192, kernel_size=3, stride=1, padding=1),
            ECA(5),
            nn.ReLU(),
            nn.Conv2D(192, 128, kernel_size=3, stride=1, padding=1),
            ECA(3),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(3 * 3 * 128, 2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x = self.classifier(x)
        return x
model = AlexNet_ECA(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))
learning_rate = 0.001
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model'
model = AlexNet_ECA(num_classes=10)
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
best_acc = 0.0
val_acc = 0.0
loss_record = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}  # for recording loss
acc_record = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}  # for recording accuracy
loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0
    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        logits = model(x_data)
        loss = criterion(logits, y_data)
        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record['train']['loss'].append(loss.numpy())
            loss_record['train']['iter'].append(loss_iter)
            loss_iter += 1
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        train_loss += loss
        train_num += len(y_data)
    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record['train']['acc'].append(train_acc)
    acc_record['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc * 100))

    # ---------- Validation ----------
    model.eval()
    for batch_id, data in enumerate(val_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
            logits = model(x_data)
        loss = criterion(logits, y_data)
        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)
        val_loss += loss
        val_num += len(y_data)
    total_val_loss = (val_loss / val_num) * batch_size
    loss_record['val']['loss'].append(total_val_loss.numpy())
    loss_record['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record['val']['acc'].append(val_acc)
    acc_record['val']['iter'].append(acc_iter)
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc * 100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
def plot_learning_curve(record, title='loss', ylabel='CE Loss'):
    '''Plot the learning curve of the CNN.'''
    maxtrain = max(map(float, record['train'][title]))
    maxval = max(map(float, record['val'][title]))
    ymax = max(maxtrain, maxval) * 1.1
    mintrain = min(map(float, record['train'][title]))
    minval = min(map(float, record['val'][title]))
    ymin = min(mintrain, minval) * 0.9
    x_1 = list(map(int, record['train']['iter']))
    x_2 = list(map(int, record['val']['iter']))
    figure(figsize=(10, 6))
    plt.plot(x_1, record['train'][title], c='tab:red', label='train')
    plt.plot(x_2, record['val'][title], c='tab:cyan', label='val')
    plt.ylim(ymin, ymax)
    plt.xlabel('Training steps')
    plt.ylabel(ylabel)
    plt.title('Learning curve of {}'.format(title))
    plt.legend()
    plt.show()
plot_learning_curve(loss_record, title='loss', ylabel='CE Loss')
plot_learning_curve(acc_record, title='acc', ylabel='Accuracy')
import time
work_path = 'work/model'
model = AlexNet_ECA(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:2022
def get_cifar10_labels(labels):
    """Return the text labels for the CIFAR-10 dataset."""
    text_labels = [
        'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
        'horse', 'ship', 'truck']
    return [text_labels[int(i)] for i in labels]
def show_images(imgs, num_rows, num_cols, pred=None, gt=None, scale=1.5):
    """Plot a list of images."""
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = plt.subplots(num_rows, num_cols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if paddle.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if pred or gt:
            ax.set_title("pt: " + pred[i] + "\ngt: " + gt[i])
    return axes
work_path = 'work/model'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet_ECA(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
class AlexNet(nn.Layer):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2D(3, 48, kernel_size=11, stride=4, padding=11 // 2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
            nn.Conv2D(48, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
            nn.Conv2D(128, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2D(192, 192, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2D(192, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(3 * 3 * 128, 2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(2048, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = paddle.flatten(x, 1)
        x = self.classifier(x)
        return x
model = AlexNet(num_classes=10)
paddle.summary(model, (1, 3, 128, 128))
learning_rate = 0.001
n_epochs = 50
paddle.seed(42)
np.random.seed(42)
work_path = 'work/model1'
model = AlexNet(num_classes=10)
criterion = LabelSmoothingCrossEntropy()
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=learning_rate, T_max=50000 // batch_size * n_epochs, verbose=False)
optimizer = paddle.optimizer.Adam(parameters=model.parameters(), learning_rate=scheduler, weight_decay=1e-5)
best_acc = 0.0
val_acc = 0.0
loss_record1 = {'train': {'loss': [], 'iter': []}, 'val': {'loss': [], 'iter': []}}  # for recording loss
acc_record1 = {'train': {'acc': [], 'iter': []}, 'val': {'acc': [], 'iter': []}}  # for recording accuracy
loss_iter = 0
acc_iter = 0

for epoch in range(n_epochs):
    # ---------- Training ----------
    model.train()
    train_num = 0.0
    train_loss = 0.0
    val_num = 0.0
    val_loss = 0.0
    accuracy_manager = paddle.metric.Accuracy()
    val_accuracy_manager = paddle.metric.Accuracy()
    print("#===epoch: {}, lr={:.10f}===#".format(epoch, optimizer.get_lr()))
    for batch_id, data in enumerate(train_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        logits = model(x_data)
        loss = criterion(logits, y_data)
        acc = paddle.metric.accuracy(logits, labels)
        accuracy_manager.update(acc)
        if batch_id % 10 == 0:
            loss_record1['train']['loss'].append(loss.numpy())
            loss_record1['train']['iter'].append(loss_iter)
            loss_iter += 1
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.clear_grad()
        train_loss += loss
        train_num += len(y_data)
    total_train_loss = (train_loss / train_num) * batch_size
    train_acc = accuracy_manager.accumulate()
    acc_record1['train']['acc'].append(train_acc)
    acc_record1['train']['iter'].append(acc_iter)
    acc_iter += 1
    # Print the information.
    print("#===epoch: {}, train loss is: {}, train acc is: {:2.2f}%===#".format(epoch, total_train_loss.numpy(), train_acc * 100))

    # ---------- Validation ----------
    model.eval()
    for batch_id, data in enumerate(val_loader):
        x_data, y_data = data
        labels = paddle.unsqueeze(y_data, axis=1)
        with paddle.no_grad():
            logits = model(x_data)
        loss = criterion(logits, y_data)
        acc = paddle.metric.accuracy(logits, labels)
        val_accuracy_manager.update(acc)
        val_loss += loss
        val_num += len(y_data)
    total_val_loss = (val_loss / val_num) * batch_size
    loss_record1['val']['loss'].append(total_val_loss.numpy())
    loss_record1['val']['iter'].append(loss_iter)
    val_acc = val_accuracy_manager.accumulate()
    acc_record1['val']['acc'].append(val_acc)
    acc_record1['val']['iter'].append(acc_iter)
    print("#===epoch: {}, val loss is: {}, val acc is: {:2.2f}%===#".format(epoch, total_val_loss.numpy(), val_acc * 100))

    # ===================save====================
    if val_acc > best_acc:
        best_acc = val_acc
        paddle.save(model.state_dict(), os.path.join(work_path, 'best_model.pdparams'))
        paddle.save(optimizer.state_dict(), os.path.join(work_path, 'best_optimizer.pdopt'))

print(best_acc)
paddle.save(model.state_dict(), os.path.join(work_path, 'final_model.pdparams'))
paddle.save(optimizer.state_dict(), os.path.join(work_path, 'final_optimizer.pdopt'))
plot_learning_curve(loss_record1, title='loss', ylabel='CE Loss')
plot_learning_curve(acc_record1, title='acc', ylabel='Accuracy')
import time
work_path = 'work/model1'
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
aa = time.time()
for batch_id, data in enumerate(val_loader):
    x_data, y_data = data
    labels = paddle.unsqueeze(y_data, axis=1)
    with paddle.no_grad():
        logits = model(x_data)
bb = time.time()
print("Throughout:{}".format(int(len(val_dataset)//(bb - aa))))
Throughout:2114
work_path = 'work/model1'
X, y = next(iter(DataLoader(val_dataset, batch_size=18)))
model = AlexNet(num_classes=10)
model_state_dict = paddle.load(os.path.join(work_path, 'best_model.pdparams'))
model.set_state_dict(model_state_dict)
model.eval()
logits = model(X)
y_pred = paddle.argmax(logits, -1)
X = paddle.transpose(X, [0, 2, 3, 1])
axes = show_images(X.reshape((18, 128, 128, 3)), 1, 18, pred=get_cifar10_labels(y_pred), gt=get_cifar10_labels(y))
plt.show()
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
| Model | Train Acc | Val Acc | Parameters |
| --- | --- | --- | --- |
| AlexNet w/o ECA | 0.7714 | 0.79104 | 7524042 |
| AlexNet w/ ECA | 0.8533 | 0.84335 | 7524061 |
ECA is very simple to implement (a single 1D convolution with kernel size k), yet with only 19 extra parameters it both speeds up convergence considerably and improves validation accuracy by 0.05231.
This article is a repost.
Original project link