赞
踩
使用VGG19网络模型实现图像分类:
VGGNet是牛津大学计算机视觉组和Google DeepMind公司一起研发的深度卷积神经网络,并取得了2014年Imagenet比赛定位项目第一名和分类项目第二名。该网络主要是泛化性能很好,容易迁移到其他的图像识别项目上,可以下载VGGNet训练好的参数进行很好的初始化权重操作,很多卷积神经网络都是以该网络为基础,比如FCN,UNet,SegNet等。vgg版本很多,常用的是VGG16,VGG19网络。
常见的卷积神经网络结构如图所示。卷积层后面会使用 ReLU 等激活函数,N 个卷
积层后通常会使用一个最大池化层(也有使用平均池化的);卷积和池化组合出现 M 次之后,提取出来的卷积特征会经过 K 个全连接层映射到若干个输出特征上,最后再经过一个全连接层或 Softmax 层来决定最终的输出。在第2.1节实验中,已经介绍了全连接层、ReLU激活函数、Softmax 层,本节介绍本实验中新增的基本单元:卷积层和最大池化层。
与全连接层类似,卷积层中的参数包括权重(即卷积核参数)和偏置。VGG19 中使用
的都是多输入输出特征图的卷积运算。假设输入特征图 X 的维度为 N ×Cin × Hin × Win,其中 N 是输入的样本个数(在本实验中 N = 1),Cin 是输入的通道数,Hin 和 Win 是输入特征图的高和宽。卷积核张量 W 用四维矩阵表示,维度为 Cin × K × K × Cout,其中 K × K 为卷积核的高度 × 宽度,Cout 为输出特征图的通道数。卷积层的偏置 b 用一维向量表示,维度为 Cout。同时定义输入特征图的边界扩充大小 p、卷积步长 s。输出特征图 Y 由输入 X 与卷积核 W 内积并加偏置 b 计算得到,Y 的维度为 N × Cout × Hout × Wout,其中 Hout 和Wout 是输出特征图的高和宽。
为了有效地减少计算量,CNN使用的另一个有效的工具被称为“池化(Pooling)”。池化就是将输入图像进行缩小,减少像素信息,只保留重要信息。
池化(pool)即下采样(downsamples),目的是为了减少特征图。
假设最大池化层的输入特征图 X 的维度为 N × C × Hin × Win,其中 N 是输入的样本个
数(在本实验中 N = 1),C 是输入的通道数,Hin 和 Win 是输入特征图的高和宽。池化窗口的高和宽均为 K,池化步长为 s,输出特征图 Y 的维度为 N ×C × Hout × Wout,其中 Hout和 Wout 是输出特征图的高和宽。
前向传播计算时,输出特征图 Y 中某一位置的值是输入特征图 X 的对应池化窗口内的
最大值,计算公式为
池化窗口 和 池化步长
池化窗口一般为2x2,步幅为2,相对于卷积层进行卷积运算,池化层进行的运算一般有以下几种:
举例
VGG19是经典的深度卷积神经网络结构,包含 5 个阶段共 16 个卷积层和 3 个全连接层,如表所示。前 2 个阶段各有 2 个卷积层,后 3 个阶段各有 4 个卷积层。每个卷积层均使用 3 × 3 大小的卷积核,边界扩充大小为 1,步长为 1,即保持输入输出特征图的高
和宽不变。每个阶段的卷积层的通道数在不断变化。在每个阶段的第一个卷积层,输入通
道数为上一个卷积层的输出通道数(第一个阶段的输入通道数为原始图像通道数)。5 个阶段的卷积层输出通道数分别为 64、128、256、512、512。每个阶段除第一个卷积层外,其他卷积层均保持输入和输出通道数相同。每个卷积层后面都跟随有 ReLU 层作为激活函数,每个阶段最后都跟随有一个最大池化层,将特征图的高和宽缩小为原来的 1/2。3 个全连接层中前 2 个全连接层后面也跟随有 ReLU 层。值得注意的是,第五阶段输出的特征图会进行变形,将四维特征图变形为二维矩阵作为全连接层的输入。网络最后是 Softmax 层计算分类概率。VGG19 的超参数配置详见表3.1,注意表中省略了卷积层和全连接层后的 ReLU层。
本实验采用 ImageNet 图像数据集,该数据集以.jpg 或.png 压缩文件格式存放每张 RGB 图像,且不同图像的尺寸可能不同。为了统一神经网络输入的大小,读入图像数据后,首先需要将图像缩放到 224×224大小,并存储在矩阵中。其次,需要对输入图像做标准化,将输入值范围从 [0,255] 标准化为均值为 0 的区间,从而提高神经网络的训练速度和稳定性。具体做法是图像的每个像素值减去 ImageNet 数据集的像素均值,该图像均值在加载 VGG19 模型参数的同时读入。本实验中使用 VGG19 模型中自带的图像均值进行输入图像标准化,是为了确保与官方使用VGG19 网络时的预处理方式保持一致。最后,将标准化后的图像矩阵转换为神经网络输入的统一维度,即 N × C × H × W,其中 N 是输入的样本数(由于图像是逐张读入的,因此N = 1),C 是输入的通道数(本实验输入图像是 RGB 彩色图像,因此 C = 3),H 和 W 分别表示输入的高和宽(缩放后的图像的高和宽均为 224)。
def load_image(self, image_dir):
print('Loading and preprocessing image from ' + image_dir)
self.input_image = scipy.misc.imread(image_dir)
self.input_image = scipy.misc.imresize(self.input_image,[224,224,3])
self.input_image = np.array(self.input_image).astype(np.float32)
self.input_image -= self.image_mean
self.input_image = np.reshape(self.input_image, [1]+list(self.input_image.shape))
# input dim [N, channel, height, width]
self.input_image = np.transpose(self.input_image, [0, 3, 1, 2])
class FlattenLayer(object):
def __init__(self, input_shape, output_shape): # 扁平化层的初始化
self.input_shape = input_shape
self.output_shape = output_shape
assert np.prod(self.input_shape) == np.prod(self.output_shape)
print('\tFlatten layer with input shape %s, output shape %s.' % (str(self.input_shape), str(self.output_shape)))
def forward(self, input): # 前向传播的计算
assert list(input.shape[1:]) == list(self.input_shape)
# matconvnet feature map dim: [N, height, width, channel]
# ours feature map dim: [N, channel, height, width]
self.input = np.transpose(input, [0, 2, 3, 1])
self.output = self.input.reshape([self.input.shape[0]] + list(self.output_shape))
show_matrix(self.output, 'flatten out ')
return self.output
这里不再过多赘述,直接上代码
# 卷积层
class ConvolutionalLayer(object):
def __init__(self, kernel_size, channel_in, channel_out, padding, stride):
# 卷积层的初始化
self.kernel_size = kernel_size
self.channel_in = channel_in
self.channel_out = channel_out
self.padding = padding
self.stride = stride
print('\tConvolutional layer with kernel size %d, input channel %d, output channel %d.' % (self.kernel_size, self.channel_in, self.channel_out))
def init_param(self, std=0.01): # 参数初始化
self.weight = np.random.normal(loc=0.0, scale=std, size=(self.channel_in, self.kernel_size, self.kernel_size, self.channel_out))
self.bias = np.zeros([self.channel_out])
def forward(self, input): # 前向传播的计算
start_time = time.time()
self.input = input # [N, C, H, W]
height = self.input.shape[2] + self.padding * 2
width = self.input.shape[3] + self.padding * 2
self.input_pad = np.zeros([self.input.shape[0], self.input.shape[1], height, width])
self.input_pad[:, :, self.padding:self.padding+self.input.shape[2], self.padding:self.padding+self.input.shape[3]] = self.input
height_out = (height - self.kernel_size) / self.stride + 1
width_out = (width - self.kernel_size) / self.stride + 1
self.output = np.zeros([self.input.shape[0], self.channel_out, height_out, width_out])
for idxn in range(self.input.shape[0]):
for idxc in range(self.channel_out):
for idxh in range(height_out):
for idxw in range(width_out):
# TODO: 计算卷积层的前向传播,特征图与卷积核的内积再加偏置
self.output[idxn, idxc, idxh, idxw] = _______________________
return self.output
def load_param(self, weight, bias): # 参数加载
assert self.weight.shape == weight.shape
assert self.bias.shape == bias.shape
self.weight = weight
self.bias = bias
# 最大池化层
class MaxPoolingLayer(object):
def __init__(self, kernel_size, stride): # 最大池化层的初始化
self.kernel_size = kernel_size
self.stride = stride
print('\tMax pooling layer with kernel size %d, stride %d.' % (self.kernel_size, self.stride))
def forward_raw(self, input): # 前向传播的计算
start_time = time.time()
self.input = input # [N, C, H, W]
self.max_index = np.zeros(self.input.shape)
height_out = (self.input.shape[2] - self.kernel_size) / self.stride + 1
width_out = (self.input.shape[3] - self.kernel_size) / self.stride + 1
self.output = np.zeros([self.input.shape[0], self.input.shape[1], height_out, width_out])
for idxn in range(self.input.shape[0]):
for idxc in range(self.input.shape[1]):
for idxh in range(height_out):
for idxw in range(width_out):
# TODO: 计算最大池化层的前向传播, 取池化窗口内的最大值
self.output[idxn, idxc, idxh, idxw] = _______________________
return self.output
class VGG19(object):
def __init__(self, param_path='../../imagenet-vgg-verydeep-19.mat'):
self.param_path = param_path
self.param_layer_name = (
'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3', 'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3', 'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3', 'relu5_3', 'conv5_4', 'relu5_4', 'pool5',
'flatten', 'fc6', 'relu6', 'fc7', 'relu7', 'fc8', 'softmax'
)
def build_model(self):
# TODO:定义VGG19 的网络结构
print('Building vgg-19 model...')
self.layers = {}
self.layers['conv1_1'] = ConvolutionalLayer(3, 3, 64, 1, 1)
self.layers['relu1_1'] = ReLULayer()
self.layers['conv1_2'] = ConvolutionalLayer(3, 64, 64, 1, 1)
self.layers['relu1_2'] = ReLULayer()
self.layers['pool1'] = MaxPoolingLayer(2, 2)
_______________________
self.layers['conv5_4'] = ConvolutionalLayer(3, 512, 512, 1, 1)
self.layers['relu5_4'] = ReLULayer()
self.layers['pool5'] = MaxPoolingLayer(2, 2)
self.layers['flatten'] = FlattenLayer([512, 7, 7], [512*7*7])
self.layers['fc6'] = FullyConnectedLayer(512*7*7, 4096)
self.layers['relu6'] = ReLULayer()
_______________________
self.layers['fc8'] = FullyConnectedLayer(4096, 1000)
self.layers['softmax'] = SoftmaxLossLayer()
self.update_layer_list = []
for layer_name in self.layers.keys():
if 'conv' in layer_name or 'fc' in layer_name:
self.update_layer_list.append(layer_name)
def init_model(self):
print('Initializing parameters of each layer in vgg-19...')
for layer_name in self.update_layer_list:
self.layers[layer_name].init_param()
def load_model(self):
print('Loading parameters from file ' + self.param_path)
params = scipy.io.loadmat(self.param_path)
self.image_mean = params['normalization'][0][0][0]
self.image_mean = np.mean(self.image_mean, axis=(0, 1))
print('Get image mean: ' + str(self.image_mean))
for idx in range(43):
if 'conv' in self.param_layer_name[idx]:
weight, bias = params['layers'][0][idx][0][0][0][0]
# matconvnet: weights dim [height, width, in_channel, out_channel]
# ours: weights dim [in_channel, height, width, out_channel]
weight = np.transpose(weight,[2,0,1,3])
bias = bias.reshape(-1)
self.layers[self.param_layer_name[idx]].load_param(weight, bias)
if idx >= 37 and 'fc' in self.param_layer_name[idx]:
weight, bias = params['layers'][0][idx-1][0][0][0][0]
weight = weight.reshape([weight.shape[0]*weight.shape[1]*weight.shape[2], weight.shape[3]])
self.layers[self.param_layer_name[idx]].load_param(weight, bias)
def load_image(self, image_dir):
print('Loading and preprocessing image from ' + image_dir)
self.input_image = scipy.misc.imread(image_dir)
self.input_image = scipy.misc.imresize(self.input_image,[224,224,3])
self.input_image = np.array(self.input_image).astype(np.float32)
self.input_image -= self.image_mean
self.input_image = np.reshape(self.input_image, [1]+list(self.input_image.shape))
# input dim [N, channel, height, width]
self.input_image = np.transpose(self.input_image, [0, 3, 1, 2])
def forward(self):
print('Inferencing...')
start_time = time.time()
current = self.input_image
for idx in range(len(self.param_layer_name)):
print('Inferencing layer: ' + self.param_layer_name[idx])
current = self.layers[self.param_layer_name[idx]].forward(current)
print('Inference time: %f' % (time.time()-start_time))
return current
def evaluate(self):
prob = self.forward()
top1 = np.argmax(prob[0])
print('Classification result: id = %d, prob = %f' % (top1, prob[0, top1]))
if __name__ == '__main__':
vgg = VGG19()
vgg.build_model()
vgg.init_model()
vgg.load_model()
vgg.load_image('../../cat1.jpg')
prob = vgg.evaluate()
使用VGG19卷积神经网络模型查看下列图像分类:
def test_conv_and_pool_layer():
test_data = np.random.rand(1, 3, 22, 22)
test_filter = np.random.rand(3, 3, 3, 64)
test_bias = np.random.rand(64)
conv = ConvolutionalLayer(3, 3, 64, 1, 1)
conv.init_param()
conv.load_param(test_filter, test_bias)
conv_result = conv.forward(test_data)
std_conv = standardConv(3, 3, 64, 1, 1)
std_conv.init_param()
std_conv.load_param(test_filter, test_bias)
std_conv_result = std_conv.forward(test_data)
conv_mse = computeMse(conv_result.flatten(), std_conv_result.flatten())
print('test conv err rate: %f%%'%(conv_mse*100))
pool = MaxPoolingLayer(2, 2)
pool_result = pool.forward(test_data)
std_pool = standardPool(2, 2)
std_pool_result = std_pool.forward(test_data)
pool_mse = computeMse(pool_result.flatten(), std_pool_result.flatten())
print('test pool err rate: %f%%'%(pool_mse*100))
if conv_mse < 0.003 and pool_mse < 0.003:
print('TEST CONV AND POOL LAYER PASS.')
else:
print('TEST CONV AND POOL LAYER FAILED.')
exit()
def forward(vgg):
print('Inferencing...')
start_time = time.time()
current = vgg.input_image
pool5 = np.array([])
for idx in range(len(vgg.param_layer_name)):
print('Inferencing layer: ' + vgg.param_layer_name[idx])
current = vgg.layers[vgg.param_layer_name[idx]].forward(current)
if 'pool5' in vgg.param_layer_name[idx]:
pool5 = current
print('Inference time: %f' % (time.time()-start_time))
return current, pool5
def check_pool5(stu_pool5):
data = np.load('pool5_dump.npy')
pool5_mse = computeMse(stu_pool5.flatten(), data.flatten())
print('test pool5 mse: %f'%pool5_mse)
if pool5_mse < 0.003:
print('CHECK POOL5 PASS.')
else:
print('CHECK POOL5 FAILED.')
exit()
def evaluate(vgg):
prob, pool5 = forward(vgg)
top1 = np.argmax(prob[0])
print('Classification result: id = %d, prob = %f' % (top1, prob[0, top1]))
return pool5
if __name__ == '__main__':
test_conv_and_pool_layer()
print('-------------------------------')
vgg = VGG19(param_path='../imagenet-vgg-verydeep-19.mat')
vgg.build_model()
vgg.init_model()
vgg.load_model()
vgg.load_image('../cat1.jpg')
pool5 = evaluate(vgg)
print('-------------------------------')
check_pool5(pool5)
实验结果
图像分类编号为281,预测准确率50.7269%
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。