Preface:
-- This project was quite involved and took me three to four months, so I cannot fit everything into a single article. This post therefore focuses on the overall logic, recording the whole process from scratch: building the VGG-300 network -> building the SSD model -> data annotation -> training and saving the trained model -> restoring the model for prediction.
-- Only some of the core code is included here. If you need the complete code, or have any questions, you can message me directly, or wait until I find time to upload it to GitHub.
-- Background needed to follow this post: neural-network basics, CNNs, VGG, semantic segmentation, instance segmentation, and the principles of YOLO.
I. How SSD works
1. How to understand SSD (a one-stage method):
-- Two-stage methods, such as the R-CNN family, work in two steps: first generate candidate boxes (region proposals), then classify and regress those candidates.
-- One-stage methods, such as YOLO and SSD, do everything in one pass: they sample densely and uniformly at different positions of the image, using several scales and aspect ratios, extract features with a CNN, and directly perform classification and regression. Since the whole pipeline is a single step, their main advantage is speed.
-- The main drawback of one-stage methods is that training is harder, chiefly because positive and negative (background) samples are extremely imbalanced (see Focal Loss and the small sketch below), which makes the model slightly less accurate.
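Focal Loss, mentioned above, addresses exactly this imbalance by down-weighting easy (mostly background) examples. The SSD code later in this post does not use it; it relies on hard negative mining with negative_ratio=3 instead. The following is only a minimal NumPy sketch of the idea, using the gamma/alpha defaults from the Focal Loss paper:
- import numpy as np
- 
- def focal_loss(p, y, gamma=2.0, alpha=0.25):
-     """Binary focal loss for predicted foreground probability p and label y (0 or 1)."""
-     p_t = np.where(y == 1, p, 1.0 - p)              # probability assigned to the true class
-     alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
-     return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
- 
- # An easy negative (background predicted with 99% confidence) contributes almost nothing,
- # while a hard example still produces a sizeable loss.
- print(focal_loss(np.array([0.01, 0.6]), np.array([0, 1])))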
2. SSD vs. YOLO
-- SSD performs better than YOLO (for the argument, see: https://zhuanlan.zhihu.com/p/68151917).
-- Like YOLO, SSD samples densely and uniformly at different positions of the image, at several scales and aspect ratios, extracts features with a CNN, and then classifies and regresses directly; the whole process is a single pass, hence the speed advantage.
-- Both SSD and YOLO detect directly from CNN features, but SSD improves on YOLO in the following ways:
-- It detects on multi-scale feature maps taken from layers with different strides: the large feature maps are used to detect small objects, and the small feature maps to detect large objects.
-- After extracting a feature map, SSD applies convolutional predictors to it directly, whereas YOLO flattens the extracted features into a tensor and runs fully connected layers for detection.
-- It uses prior (default) boxes as reference boxes; each cell is assigned several priors with different scales and aspect ratios, which reduces training difficulty to some extent.
-- SSD principles and implementation (reference: https://zhuanlan.zhihu.com/p/68151917)
3. SSD architecture diagram
The architecture diagram is as follows:
About the progressively smaller feature maps in the later part of the network:
-- The convolutional layers of decreasing size at the end of the network all act as detection heads (classifiers).
-- The larger feature maps detect small objects, and the smaller feature maps detect large objects (a sketch of how the default-box scales grow across these layers follows below).
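As a rough illustration of how the default-box scales grow across these detection layers, the SSD paper uses the schedule s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1). Below is a small sketch with the anchor_size_bounds used in the code later in this post (0.15 to 0.90 over 6 layers); the exact pixel sizes in the code follow the original Caffe implementation and differ slightly from this idealized schedule:
- # Relative default-box scale for each of the m feature layers.
- def anchor_scales(s_min=0.15, s_max=0.90, m=6):
-     return [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)]
- 
- print([round(s, 3) for s in anchor_scales()])
- # -> [0.15, 0.3, 0.45, 0.6, 0.75, 0.9]; multiplied by the 300x300 input size these
- #    give the rough pixel sizes of the default boxes on each detection layer.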
II. Data annotation
1. Annotation tool
-- labelImg
-- Usage reference: https://blog.csdn.net/wsp_1138886114/article/details/85017498
2. The mainstream annotation format stores labels as XML (Pascal VOC style).
The important fields are:
-- filename: the image file name
-- name: the class name of the annotated object
-- xmin, ymin, xmax, ymax: the top-left and bottom-right coordinates of the object's bounding box (a small parsing example follows below)
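As an illustration (not part of the training pipeline), a labelImg-style Pascal VOC annotation can be read back with the Python standard library. The file name here is hypothetical:
- import xml.etree.ElementTree as ET
- 
- # Hypothetical annotation produced by labelImg, e.g. face_data/Annotations/000001.xml
- tree = ET.parse('face_data/Annotations/000001.xml')
- root = tree.getroot()
- 
- print('filename:', root.find('filename').text)
- for obj in root.findall('object'):
-     name = obj.find('name').text
-     box = obj.find('bndbox')
-     xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
-     xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
-     print(name, (xmin, ymin), (xmax, ymax))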
3. Script for automatically splitting the data
This script shuffles the samples and splits them into training, validation, and test sets.
The script is as follows:
- import os
- import random
- 
- # Split ratios: 66% of the data goes to trainval, and half of trainval goes to train
- # (the rest to val); everything outside trainval becomes the test set.
- trainval_percent = 0.66
- train_percent = 0.5
- xmlfilepath = 'face_data/Annotations'
- txtsavepath = 'face_data/ImageSets/Main'
- total_xml = os.listdir(xmlfilepath)
- 
- num = len(total_xml)
- indices = range(num)
- tv = int(num * trainval_percent)
- tr = int(tv * train_percent)
- trainval = random.sample(indices, tv)
- train = random.sample(trainval, tr)
- 
- ftrainval = open('face_data/ImageSets/Main/trainval.txt', 'w')
- ftest = open('face_data/ImageSets/Main/test.txt', 'w')
- ftrain = open('face_data/ImageSets/Main/train.txt', 'w')
- fval = open('face_data/ImageSets/Main/val.txt', 'w')
- 
- for i in indices:
-     name = total_xml[i][:-4] + '\n'   # strip the '.xml' extension
-     if i in trainval:
-         ftrainval.write(name)
-         if i in train:
-             ftrain.write(name)
-         else:
-             fval.write(name)
-     else:
-         ftest.write(name)
- 
- ftrainval.close()
- ftrain.close()
- fval.close()
- ftest.close()
III. Building the SSD model on a VGG backbone
1. How VGG works
Its core idea is convolution-kernel factorization:
-- A large kernel is decomposed into several small ones; for background on kernel factorization, see my earlier article on the topic.
-- Several consecutive 3x3 kernels replace the larger kernels (11x11, 7x7, 5x5) used in earlier networks.
-- For a given receptive field (the region of the input image that influences an output unit), stacking small kernels is better than using one large kernel: the extra non-linear layers add depth and let the network learn more complex patterns, while the parameter count is actually smaller (see the quick check below).
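A quick back-of-the-envelope check of the parameter claim, assuming C input and C output channels and ignoring biases:
- C = 512
- one_5x5 = 5 * 5 * C * C          # a single 5x5 convolution layer
- two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 layers with the same 5x5 receptive field
- print(one_5x5, two_3x3)          # 6553600 vs 4718592: fewer parameters plus one extra non-linearity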
2. Building the VGG-300 backbone
3. Building the SSD model on top of the VGG-300 backbone
What each VGG stage in the figure does (the description uses the original 224x224 VGG-16 for illustration; the SSD network itself takes a 300x300 input):
-- 1. The input image is resized to (224, 224, 3).
-- 2. conv1: two [3,3] convolutions with 64 output channels give (224, 224, 64); a 2x2 max pooling then gives (112, 112, 64).
-- 3. conv2: two [3,3] convolutions with 128 output channels give (112, 112, 128); 2x2 max pooling gives (56, 56, 128).
-- 4. conv3: three [3,3] convolutions with 256 output channels give (56, 56, 256); 2x2 max pooling gives (28, 28, 256).
-- 5. conv4: three [3,3] convolutions with 512 output channels give (28, 28, 512); 2x2 max pooling gives (14, 14, 512).
-- 6. conv5: three [3,3] convolutions with 512 output channels give (14, 14, 512); 2x2 max pooling gives (7, 7, 512).
-- 7. The first two fully connected layers are emulated with convolutions (the effect is equivalent); each outputs (1, 1, 4096).
-- 8. The last fully connected layer is likewise emulated with a convolution and outputs (1, 1, 1000).
The final output is the prediction score for each class (a small sketch of the conv-as-FC trick follows).
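For reference, here is a minimal, self-contained sketch (assumed shapes, not taken from the code below) of how fully connected layers are emulated with convolutions in slim: a 7x7 VALID convolution over the final (7, 7, 512) feature map collapses it to (1, 1, 4096), and 1x1 convolutions play the role of the remaining FC layers. The SSD code below applies the same trick, but with a dilated 3x3 conv6 and a 1x1 conv7 of 1024 channels:
- import tensorflow as tf
- slim = tf.contrib.slim
- 
- inputs = tf.placeholder(tf.float32, [None, 7, 7, 512])                      # output of the last pooling stage
- net = slim.conv2d(inputs, 4096, [7, 7], padding='VALID', scope='fc6')       # -> (None, 1, 1, 4096)
- net = slim.conv2d(net, 4096, [1, 1], scope='fc7')                           # -> (None, 1, 1, 4096)
- logits = slim.conv2d(net, 1000, [1, 1], activation_fn=None, scope='fc8')    # -> (None, 1, 1, 1000)
- print(logits.get_shape().as_list())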
Backbone and SSD network code:
- import math
- from collections import namedtuple
-
- import numpy as np
- import tensorflow as tf
-
- import tf_extended as tfe
- from nets import custom_layers
- from nets import ssd_common
-
- slim = tf.contrib.slim
-
-
- # =========================================================================== #
- # SSD class definition.
- # =========================================================================== #
- SSDParams = namedtuple('SSDParameters', ['img_shape',
- 'num_classes',
- 'no_annotation_label',
- 'feat_layers',
- 'feat_shapes',
- 'anchor_size_bounds',
- 'anchor_sizes',
- 'anchor_ratios',
- 'anchor_steps',
- 'anchor_offset',
- 'normalizations',
- 'prior_scaling'
- ])
-
-
- class SSDNet(object):
- """Implementation of the SSD VGG-based 300 network.
- The default features layers with 300x300 image input are:
- conv4 ==> 38 x 38
- conv7 ==> 19 x 19
- conv8 ==> 10 x 10
- conv9 ==> 5 x 5
- conv10 ==> 3 x 3
- conv11 ==> 1 x 1
- The default image size used to train this network is 300x300.
- """
- default_params = SSDParams(
- img_shape=(300, 300),
- num_classes=21,
- # num_classes=2,
- no_annotation_label=21,
- # no_annotation_label=2,
- feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
- feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
- anchor_size_bounds=[0.15, 0.90],
- # anchor_size_bounds=[0.20, 0.90],
- anchor_sizes=[(21., 45.),
- (45., 99.),
- (99., 153.),
- (153., 207.),
- (207., 261.),
- (261., 315.)],
- # anchor_sizes=[(30., 60.),
- # (60., 111.),
- # (111., 162.),
- # (162., 213.),
- # (213., 264.),
- # (264., 315.)],
- anchor_ratios=[[2, .5],
- [2, .5, 3, 1./3],
- [2, .5, 3, 1./3],
- [2, .5, 3, 1./3],
- [2, .5],
- [2, .5]],
- anchor_steps=[8, 16, 32, 64, 100, 300],
- anchor_offset=0.5,
- normalizations=[20, -1, -1, -1, -1, -1],
- prior_scaling=[0.1, 0.1, 0.2, 0.2]
- )
-
- def __init__(self, params=None):
- """Init the SSD net with some parameters. Use the default ones
- if none provided.
- """
- if isinstance(params, SSDParams):
- self.params = params
- else:
- self.params = SSDNet.default_params
-
- # ======================================================================= #
- def net(self, inputs,
- is_training=True,
- update_feat_shapes=True,
- dropout_keep_prob=0.5,
- prediction_fn=slim.softmax,
- reuse=None,
- scope='ssd_300_vgg'):
- """SSD network definition.
- """
- r = ssd_net(inputs,
- num_classes=self.params.num_classes,
- feat_layers=self.params.feat_layers,
- anchor_sizes=self.params.anchor_sizes,
- anchor_ratios=self.params.anchor_ratios,
- normalizations=self.params.normalizations,
- is_training=is_training,
- dropout_keep_prob=dropout_keep_prob,
- prediction_fn=prediction_fn,
- reuse=reuse,
- scope=scope)
- # Update feature shapes (try at least!)
- if update_feat_shapes:
- shapes = ssd_feat_shapes_from_net(r[0], self.params.feat_shapes)
- self.params = self.params._replace(feat_shapes=shapes)
- return r
-
- def arg_scope(self, weight_decay=0.0005, data_format='NHWC'):
- """Network arg_scope.
- """
- return ssd_arg_scope(weight_decay, data_format=data_format)
-
- def arg_scope_caffe(self, caffe_scope):
- """Caffe arg_scope used for weights importing.
- """
- return ssd_arg_scope_caffe(caffe_scope)
-
- # ======================================================================= #
- def update_feature_shapes(self, predictions):
- """Update feature shapes from predictions collection (Tensor or Numpy
- array).
- """
- shapes = ssd_feat_shapes_from_net(predictions, self.params.feat_shapes)
- self.params = self.params._replace(feat_shapes=shapes)
-
- def anchors(self, img_shape, dtype=np.float32):
- """Compute the default anchor boxes, given an image shape.
- """
- return ssd_anchors_all_layers(img_shape,
- self.params.feat_shapes,
- self.params.anchor_sizes,
- self.params.anchor_ratios,
- self.params.anchor_steps,
- self.params.anchor_offset,
- dtype)
-
- def bboxes_encode(self, labels, bboxes, anchors,
- scope=None):
- """Encode labels and bounding boxes.
- """
- return ssd_common.tf_ssd_bboxes_encode(
- labels, bboxes, anchors,
- self.params.num_classes,
- self.params.no_annotation_label,
- ignore_threshold=0.5,
- prior_scaling=self.params.prior_scaling,
- scope=scope)
-
- def bboxes_decode(self, feat_localizations, anchors,
- scope='ssd_bboxes_decode'):
- """Encode labels and bounding boxes.
- """
- return ssd_common.tf_ssd_bboxes_decode(
- feat_localizations, anchors,
- prior_scaling=self.params.prior_scaling,
- scope=scope)
-
- def detected_bboxes(self, predictions, localisations,
- select_threshold=None, nms_threshold=0.5,
- clipping_bbox=None, top_k=400, keep_top_k=200):
- """Get the detected bounding boxes from the SSD network output.
- """
- # Select top_k bboxes from predictions, and clip
- rscores, rbboxes = \
- ssd_common.tf_ssd_bboxes_select(predictions, localisations,
- select_threshold=select_threshold,
- num_classes=self.params.num_classes)
- rscores, rbboxes = \
- tfe.bboxes_sort(rscores, rbboxes, top_k=top_k)
- # Apply NMS algorithm.
- rscores, rbboxes = \
- tfe.bboxes_nms_batch(rscores, rbboxes,
- nms_threshold=nms_threshold,
- keep_top_k=keep_top_k)
- if clipping_bbox is not None:
- rbboxes = tfe.bboxes_clip(clipping_bbox, rbboxes)
- return rscores, rbboxes
-
- def losses(self, logits, localisations,
- gclasses, glocalisations, gscores,
- match_threshold=0.5,
- negative_ratio=3.,
- alpha=1.,
- label_smoothing=0.,
- scope='ssd_losses'):
- """Define the SSD network losses.
- """
- return ssd_losses(logits, localisations,
- gclasses, glocalisations, gscores,
- match_threshold=match_threshold,
- negative_ratio=negative_ratio,
- alpha=alpha,
- label_smoothing=label_smoothing,
- scope=scope)
-
-
- # =========================================================================== #
- # SSD tools...
- # =========================================================================== #
- def ssd_size_bounds_to_values(size_bounds,
- n_feat_layers,
- img_shape=(300, 300)):
- """Compute the reference sizes of the anchor boxes from relative bounds.
- The absolute values are measured in pixels, based on the network
- default size (300 pixels).
- This function follows the computation performed in the original
- implementation of SSD in Caffe.
- Return:
- list of list containing the absolute sizes at each scale. For each scale,
- the ratios only apply to the first value.
- """
- assert img_shape[0] == img_shape[1]
-
- img_size = img_shape[0]
- min_ratio = int(size_bounds[0] * 100)
- max_ratio = int(size_bounds[1] * 100)
- step = int(math.floor((max_ratio - min_ratio) / (n_feat_layers - 2)))
- # Start with the following smallest sizes.
- sizes = [[img_size * size_bounds[0] / 2, img_size * size_bounds[0]]]
- for ratio in range(min_ratio, max_ratio + 1, step):
- sizes.append((img_size * ratio / 100.,
- img_size * (ratio + step) / 100.))
- return sizes
-
-
- def ssd_feat_shapes_from_net(predictions, default_shapes=None):
- """Try to obtain the feature shapes from the prediction layers. The latter
- can be either a Tensor or Numpy ndarray.
- Return:
- list of feature shapes. Default values if predictions shape not fully
- determined.
- """
- feat_shapes = []
- for l in predictions:
- # Get the shape, from either a np array or a tensor.
- if isinstance(l, np.ndarray):
- shape = l.shape
- else:
- shape = l.get_shape().as_list()
- shape = shape[1:4]
- # Problem: undetermined shape...
- if None in shape:
- return default_shapes
- else:
- feat_shapes.append(shape)
- return feat_shapes
-
-
- def ssd_anchor_one_layer(img_shape,
- feat_shape,
- sizes,
- ratios,
- step,
- offset=0.5,
- dtype=np.float32):
- """Computer SSD default anchor boxes for one feature layer.
- Determine the relative position grid of the centers, and the relative
- width and height.
- Arguments:
- feat_shape: Feature shape, used for computing relative position grids;
- size: Absolute reference sizes;
- ratios: Ratios to use on these features;
- img_shape: Image shape, used for computing height, width relatively to the
- former;
- offset: Grid offset.
- Return:
- y, x, h, w: Relative x and y grids, and height and width.
- """
- # Compute the position grid: simple way.
- # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
- # y = (y.astype(dtype) + offset) / feat_shape[0]
- # x = (x.astype(dtype) + offset) / feat_shape[1]
- # Weird SSD-Caffe computation using steps values...
- y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
- y = (y.astype(dtype) + offset) * step / img_shape[0]
- x = (x.astype(dtype) + offset) * step / img_shape[1]
-
- # Expand dims to support easy broadcasting.
- y = np.expand_dims(y, axis=-1)
- x = np.expand_dims(x, axis=-1)
-
- # Compute relative height and width.
- # Tries to follow the original implementation of SSD for the order.
- num_anchors = len(sizes) + len(ratios)
- h = np.zeros((num_anchors, ), dtype=dtype)
- w = np.zeros((num_anchors, ), dtype=dtype)
- # Add first anchor boxes with ratio=1.
- h[0] = sizes[0] / img_shape[0]
- w[0] = sizes[0] / img_shape[1]
- di = 1
- if len(sizes) > 1:
- h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
- w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
- di += 1
- for i, r in enumerate(ratios):
- h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
- w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
- return y, x, h, w
-
-
- def ssd_anchors_all_layers(img_shape,
- layers_shape,
- anchor_sizes,
- anchor_ratios,
- anchor_steps,
- offset=0.5,
- dtype=np.float32):
- """Compute anchor boxes for all feature layers.
- """
- layers_anchors = []
- for i, s in enumerate(layers_shape):
- anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
- anchor_sizes[i],
- anchor_ratios[i],
- anchor_steps[i],
- offset=offset, dtype=dtype)
- layers_anchors.append(anchor_bboxes)
- return layers_anchors
-
-
- # =========================================================================== #
- # Functional definition of VGG-based SSD 300.
- # =========================================================================== #
- def tensor_shape(x, rank=3):
- """Returns the dimensions of a tensor.
- Args:
- image: A N-D Tensor of shape.
- Returns:
- A list of dimensions. Dimensions that are statically known are python
- integers,otherwise they are integer scalar tensors.
- """
- if x.get_shape().is_fully_defined():
- return x.get_shape().as_list()
- else:
- static_shape = x.get_shape().with_rank(rank).as_list()
- dynamic_shape = tf.unstack(tf.shape(x), rank)
- return [s if s is not None else d
- for s, d in zip(static_shape, dynamic_shape)]
-
-
- def ssd_multibox_layer(inputs,
- num_classes,
- sizes,
- ratios=[1],
- normalization=-1,
- bn_normalization=False):
- """Construct a multibox layer, return a class and localization predictions.
- """
- net = inputs
- if normalization > 0:
- net = custom_layers.l2_normalization(net, scaling=True)
- # Number of anchors.
- num_anchors = len(sizes) + len(ratios)
-
- # Location.
- num_loc_pred = num_anchors * 4
- loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
- scope='conv_loc')
- loc_pred = custom_layers.channel_to_last(loc_pred)
- loc_pred = tf.reshape(loc_pred,
- tensor_shape(loc_pred, 4)[:-1]+[num_anchors, 4])
- # Class prediction.
- num_cls_pred = num_anchors * num_classes
- cls_pred = slim.conv2d(net, num_cls_pred, [3, 3], activation_fn=None,
- scope='conv_cls')
- cls_pred = custom_layers.channel_to_last(cls_pred)
- cls_pred = tf.reshape(cls_pred,
- tensor_shape(cls_pred, 4)[:-1]+[num_anchors, num_classes])
- return cls_pred, loc_pred
-
-
- def ssd_net(inputs,
- num_classes=SSDNet.default_params.num_classes,
- feat_layers=SSDNet.default_params.feat_layers,
- anchor_sizes=SSDNet.default_params.anchor_sizes,
- anchor_ratios=SSDNet.default_params.anchor_ratios,
- normalizations=SSDNet.default_params.normalizations,
- is_training=True,
- dropout_keep_prob=0.5,
- prediction_fn=slim.softmax,
- reuse=None,
- scope='ssd_300_vgg'):
- """SSD net definition.
- """
- # if data_format == 'NCHW':
- # inputs = tf.transpose(inputs, perm=(0, 3, 1, 2))
-
- # End_points collect relevant activations for external use.
- end_points = {}
- with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
- # Original VGG-16 blocks.
- net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
- end_points['block1'] = net
- net = slim.max_pool2d(net, [2, 2], scope='pool1')
- # Block 2.
- net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
- end_points['block2'] = net
- net = slim.max_pool2d(net, [2, 2], scope='pool2')
- # Block 3.
- net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
- end_points['block3'] = net
- net = slim.max_pool2d(net, [2, 2], scope='pool3')
- # Block 4.
- net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
- end_points['block4'] = net
- net = slim.max_pool2d(net, [2, 2], scope='pool4')
- # Block 5.
- net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
- end_points['block5'] = net
- net = slim.max_pool2d(net, [3, 3], stride=1, scope='pool5')
-
- # Additional SSD blocks.
- # Block 6: dilated 3x3 convolution (dilation rate 6) replacing VGG's fc6.
- net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
- end_points['block6'] = net
- # Note: tf.layers.dropout expects the drop rate, not the keep probability.
- net = tf.layers.dropout(net, rate=1.0 - dropout_keep_prob, training=is_training)
- # Block 7: 1x1 convolution replacing VGG's fc7.
- net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
- end_points['block7'] = net
- net = tf.layers.dropout(net, rate=1.0 - dropout_keep_prob, training=is_training)
-
- # Block 8/9/10/11: 1x1 and 3x3 convolutions stride 2 (except lasts).
- end_point = 'block8'
- with tf.variable_scope(end_point):
- net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
- net = custom_layers.pad2d(net, pad=(1, 1))
- net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
- end_points[end_point] = net
- end_point = 'block9'
- with tf.variable_scope(end_point):
- net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
- net = custom_layers.pad2d(net, pad=(1, 1))
- net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
- end_points[end_point] = net
- end_point = 'block10'
- with tf.variable_scope(end_point):
- net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
- net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
- end_points[end_point] = net
- end_point = 'block11'
- with tf.variable_scope(end_point):
- net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
- net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
- end_points[end_point] = net
-
- # Prediction and localisations layers.
- predictions = []
- logits = []
- localisations = []
- for i, layer in enumerate(feat_layers):
- with tf.variable_scope(layer + '_box'):
- p, l = ssd_multibox_layer(end_points[layer],
- num_classes,
- anchor_sizes[i],
- anchor_ratios[i],
- normalizations[i])
- predictions.append(prediction_fn(p))
- logits.append(p)
- localisations.append(l)
-
- return predictions, localisations, logits, end_points
- ssd_net.default_image_size = 300
-
-
- def ssd_arg_scope(weight_decay=0.0005, data_format='NHWC'):
- """Defines the VGG arg scope.
- Args:
- weight_decay: The l2 regularization coefficient.
- Returns:
- An arg_scope.
- """
- with slim.arg_scope([slim.conv2d, slim.fully_connected],
- activation_fn=tf.nn.relu,
- weights_regularizer=slim.l2_regularizer(weight_decay),
- weights_initializer=tf.contrib.layers.xavier_initializer(),
- biases_initializer=tf.zeros_initializer()):
- with slim.arg_scope([slim.conv2d, slim.max_pool2d],
- padding='SAME',
- data_format=data_format):
- with slim.arg_scope([custom_layers.pad2d,
- custom_layers.l2_normalization,
- custom_layers.channel_to_last],
- data_format=data_format) as sc:
- return sc
-
-
- # =========================================================================== #
- # Caffe scope: importing weights at initialization.
- # =========================================================================== #
- def ssd_arg_scope_caffe(caffe_scope):
- """Caffe scope definition.
- Args:
- caffe_scope: Caffe scope object with loaded weights.
- Returns:
- An arg_scope.
- """
- # Default network arg scope.
- with slim.arg_scope([slim.conv2d],
- activation_fn=tf.nn.relu,
- weights_initializer=caffe_scope.conv_weights_init(),
- biases_initializer=caffe_scope.conv_biases_init()):
- with slim.arg_scope([slim.fully_connected],
- activation_fn=tf.nn.relu):
- with slim.arg_scope([custom_layers.l2_normalization],
- scale_initializer=caffe_scope.l2_norm_scale_init()):
- with slim.arg_scope([slim.conv2d, slim.max_pool2d],
- padding='SAME') as sc:
- return sc
-
-
- # =========================================================================== #
- # SSD loss function.
- # =========================================================================== #
- def ssd_losses(logits, localisations,
- gclasses, glocalisations, gscores,
- match_threshold=0.5,
- negative_ratio=3.,
- alpha=1.,
- label_smoothing=0.,
- device='/cpu:0',
- scope=None):
- with tf.name_scope(scope, 'ssd_losses'):
- lshape = tfe.get_shape(logits[0], 5)
- num_classes = lshape[-1]
- batch_size = lshape[0]
-
- # Flatten out all vectors!
- flogits = []
- fgclasses = []
- fgscores = []
- flocalisations = []
- fglocalisations = []
- for i in range(len(logits)):
- flogits.append(tf.reshape(logits[i], [-1, num_classes]))
- fgclasses.append(tf.reshape(gclasses[i], [-1]))
- fgscores.append(tf.reshape(gscores[i], [-1]))
- flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
- fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
- # Concatenate the flattened tensors from all feature layers.
- logits = tf.concat(flogits, axis=0)
- gclasses = tf.concat(fgclasses, axis=0)
- gscores = tf.concat(fgscores, axis=0)
- localisations = tf.concat(flocalisations, axis=0)
- glocalisations = tf.concat(fglocalisations, axis=0)
- dtype = logits.dtype
-
- # Compute positive matching mask...
- pmask = gscores > match_threshold
- fpmask = tf.cast(pmask, dtype)
- n_positives = tf.reduce_sum(fpmask)
-
- # Hard negative mining...
- no_classes = tf.cast(pmask, tf.int32)
- predictions = slim.softmax(logits)
- nmask = tf.logical_and(tf.logical_not(pmask),
- gscores > -0.5)
- fnmask = tf.cast(nmask, dtype)
- nvalues = tf.where(nmask,
- predictions[:, 0],
- 1. - fnmask)
- nvalues_flat = tf.reshape(nvalues, [-1])
- # Number of negative entries to select.
- max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
- n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
- n_neg = tf.minimum(n_neg, max_neg_entries)
-
- val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
- max_hard_pred = -val[-1]
- # Final negative mask.
- nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
- fnmask = tf.cast(nmask, dtype)
-
- # Add cross-entropy loss.
- with tf.name_scope('cross_entropy_pos'):
- loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
- labels=gclasses)
- loss = tf.div(tf.reduce_sum(loss * fpmask), batch_size, name='value')
- tf.losses.add_loss(loss)
-
- with tf.name_scope('cross_entropy_neg'):
- loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
- labels=no_classes)
- loss = tf.div(tf.reduce_sum(loss * fnmask), batch_size, name='value')
- tf.losses.add_loss(loss)
-
- # Add localization loss: smooth L1, L2, ...
- with tf.name_scope('localization'):
- # Weights Tensor: positive mask + random negative.
- weights = tf.expand_dims(alpha * fpmask, axis=-1)
- loss = custom_layers.abs_smooth(localisations - glocalisations)
- loss = tf.div(tf.reduce_sum(loss * weights), batch_size, name='value')
- tf.losses.add_loss(loss)
-
-
- def ssd_losses_old(logits, localisations,
- gclasses, glocalisations, gscores,
- match_threshold=0.5,
- negative_ratio=3.,
- alpha=1.,
- label_smoothing=0.,
- device='/cpu:0',
- scope=None):
- """Loss functions for training the SSD 300 VGG network.
- This function defines the different loss components of the SSD, and
- adds them to the TF loss collection.
- Arguments:
- logits: (list of) predictions logits Tensors;
- localisations: (list of) localisations Tensors;
- gclasses: (list of) groundtruth labels Tensors;
- glocalisations: (list of) groundtruth localisations Tensors;
- gscores: (list of) groundtruth score Tensors;
- """
- with tf.device(device):
- with tf.name_scope(scope, 'ssd_losses'):
- l_cross_pos = []
- l_cross_neg = []
- l_loc = []
- for i in range(len(logits)):
- dtype = logits[i].dtype
- with tf.name_scope('block_%i' % i):
- # Sizing weight...
- wsize = tfe.get_shape(logits[i], rank=5)
- wsize = wsize[1] * wsize[2] * wsize[3]
-
- # Positive mask.
- pmask = gscores[i] > match_threshold
- fpmask = tf.cast(pmask, dtype)
- n_positives = tf.reduce_sum(fpmask)
-
- # Select some random negative entries.
- # n_entries = np.prod(gclasses[i].get_shape().as_list())
- # r_positive = n_positives / n_entries
- # r_negative = negative_ratio * n_positives / (n_entries - n_positives)
-
- # Negative mask.
- no_classes = tf.cast(pmask, tf.int32)
- predictions = slim.softmax(logits[i])
- nmask = tf.logical_and(tf.logical_not(pmask),
- gscores[i] > -0.5)
- fnmask = tf.cast(nmask, dtype)
- nvalues = tf.where(nmask,
- predictions[:, :, :, :, 0],
- 1. - fnmask)
- nvalues_flat = tf.reshape(nvalues, [-1])
- # Number of negative entries to select.
- n_neg = tf.cast(negative_ratio * n_positives, tf.int32)
- n_neg = tf.maximum(n_neg, tf.size(nvalues_flat) // 8)
- n_neg = tf.maximum(n_neg, tf.shape(nvalues)[0] * 4)
- max_neg_entries = 1 + tf.cast(tf.reduce_sum(fnmask), tf.int32)
- n_neg = tf.minimum(n_neg, max_neg_entries)
-
- val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
- max_hard_pred = -val[-1]
- # Final negative mask.
- nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
- fnmask = tf.cast(nmask, dtype)
-
- # Add cross-entropy loss.
- with tf.name_scope('cross_entropy_pos'):
- fpmask = wsize * fpmask
- loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
- labels=gclasses[i])
- loss = tf.losses.compute_weighted_loss(loss, fpmask)
- l_cross_pos.append(loss)
-
- with tf.name_scope('cross_entropy_neg'):
- fnmask = wsize * fnmask
- loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits[i],
- labels=no_classes)
- loss = tf.losses.compute_weighted_loss(loss, fnmask)
- l_cross_neg.append(loss)
-
- # Add localization loss: smooth L1, L2, ...
- with tf.name_scope('localization'):
- # Weights Tensor: positive mask + random negative.
- weights = tf.expand_dims(alpha * fpmask, axis=-1)
- loss = custom_layers.abs_smooth(localisations[i] - glocalisations[i])
- loss = tf.losses.compute_weighted_loss(loss, weights)
- l_loc.append(loss)
-
- # Additional total losses...
- with tf.name_scope('total'):
- total_cross_pos = tf.add_n(l_cross_pos, 'cross_entropy_pos')
- total_cross_neg = tf.add_n(l_cross_neg, 'cross_entropy_neg')
- total_cross = tf.add(total_cross_pos, total_cross_neg, 'cross_entropy')
- total_loc = tf.add_n(l_loc, 'localization')
-
- # Add to EXTRA LOSSES TF.collection
- tf.add_to_collection('EXTRA_LOSSES', total_cross_pos)
- tf.add_to_collection('EXTRA_LOSSES', total_cross_neg)
- tf.add_to_collection('EXTRA_LOSSES', total_cross)
- tf.add_to_collection('EXTRA_LOSSES', total_loc)
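Before moving on, here is a hedged sketch of how the pieces above fit together, assuming the code is saved as nets/ssd_vgg_300.py in the SSD-Tensorflow project layout that this post follows:
- import tensorflow as tf
- from nets import ssd_vgg_300   # assumes the code above lives in nets/ssd_vgg_300.py
- 
- slim = tf.contrib.slim
- 
- ssd_net = ssd_vgg_300.SSDNet()                            # default params: 300x300 input, 21 classes
- images = tf.placeholder(tf.float32, [None, 300, 300, 3])
- with slim.arg_scope(ssd_net.arg_scope()):
-     predictions, localisations, logits, end_points = ssd_net.net(images, is_training=False)
- 
- # Default anchor boxes (numpy arrays) for block4, block7, ..., block11.
- ssd_anchors = ssd_net.anchors((300, 300))
- print(len(ssd_anchors))                                   # 6 feature layers
- print(predictions[0].get_shape().as_list())               # [None, 38, 38, 4, 21] for block4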
IV. Converting the annotated data into the format the network accepts (TFRecords)
The conversion code is as follows:
- import tensorflow as tf
-
- from datasets import pascalvoc_to_tfrecords
-
- FLAGS = tf.app.flags.FLAGS
-
- tf.app.flags.DEFINE_string(
- 'dataset_name', 'pascalvoc',
- 'The name of the dataset to convert.')
- tf.app.flags.DEFINE_string(
- 'dataset_dir', './face_data/',
- 'Directory where the original dataset is stored.')
- tf.app.flags.DEFINE_string(
- 'output_name', 'voc_2007_train',
- 'Basename used for TFRecords output files.')
- tf.app.flags.DEFINE_string(
- 'output_dir', './face_tfrecord/',
- 'Output directory where to store TFRecords files.')
-
-
- def main(_):
- if not FLAGS.dataset_dir:
- raise ValueError('You must supply the dataset directory with --dataset_dir')
- print('Dataset directory:', FLAGS.dataset_dir)
- print('Output directory:', FLAGS.output_dir)
-
- if FLAGS.dataset_name == 'pascalvoc':
- pascalvoc_to_tfrecords.run(FLAGS.dataset_dir, FLAGS.output_dir, FLAGS.output_name)
- else:
- raise ValueError('Dataset [%s] was not recognized.' % FLAGS.dataset_name)
-
- if __name__ == '__main__':
- tf.app.run()
-
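Assuming this script is saved as tf_convert_data.py (the file name used in the SSD-Tensorflow project this code is based on), the conversion is run with: python tf_convert_data.py --dataset_name=pascalvoc --dataset_dir=./face_data/ --output_name=voc_2007_train --output_dir=./face_tfrecord/. The face_data directory is expected to follow the Pascal VOC layout (Annotations/, JPEGImages/, ImageSets/Main/) produced in the annotation step above.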
V. Training program
The training code is as follows:
- import tensorflow as tf
- from tensorflow.python.ops import control_flow_ops
-
- from datasets import dataset_factory
- from deployment import model_deploy
- from nets import nets_factory
- from preprocessing import preprocessing_factory
- import tf_utils
- tf.reset_default_graph()
- import os
- # os.environ['CUDA_VISIBLE_DEVICES']='0'
- slim = tf.contrib.slim
-
- DATA_FORMAT = 'NHWC'
- # DATA_FORMAT = 'NCHW'
-
- # =========================================================================== #
- # SSD Network flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_float(
- 'loss_alpha', 1., 'Alpha parameter in the loss function.')
- tf.app.flags.DEFINE_float(
- 'negative_ratio', 3., 'Negative ratio in the loss function.')
- tf.app.flags.DEFINE_float(
- 'match_threshold', 0.5, 'Matching threshold in the loss function.')
-
- # =========================================================================== #
- # General Flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_string(
- 'train_dir', './wuyang_model/',
- 'Directory where checkpoints and event logs are written to.')
- tf.app.flags.DEFINE_integer('num_clones', 1,
- 'Number of model clones to deploy.')
- tf.app.flags.DEFINE_boolean('clone_on_cpu', False,#True
- 'Use CPUs to deploy clones.')
- tf.app.flags.DEFINE_integer(
- 'num_readers', 4,
- 'The number of parallel readers that read data from the dataset.')
- tf.app.flags.DEFINE_integer(
- 'num_preprocessing_threads', 4,
- 'The number of threads used to create the batches.')
-
- tf.app.flags.DEFINE_integer(
- 'log_every_n_steps', 1,#10
- 'The frequency with which logs are print.')
- tf.app.flags.DEFINE_integer(
- 'save_summaries_secs', 600,
- 'The frequency with which summaries are saved, in seconds.')
- tf.app.flags.DEFINE_integer(
- 'save_interval_secs', 600,
- 'The frequency with which the model is saved, in seconds.')
- tf.app.flags.DEFINE_float(
- 'gpu_memory_fraction', 0.8, 'GPU memory fraction to use.')
-
- # =========================================================================== #
- # Optimization Flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_float(
- 'weight_decay', 0.4, 'The weight decay on the model weights.')#0.00004
- tf.app.flags.DEFINE_string(
- 'optimizer', 'rmsprop',
- 'The name of the optimizer, one of "adadelta", "adagrad", "adam",'
- '"ftrl", "momentum", "sgd" or "rmsprop".')
- tf.app.flags.DEFINE_float(
- 'adadelta_rho', 0.95,
- 'The decay rate for adadelta.')
- tf.app.flags.DEFINE_float(
- 'adagrad_initial_accumulator_value', 0.01,
- 'Starting value for the AdaGrad accumulators.')
- tf.app.flags.DEFINE_float(
- 'adam_beta1', 0.9,
- 'The exponential decay rate for the 1st moment estimates.')
- tf.app.flags.DEFINE_float(
- 'adam_beta2', 0.999,
- 'The exponential decay rate for the 2nd moment estimates.')
- tf.app.flags.DEFINE_float('opt_epsilon', 1.0, 'Epsilon term for the optimizer.')
- tf.app.flags.DEFINE_float('ftrl_learning_rate_power', -0.5,
- 'The learning rate power.')
- tf.app.flags.DEFINE_float(
- 'ftrl_initial_accumulator_value', 0.1,
- 'Starting value for the FTRL accumulators.')
- tf.app.flags.DEFINE_float(
- 'ftrl_l1', 0.0, 'The FTRL l1 regularization strength.')
- tf.app.flags.DEFINE_float(
- 'ftrl_l2', 0.0, 'The FTRL l2 regularization strength.')
- tf.app.flags.DEFINE_float(
- 'momentum', 0.9,
- 'The momentum for the MomentumOptimizer and RMSPropOptimizer.')
- tf.app.flags.DEFINE_float('rmsprop_momentum', 0.9, 'Momentum.')
- tf.app.flags.DEFINE_float('rmsprop_decay', 0.9, 'Decay term for RMSProp.')
-
- # =========================================================================== #
- # Learning Rate Flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_string(
- 'learning_rate_decay_type',
- 'exponential',
- 'Specifies how the learning rate is decayed. One of "fixed", "exponential",'
- ' or "polynomial"')
- tf.app.flags.DEFINE_float('learning_rate', 0.0001, 'Initial learning rate.')#0.01
- tf.app.flags.DEFINE_float(
- 'end_learning_rate', 0.01,# 0.0001
- 'The minimal end learning rate used by a polynomial decay learning rate.')
- tf.app.flags.DEFINE_float(
- 'label_smoothing', 0.0, 'The amount of label smoothing.')
- tf.app.flags.DEFINE_float(
- 'learning_rate_decay_factor', 0.94, 'Learning rate decay factor.')
- tf.app.flags.DEFINE_float(
- 'num_epochs_per_decay', 2.0,
- 'Number of epochs after which learning rate decays.')
- tf.app.flags.DEFINE_float(
- 'moving_average_decay', None,
- 'The decay to use for the moving average.'
- 'If left as None, then moving averages are not used.')
-
- # =========================================================================== #
- # Dataset Flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_string(
- 'dataset_name', 'pascalvoc_2007', 'The name of the dataset to load.')
- tf.app.flags.DEFINE_integer(
- 'num_classes', 2, 'Number of classes to use in the dataset.')
- tf.app.flags.DEFINE_string(
- 'dataset_split_name', 'train', 'The name of the train/test split.')
- tf.app.flags.DEFINE_string(
- 'dataset_dir', './face_tfrecord/', 'The directory where the dataset files are stored.')
- tf.app.flags.DEFINE_integer(
- 'labels_offset', 0,
- 'An offset for the labels in the dataset. This flag is primarily used to '
- 'evaluate the VGG and ResNet architectures which do not use a background '
- 'class for the ImageNet dataset.')
- tf.app.flags.DEFINE_string(
- 'model_name', 'ssd_300_vgg', 'The name of the architecture to train.')
- tf.app.flags.DEFINE_string(
- 'preprocessing_name', None, 'The name of the preprocessing to use. If left '
- 'as `None`, then the model_name flag is used.')
- tf.app.flags.DEFINE_integer(
- 'batch_size', 4, 'The number of samples in each batch.')#32
- tf.app.flags.DEFINE_integer(
- 'train_image_size', None, 'Train image size')
- tf.app.flags.DEFINE_integer('max_number_of_steps', 20,#10
- 'The maximum number of training steps.')
-
- # =========================================================================== #
- # Fine-Tuning Flags.
- # =========================================================================== #
- tf.app.flags.DEFINE_string(
- 'checkpoint_path', './预训练模型/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt',
- 'The path to a checkpoint from which to fine-tune.')
- tf.app.flags.DEFINE_string(
- 'checkpoint_model_scope', None,
- 'Model scope in the checkpoint. None if the same as the trained model.')
- tf.app.flags.DEFINE_string(
- 'checkpoint_exclude_scopes', None,
- 'Comma-separated list of scopes of variables to exclude when restoring '
- 'from a checkpoint.')
- tf.app.flags.DEFINE_string(
- 'trainable_scopes', None,
- 'Comma-separated list of scopes to filter the set of variables to train.'
- 'By default, None would train all the variables.')
- tf.app.flags.DEFINE_boolean(
- 'ignore_missing_vars', False,
- 'When restoring a checkpoint would ignore missing variables.')
-
- FLAGS = tf.app.flags.FLAGS
-
-
- # =========================================================================== #
- # Main training routine.
- # =========================================================================== #
- def main(_):
- if not FLAGS.dataset_dir:
- raise ValueError('You must supply the dataset directory with --dataset_dir')
-
- tf.logging.set_verbosity(tf.logging.DEBUG)
- with tf.Graph().as_default():
- # Config model_deploy. Keep TF Slim Models structure.
- # Useful if want to need multiple GPUs and/or servers in the future.
- deploy_config = model_deploy.DeploymentConfig(
- num_clones=FLAGS.num_clones,
- clone_on_cpu=FLAGS.clone_on_cpu,
- replica_id=0,
- num_replicas=1,
- num_ps_tasks=0)
- # Create global_step.
- with tf.device(deploy_config.variables_device()):
- global_step = slim.create_global_step()
-
- # Select the dataset.
- dataset = dataset_factory.get_dataset(
- FLAGS.dataset_name, FLAGS.dataset_split_name, FLAGS.dataset_dir)
-
- # Get the SSD network and its anchors.
- ssd_class = nets_factory.get_network(FLAGS.model_name)
- ssd_params = ssd_class.default_params._replace(num_classes=FLAGS.num_classes)
- ssd_net = ssd_class(ssd_params)
- ssd_shape = ssd_net.params.img_shape
- ssd_anchors = ssd_net.anchors(ssd_shape)
-
- # Select the preprocessing function.
- preprocessing_name = FLAGS.preprocessing_name or FLAGS.model_name
- image_preprocessing_fn = preprocessing_factory.get_preprocessing(
- preprocessing_name, is_training=True)
-
- tf_utils.print_configuration(FLAGS.__flags, ssd_params,
- dataset.data_sources, FLAGS.train_dir)
- # =================================================================== #
- # Create a dataset provider and batches.
- # =================================================================== #
- with tf.device(deploy_config.inputs_device()):
- with tf.name_scope(FLAGS.dataset_name + '_data_provider'):
- provider = slim.dataset_data_provider.DatasetDataProvider(
- dataset,
- num_readers=FLAGS.num_readers,
- common_queue_capacity=20 * FLAGS.batch_size,
- common_queue_min=10 * FLAGS.batch_size,
- shuffle=True)
- # Get for SSD network: image, labels, bboxes.
- [image, shape, glabels, gbboxes] = provider.get(['image', 'shape',
- 'object/label',
- 'object/bbox'])
- # Pre-processing image, labels and bboxes.
- image, glabels, gbboxes = \
- image_preprocessing_fn(image, glabels, gbboxes,
- out_shape=ssd_shape,
- data_format=DATA_FORMAT)
- # Encode groundtruth labels and bboxes.
- gclasses, glocalisations, gscores = \
- ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)
- batch_shape = [1] + [len(ssd_anchors)] * 3
-
- # Training batches and queue.
- r = tf.train.batch(
- tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
- batch_size=FLAGS.batch_size,
- num_threads=FLAGS.num_preprocessing_threads,
- capacity=5 * FLAGS.batch_size)
- b_image, b_gclasses, b_glocalisations, b_gscores = \
- tf_utils.reshape_list(r, batch_shape)
-
- # Intermediate queueing: unique batch computation pipeline for all
- # GPUs running the training.
- batch_queue = slim.prefetch_queue.prefetch_queue(
- tf_utils.reshape_list([b_image, b_gclasses, b_glocalisations, b_gscores]),
- capacity=2 * deploy_config.num_clones)
-
- # =================================================================== #
- # Define the model running on every GPU.
- # =================================================================== #
- def clone_fn(batch_queue):
- """Allows data parallelism by creating multiple
- clones of network_fn."""
- # Dequeue batch.
- b_image, b_gclasses, b_glocalisations, b_gscores = \
- tf_utils.reshape_list(batch_queue.dequeue(), batch_shape)
-
- # Construct SSD network.
- arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
- data_format=DATA_FORMAT)
- with slim.arg_scope(arg_scope):
- predictions, localisations, logits, end_points = \
- ssd_net.net(b_image, is_training=True)
- # Add loss function.
- ssd_net.losses(logits, localisations,
- b_gclasses, b_glocalisations, b_gscores,
- match_threshold=FLAGS.match_threshold,
- negative_ratio=FLAGS.negative_ratio,
- alpha=FLAGS.loss_alpha,
- label_smoothing=FLAGS.label_smoothing)
- return end_points
-
- # Gather initial summaries.
- summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
-
- # =================================================================== #
- # Add summaries from first clone.
- # =================================================================== #
- clones = model_deploy.create_clones(deploy_config, clone_fn, [batch_queue])
- first_clone_scope = deploy_config.clone_scope(0)
- # Gather update_ops from the first clone. These contain, for example,
- # the updates for the batch_norm variables created by network_fn.
- update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, first_clone_scope)
-
- # Add summaries for end_points.
- end_points = clones[0].outputs
- for end_point in end_points:
- x = end_points[end_point]
- summaries.add(tf.summary.histogram('activations/' + end_point, x))
- summaries.add(tf.summary.scalar('sparsity/' + end_point,
- tf.nn.zero_fraction(x)))
- # Add summaries for losses and extra losses.
- for loss in tf.get_collection(tf.GraphKeys.LOSSES, first_clone_scope):
- summaries.add(tf.summary.scalar(loss.op.name, loss))
- for loss in tf.get_collection('EXTRA_LOSSES', first_clone_scope):
- summaries.add(tf.summary.scalar(loss.op.name, loss))
-
- # Add summaries for variables.
- for variable in slim.get_model_variables():
- summaries.add(tf.summary.histogram(variable.op.name, variable))
-
- # =================================================================== #
- # Configure the moving averages.
- # =================================================================== #
- if FLAGS.moving_average_decay:
- moving_average_variables = slim.get_model_variables()
- variable_averages = tf.train.ExponentialMovingAverage(
- FLAGS.moving_average_decay, global_step)
- else:
- moving_average_variables, variable_averages = None, None
-
- # =================================================================== #
- # Configure the optimization procedure.
- # =================================================================== #
- with tf.device(deploy_config.optimizer_device()):
- learning_rate = tf_utils.configure_learning_rate(FLAGS,
- dataset.num_samples,
- global_step)
- optimizer = tf_utils.configure_optimizer(FLAGS, learning_rate)
- summaries.add(tf.summary.scalar('learning_rate', learning_rate))
-
- if FLAGS.moving_average_decay:
- # Update ops executed locally by trainer.
- update_ops.append(variable_averages.apply(moving_average_variables))
-
- # Variables to train.
- variables_to_train = tf_utils.get_variables_to_train(FLAGS)
-
- # and returns a train_tensor and summary_op
- total_loss, clones_gradients = model_deploy.optimize_clones(
- clones,
- optimizer,
- var_list=variables_to_train)
- # Add total_loss to summary.
- summaries.add(tf.summary.scalar('total_loss', total_loss))
-
- # Create gradient updates.
- grad_updates = optimizer.apply_gradients(clones_gradients,
- global_step=global_step)
- update_ops.append(grad_updates)
- update_op = tf.group(*update_ops)
- train_tensor = control_flow_ops.with_dependencies([update_op], total_loss,
- name='train_op')
-
- # Add the summaries from the first clone. These contain the summaries
- summaries |= set(tf.get_collection(tf.GraphKeys.SUMMARIES,
- first_clone_scope))
- # Merge all summaries together.
- summary_op = tf.summary.merge(list(summaries), name='summary_op')
-
- # =================================================================== #
- # Kicks off the training.
- # =================================================================== #
- gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.gpu_memory_fraction)
- config = tf.ConfigProto(log_device_placement=False,
- gpu_options=gpu_options)
- saver = tf.train.Saver(max_to_keep=5,
- keep_checkpoint_every_n_hours=1.0,
- write_version=2,
- pad_step_number=False)
- slim.learning.train(
- train_tensor,
- logdir=FLAGS.train_dir,
- master='',
- is_chief=True,
- # init_fn=tf_utils.get_init_fn(FLAGS),
- summary_op=summary_op,
- number_of_steps=FLAGS.max_number_of_steps,
- log_every_n_steps=FLAGS.log_every_n_steps,
- save_summaries_secs=FLAGS.save_summaries_secs,
- saver=saver
- ,save_interval_secs=FLAGS.save_interval_secs
- ,session_config=config
- ,sync_optimizer=None
- )
- if __name__ == '__main__':
- tf.app.run()
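Assuming the script above is saved as train_ssd_network.py (again, the name from the SSD-Tensorflow project), it can be launched simply with python train_ssd_network.py, because every flag already has a working default for this setup (TFRecords in ./face_tfrecord/, 2 classes, batch size 4, at most 20 steps, checkpoints written to ./wuyang_model/). Note that the init_fn line that would restore the VGG_VOC0712 checkpoint for fine-tuning is commented out, so with these exact settings training starts from randomly initialized weights.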
VI. Handling errors during training
1. Training progress stalls
-- Check the converted tfrecord data; in most cases the format conversion failed.
2. Exploding gradients
-- Check the logic of the loss function; a sign error in the loss is a likely cause (a gradient-clipping sketch is given at the end of this section).
-- A mismatch between the sample dimensions and the network input dimensions can also cause this.
3. Vanishing gradients
-- There are many possible causes; each case has to be diagnosed on its own.
-- For example, try adding batch-normalization layers near the beginning of the network.
4. Loss does not converge
-- This is usually caused by a poor combination of learning rate and optimizer.
-- A common recipe is to start with a relatively large learning rate and then decay it slowly.
-- Optimizer usage: Adam + SGD, i.e. use Adam for coarse tuning and plain stochastic gradient descent for fine tuning.
5. Training is slow
-- Adjust the batch size; a larger batch makes each training step slower.
-- Dropout layers can also be added, although this mainly helps against overfitting rather than speed.
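If exploding gradients do appear, gradient clipping is a simple mitigation in addition to checking the loss. The training script in this post relies on slim.learning.train and does not clip gradients, so the following TF1 snippet is only an illustrative sketch on a toy loss:
- import tensorflow as tf
- 
- # Toy variable and loss, purely to demonstrate clipping; in the real script the loss
- # comes from ssd_losses() and the optimizer from tf_utils.configure_optimizer().
- w = tf.Variable([3.0, -4.0])
- loss = tf.reduce_sum(tf.exp(w))
- optimizer = tf.train.GradientDescentOptimizer(0.001)
- 
- grads_and_vars = optimizer.compute_gradients(loss)
- grads, variables = zip(*grads_and_vars)
- clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)   # cap the global gradient norm
- train_op = optimizer.apply_gradients(zip(clipped, variables))
- 
- with tf.Session() as sess:
-     sess.run(tf.global_variables_initializer())
-     _, gn = sess.run([train_op, global_norm])
-     print('gradient norm before clipping:', gn)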
VII. Verification (inference) process
The verification code:
- import os
- import math
- import random
-
- import numpy as np
- import tensorflow as tf
- import cv2
-
- slim = tf.contrib.slim
-
- # %matplotlib inline
- import matplotlib.pyplot as plt
- import matplotlib.image as mpimg
-
- import sys
- sys.path.append('../')
- from nets import ssd_vgg_300, ssd_common, np_methods
- from preprocessing import ssd_vgg_preprocessing
- from notebooks import visualization
- isess = sess = tf.Session()
-
- # Input placeholder.
- net_shape = (300, 300)
- data_format = 'NHWC'
- img_input = tf.placeholder(tf.uint8, shape=(None, None, 3))
- # Evaluation pre-processing: resize to SSD net shape.
- image_pre, labels_pre, bboxes_pre, bbox_img = ssd_vgg_preprocessing.preprocess_for_eval(
- img_input, None, None, net_shape, data_format, resize=ssd_vgg_preprocessing.Resize.WARP_RESIZE)
- image_4d = tf.expand_dims(image_pre, 0)
- print("part_1 sucessed")
-
- # Define the SSD model.
- reuse = True if 'ssd_net' in locals() else None
- ssd_net = ssd_vgg_300.SSDNet()
- with slim.arg_scope(ssd_net.arg_scope(data_format=data_format)):
- predictions, localisations, _, _ = ssd_net.net(image_4d, is_training=False, reuse=reuse)
-
- print("part_2 sucessed")
- # Restore SSD model.
- # ckpt_filename = 'F:/时间简史/人脸识别项目_wuyang/SSD源码/SSD-Tensorflow-master/预训练模型'
- # ckpt_filename = './wuyang_model/model.ckpt-20'
- ckpt_filename = './预训练模型/VGG_VOC0712_SSD_300x300_ft_iter_120000.ckpt'
- print("part_3 sucessed")
- isess.run(tf.global_variables_initializer())
- print("part_4 sucessed")
- saver = tf.train.Saver()
- saver.restore(isess, ckpt_filename)
- print("part_5 sucessed")
- # SSD default anchor boxes.
- ssd_anchors = ssd_net.anchors(net_shape)
- print("all part sucessed")
- # Main image processing routine.
- def process_image(img, select_threshold=0.5, nms_threshold=.45, net_shape=(300, 300)):
- # Run SSD network.
- rimg, rpredictions, rlocalisations, rbbox_img = isess.run([image_4d, predictions, localisations, bbox_img],
- feed_dict={img_input: img})
-
- # Get classes and bboxes from the net outputs.
- rclasses, rscores, rbboxes = np_methods.ssd_bboxes_select(
- rpredictions, rlocalisations, ssd_anchors,
- select_threshold=select_threshold, img_shape=net_shape, num_classes=21, decode=True)
-
- rbboxes = np_methods.bboxes_clip(rbbox_img, rbboxes)
- rclasses, rscores, rbboxes = np_methods.bboxes_sort(rclasses, rscores, rbboxes, top_k=400)
- rclasses, rscores, rbboxes = np_methods.bboxes_nms(rclasses, rscores, rbboxes, nms_threshold=nms_threshold)
- # Resize bboxes to original image shape. Note: useless for Resize.WARP!
- rbboxes = np_methods.bboxes_resize(rbbox_img, rbboxes)
- return rclasses, rscores, rbboxes
- # Test on some demo image and visualize output.
- path = 'F:/test-image/'
- image_names = sorted(os.listdir(path))
-
- img = mpimg.imread(path + image_names[0])
- rclasses, rscores, rbboxes = process_image(img)
-
- # visualization.bboxes_draw_on_img(img, rclasses, rscores, rbboxes, visualization.colors_plasma)
- visualization.plt_bboxes(img, rclasses, rscores, rbboxes)
After training finishes:
-- Restore the saved model from its checkpoint.
-- Use the restored model to run predictions on the relevant images.
Errors you may hit during testing:
-- Dimension mismatch when restoring the model:
-- Adjust the network's input dimensions to match the dimensions actually stored in the checkpoint.
-- Wrong checkpoint path when loading:
-- The path must point to a folder that actually contains the checkpoint files,
-- and it must also name the specific ckpt file (prefix).
-- For example, my path is: ./wuyang_model/model.ckpt-20000 (a small sketch for locating the latest checkpoint automatically follows).
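The checkpoint prefix can also be resolved automatically: tf.train.latest_checkpoint returns the newest prefix recorded in the directory's checkpoint file, or None if nothing is found (the directory name below is the one used above):
- import tensorflow as tf
- 
- ckpt_filename = tf.train.latest_checkpoint('./wuyang_model/')
- print(ckpt_filename)   # e.g. ./wuyang_model/model.ckpt-20000, or None if no checkpoint exists
- # saver.restore(isess, ckpt_filename) can then be used exactly as in the verification code above.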
VIII. Final results
The numeric labels correspond to the Pascal VOC 2007 classes:
-- 0: background
-- 1: aeroplane
-- 2: bicycle
-- 3: bird
-- 4: boat
-- 5: bottle
-- 6: bus
-- 7: car
-- 8: cat
-- 9: chair
-- 10: cow
-- 11: dining table
-- 12: dog
-- 13: horse
-- 14: motorbike
-- 15: person
-- 16: potted plant
-- 17: sheep
-- 18: sofa
-- 19: train
-- 20: TV monitor
Object-detection tests with the trained model:
-- Since I did not get around to writing a crawler, every sample was collected and annotated by hand, so some classes were not learned sufficiently.
-- Below are a few examples where the detection works well.
Prediction on a computer screen:
Prediction on a panda:
Prediction on an aeroplane:
And a prediction on a photo of the beautiful 王祖贤 (Joey Wong):