
Reimplementing YOLO V4 with TensorFlow

After the model is converted and deployed, running detection only requires the TensorFlow package; the YOLO package is not needed.

If you are interested in a TensorFlow implementation of the newer YOLO v7 algorithm, see my later post: Yolo v7的最简TensorFlow实现_gzroy的博客-CSDN博客

YOLO is a very well-known object-detection model that combines accuracy with performance and is widely used in industry. My company applies the YOLO V3 algorithm in smart manufacturing to help robot arms locate targets precisely. Unfortunately, after releasing V3, the original author of YOLO announced that he would no longer work in this field, because he did not want computer vision technology to be used in military applications. Fortunately, Alexey Bochkovskiy continued the research on YOLO and published a paper in 2020 proposing the V4 version, which makes many improvements by integrating recent results in object detection and image recognition and achieves a further boost in performance. See the paper for details.

Here I attempt to reimplement YOLO v4 on TensorFlow 2.x, hoping to deepen my understanding of the YOLO algorithm.

Pretraining on ImageNet

Network Structure

Darknet's csdarknet53-omega.cfg file describes an image classifier built on the CSPDarknet53 architecture and trained on the ImageNet dataset. The pretrained model then serves as the backbone network for object detection.

First I build this CSPDarknet53 network with TensorFlow. You can open the cfg file on netron.app to see the detailed structure of the network and build it accordingly. (A screenshot of part of the network structure appears here in the original post.)

The TensorFlow code is as follows:

import tensorflow as tf
import tensorflow_addons as tfa

l = tf.keras.layers

def _conv(inputs, filters, kernel_size, strides, bias=True, normalize=True, activation='mish'):
    output = inputs
    output = l.Conv2D(filters, kernel_size, strides, 'same',
                      'channels_first', use_bias=bias,
                      kernel_initializer='he_normal')(output)
    if normalize:
        output = l.BatchNormalization(axis=1)(output)
    if activation == 'leaky':
        output = l.LeakyReLU(alpha=0.1)(output)
    elif activation == 'mish':
        output = tfa.activations.mish(output)
    # 'linear': leave output unchanged
    return output

def _csp_1(inputs, filters, block_num, activation='mish', name=None):
    output = _conv(inputs, filters*2, 3, 2)
    output_1 = _conv(output, filters*2, 1, 1)
    output = _conv(output, filters*2, 1, 1)
    for i in range(block_num):
        output_2 = _conv(output, filters, 1, 1)
        output_2 = _conv(output_2, filters*2, 3, 1)
        output_2 = l.Add()([output_2, output])
        output = output_2
    output_2 = _conv(output_2, filters*2, 1, 1)
    output = l.Concatenate(axis=1)([output_1, output_2])
    output = _conv(output, filters*2, 1, 1)
    return output

def _csp_2(inputs, filters, block_num, training=True, activation='mish', name=None):
    output = _conv(inputs, filters*2, 3, 2)
    output_1 = _conv(output, filters, 1, 1)
    output = _conv(output, filters, 1, 1)
    for i in range(block_num):
        output_2 = _conv(output, filters, 1, 1)
        output_2 = _conv(output_2, filters, 3, 1)
        output_2 = l.Add()([output_2, output])
        output = output_2
    output_2 = _conv(output_2, filters, 1, 1)
    output = l.Concatenate(axis=1)([output_1, output_2])
    output = _conv(output, filters*2, 1, 1)
    return output

def CSPDarknet53_model():
    image = tf.keras.Input(shape=(3, None, None))  # 3*H*W
    net = _conv(image, 32, 3, 1)   # 32*H*W
    net = _csp_1(net, 32, 1)       # 64*H/2*W/2
    net = _csp_2(net, 64, 2)       # 128*H/4*W/4
    net = _csp_2(net, 128, 8)      # 256*H/8*W/8
    route1 = l.Activation('linear', dtype='float32', name='route1')(net)  # 256*H/8*W/8
    net = _csp_2(net, 256, 8)      # 512*H/16*W/16
    route2 = l.Activation('linear', dtype='float32', name='route2')(net)  # 512*H/16*W/16
    net = _csp_2(net, 512, 4)      # 1024*H/32*W/32
    route3 = l.Activation('linear', dtype='float32', name='route3')(net)  # 1024*H/32*W/32
    net = tf.reduce_mean(net, axis=[2, 3], keepdims=True)  # global average pooling
    net = _conv(net, 1000, 1, 1, True, False, 'linear')    # 1x1 conv as the classifier
    net = l.Flatten(data_format='channels_first', name='logits')(net)
    net = l.Activation('linear', dtype='float32', name='output')(net)
    model = tf.keras.Model(inputs=image, outputs=[net, route1, route2, route3])
    return model
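As a quick sanity check, not from the original post, we can build the backbone and push a dummy 256*256 image through it to confirm the shapes of the four outputs. Note that channels_first convolutions generally need a GPU to run:

# Usage sketch: verify the backbone's output shapes on a dummy input.
model = CSPDarknet53_model()
dummy = tf.random.normal([1, 3, 256, 256])
logits, route1, route2, route3 = model(dummy)
print(logits.shape)   # (1, 1000)
print(route1.shape)   # (1, 256, 32, 32)  -> H/8
print(route2.shape)   # (1, 512, 16, 16)  -> H/16
print(route3.shape)   # (1, 1024, 8, 8)   -> H/32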

The three layers route1, route2, and route3 expose feature maps at different scales, to be consumed later when building the object-detection network. They are not used during ImageNet classification training.

Data Preprocessing

The cfg file enables two augmentation schemes, cutmix and mosaic. In the darknet source code, both are implemented in the load_data_augment function in data.c. In short, cutmix combines two images: a rectangular region is chosen at random in one image and filled with content from the second. Mosaic combines four images: the canvas is randomly split into four regions, each filled with one of the four images.

To implement this mechanism in TensorFlow, my approach is as follows. First, the dataset's map stage applies the single-image operations (scaling, flipping, color jitter, and so on). Then dataset.window with a window size of 4 takes four images at a time, flat_map combines each window of four images into a single tensor, and finally a random cutmix or mosaic is applied to that tensor. A toy illustration of the window and flat_map mechanics follows below.
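Here is a minimal sketch, not from the original post, showing how window plus flat_map behaves on a simple range dataset, mirroring what the _flatmap_function defined later does:

# Toy illustration of window + flat_map: group every 4 consecutive
# elements into a sub-dataset, then re-batch each group into one tensor.
ds = tf.data.Dataset.range(8)
ds = ds.window(4)
ds = ds.flat_map(lambda w: w.batch(4, drop_remainder=True))
for t in ds:
    print(t.numpy())  # [0 1 2 3], then [4 5 6 7]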

Preparing the ImageNet Dataset

You first need to prepare the ImageNet data; for details see my earlier post: 基于Tensorflow的Imagenet数据集的完整处理过程(包括物体标识框BBOX的处理)_valid_classes_gzroy的博客-CSDN博客

Single-Image Transformations

The transformation of a single image consists of the following steps; assume the original image is 600*400:

1. Randomly scale the image width (scale factor between 0.75 and 1/0.75, say 0.8) and find the shorter of the scaled width and the height. The width becomes 600*0.8=480, the height is 400, so the short side is 400.
2. Randomly pick the side length of a square crop (in the range 128-448, say 300) and compute the ratio of this length to the short side from step 1: 300/400=0.75.
3. Scale the image by this ratio: the height becomes 400*0.75=300 and the width becomes 600*0.8*0.75=360, then resize the image to this size.
4. Suppose the final image size we want is 256*256. If the short side from step 3 is smaller than 256, compute the ratio and upscale the image accordingly. Here the short side from step 3 is 300, which is larger than 256, so no resizing is needed.
5. Randomly crop a 256*256 region from the image of step 4.
6. Randomly flip the image.
7. Randomly rotate the image by an angle drawn uniformly from -7 to 7 degrees.
8. Randomly adjust the image's hue, saturation, and brightness, with factors drawn from [0.6, 1.4].
9. Add PCA noise to the image, with coefficients drawn from a Gaussian distribution (0, 0.1).
10. Normalize the RGB values: subtract 123.68, 116.779, 103.939 from the R, G, B channels respectively, then divide by 58.393, 57.12, 57.375.

The code is as follows:

import numpy as np

imageWidth = 256
imageHeight = 256
min_crop = 128
max_crop = 448
random_min_aspect = 0.75
random_max_aspect = 1 / 0.75
random_angle = 7.

# Eigenvectors/eigenvalues of the ImageNet RGB covariance,
# used for AlexNet-style PCA color noise
eigvec = tf.constant([
    [-0.5675,  0.7192,  0.4009],
    [-0.5808, -0.0045, -0.8140],
    [-0.5836, -0.6948,  0.4203]],
    shape=[3, 3], dtype=tf.float32)
eigval = tf.constant([55.46, 4.794, 1.148], shape=[3, 1], dtype=tf.float32)
mean_RGB = tf.constant([123.68, 116.779, 103.939], dtype=tf.float32)
std_RGB = tf.constant([58.393, 57.12, 57.375], dtype=tf.float32)

# Parse TFRECORD and distort the image for train
def _parse_function(example_proto):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "height": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "width": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "channels": tf.io.FixedLenFeature([1], tf.int64, default_value=[3]),
        "colorspace": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "img_format": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "label": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "bbox_xmin": tf.io.VarLenFeature(tf.float32),
        "bbox_xmax": tf.io.VarLenFeature(tf.float32),
        "bbox_ymin": tf.io.VarLenFeature(tf.float32),
        "bbox_ymax": tf.io.VarLenFeature(tf.float32),
        "text": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "filename": tf.io.FixedLenFeature([], tf.string, default_value="")
    }
    parsed_features = tf.io.parse_single_example(example_proto, features)
    image_decoded = tf.image.decode_jpeg(parsed_features["image"], channels=3)
    image_decoded = tf.cast(image_decoded, dtype=tf.float32)
    # Random crop the image
    shape = tf.shape(image_decoded)
    height, width = shape[0], shape[1]
    random_aspect = tf.random.uniform(shape=[], minval=random_min_aspect, maxval=random_max_aspect)
    random_size = tf.random.uniform(shape=[], minval=min_crop, maxval=max_crop, dtype=tf.int32)
    short_side = tf.cond(
        height < tf.cast(tf.cast(width, tf.float32)*random_aspect, tf.int32),
        lambda: height,
        lambda: tf.cast(tf.cast(width, tf.float32)*random_aspect, tf.int32))
    scale = tf.cast(random_size/short_side, tf.float32)
    crop_height = tf.cast(tf.cast(height, tf.float32)*scale, tf.int32)
    crop_width = tf.cast(tf.cast(width, tf.float32)*random_aspect*scale, tf.int32)
    crop_resized = tf.image.resize(image_decoded, [crop_height, crop_width])
    short_side = tf.cond(crop_height < crop_width, lambda: crop_height, lambda: crop_width)
    ratio = tf.cond(short_side < random_size, lambda: tf.cast(random_size/short_side, tf.float32), lambda: 1.)
    scale = tf.cond(random_size < imageHeight, lambda: tf.cast(imageHeight/random_size, tf.float32), lambda: 1.)
    resized = tf.image.resize(
        crop_resized,
        [
            tf.cast(tf.cast(crop_height, tf.float32)*ratio*scale, tf.int32)+1,
            tf.cast(tf.cast(crop_width, tf.float32)*ratio*scale, tf.int32)+1
        ])
    cropped = tf.image.random_crop(resized, [imageHeight, imageWidth, 3])
    # Flip to add a little more random distortion in.
    flipped = tf.image.random_flip_left_right(cropped)
    # Random rotate the image
    angle = tf.random.uniform(shape=[], minval=-random_angle, maxval=random_angle)*np.pi/180
    rotated = tfa.image.rotate(flipped, angle)
    # Random distort the image
    distorted = tf.image.random_hue(rotated, max_delta=0.3)
    distorted = tf.image.random_saturation(distorted, lower=0.6, upper=1.4)
    distorted = tf.image.random_brightness(distorted, max_delta=0.3)
    # Add PCA noise
    alpha = tf.random.normal([3], mean=0.0, stddev=0.1)
    pca_noise = tf.reshape(tf.matmul(tf.multiply(eigvec, alpha), eigval), [3])
    distorted = tf.add(distorted, pca_noise)
    # Normalize RGB
    distorted = tf.subtract(distorted, mean_RGB)
    distorted = tf.divide(distorted, std_RGB)
    image_train = tf.transpose(distorted, perm=[2, 0, 1])  # HWC -> CHW
    label = tf.one_hot(parsed_features["label"][0], depth=1000)
    # Return a dict so the window/flat_map stage below can zip the
    # image and label sub-datasets by key
    return {'image': image_train, 'label': label}

Multi-Image Transformations

Next come the cutmix and mosaic operations over multiple images. First define a _flatmap_function that combines 4 images into one batch. Then define a _mixup_function that randomly performs one of the following:

1. With 50% probability, return one of the images unchanged.
2. With 25% probability, apply cutmix to two of the images and return the combined image.
3. With 25% probability, apply mosaic to all 4 images and return the combined image.

The code is as follows:

def _flatmap_function(features):
    # Re-batch each 4-element window into a single tensor per key
    dataset_image = features['image'].padded_batch(4, [3, imageHeight, imageWidth], drop_remainder=True)
    dataset_label = features['label'].padded_batch(4, [1000], drop_remainder=True)
    dataset_combined = tf.data.Dataset.zip({'image': dataset_image, 'label': dataset_label})
    return dataset_combined

def _mixup_function(features):
    images = features['image']
    labels = features['label']

    def _cutmix():
        # Cut a random rectangle from image 1 and paste it into image 0
        min_ratio = 0.3
        max_ratio = 0.8
        cut_w = tf.random.uniform(shape=[], minval=int(min_ratio*imageWidth), maxval=int(max_ratio*imageWidth), dtype=tf.int32)
        cut_h = tf.random.uniform(shape=[], minval=int(min_ratio*imageHeight), maxval=int(max_ratio*imageHeight), dtype=tf.int32)
        cut_x = tf.random.uniform(shape=[], minval=0, maxval=(imageWidth-cut_w-1), dtype=tf.int32)
        cut_y = tf.random.uniform(shape=[], minval=0, maxval=(imageHeight-cut_h-1), dtype=tf.int32)
        left = cut_x
        right = cut_x + cut_w
        top = cut_y
        bottom = cut_y + cut_h
        # Mix the labels in proportion to the pasted area
        alpha = tf.cast(cut_w*cut_h/(imageWidth*imageHeight), tf.float32)
        beta = tf.cast(1. - alpha, tf.float32)
        img0 = images[0]
        img1 = images[1]
        image = tf.concat([
            img0[:, :top, :],
            tf.concat([img0[:, top:bottom, :left], img1[:, top:bottom, left:right], img0[:, top:bottom, right:]], axis=-1),
            img0[:, bottom:, :]
        ], axis=-2)
        label = labels[0]*beta + labels[1]*alpha
        return image, label

    def _mosaic():
        # Tile the four images into quadrants around a random cut point
        area = imageWidth*imageHeight
        min_offset = 0.2
        cut_x = tf.random.uniform(shape=[], minval=int(min_offset*imageWidth), maxval=int((1-min_offset)*imageWidth), dtype=tf.int32)
        cut_y = tf.random.uniform(shape=[], minval=int(min_offset*imageHeight), maxval=int((1-min_offset)*imageHeight), dtype=tf.int32)
        # Mix the labels in proportion to each quadrant's area
        ratio_0 = tf.cast(cut_x*cut_y/area, tf.float32)
        ratio_1 = tf.cast((imageWidth-cut_x)*cut_y/area, tf.float32)
        ratio_2 = tf.cast((imageHeight-cut_y)*cut_x/area, tf.float32)
        ratio_3 = tf.cast((imageHeight-cut_y)*(imageWidth-cut_x)/area, tf.float32)
        img0, img1, img2, img3 = images[0], images[1], images[2], images[3]
        image = tf.concat([
            tf.concat([
                img0[:, (imageHeight-cut_y)//2:((imageHeight-cut_y)//2+cut_y), (imageWidth-cut_x)//2:((imageWidth-cut_x)//2+cut_x)],
                img1[:, (imageHeight-cut_y)//2:((imageHeight-cut_y)//2+cut_y), cut_x//2:(cut_x//2+imageWidth-cut_x)]
            ], axis=-1),
            tf.concat([
                img2[:, cut_y//2:(cut_y//2+imageHeight-cut_y), (imageWidth-cut_x)//2:((imageWidth-cut_x)//2+cut_x)],
                img3[:, cut_y//2:(cut_y//2+imageHeight-cut_y), cut_x//2:(cut_x//2+imageWidth-cut_x)]
            ], axis=-1)
        ], axis=-2)
        label = labels[0]*ratio_0 + labels[1]*ratio_1 + labels[2]*ratio_2 + labels[3]*ratio_3
        return image, label

    def _mix_random():
        # 50/50 between cutmix and mosaic (25% each overall)
        flag = tf.random.uniform(shape=[], minval=0., maxval=1.)
        return tf.cond(tf.less(flag, 0.5), _cutmix, _mosaic)

    # 50%: keep a single image unchanged; 50%: mix
    flag = tf.random.uniform(shape=[], minval=0., maxval=1.)
    image, label = tf.cond(
        tf.less(flag, 0.5),
        lambda: (images[0], labels[0]),
        _mix_random
    )
    return image, label

Building the Training Dataset

Finally we can build a dataset that chains the whole preprocessing pipeline together to produce the training data:

def train_input_fn():
    dataset_train = tf.data.TFRecordDataset(train_files)
    dataset_train = dataset_train.map(_parse_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset_train = dataset_train.window(4)  # group every 4 images
    dataset_train = dataset_train.flat_map(_flatmap_function)
    dataset_train = dataset_train.map(_mixup_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset_train = dataset_train.shuffle(buffer_size=1600, reshuffle_each_iteration=True)
    dataset_train = dataset_train.repeat(10)
    dataset_train = dataset_train.batch(batch_size)
    dataset_train = dataset_train.prefetch(batch_size)
    return dataset_train

Below are examples of the generated training images, including cutmix and mosaic. (The sample images appear in the original post; the sketch below shows one way to visualize them.)
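The following is a minimal visualization sketch, assuming matplotlib is installed: take a few samples from the pipeline, undo the RGB normalization, convert back to channels-last, and plot them.

import matplotlib.pyplot as plt

# Visualize a few augmented samples by reversing the normalization.
def show_samples(dataset, num=4):
    for i, (image, label) in enumerate(dataset.unbatch().take(num)):
        img = tf.transpose(image, perm=[1, 2, 0])  # CHW -> HWC
        img = img * std_RGB + mean_RGB             # undo normalization
        img = tf.clip_by_value(img, 0., 255.) / 255.
        plt.subplot(1, num, i + 1)
        plt.imshow(img.numpy())
        plt.axis('off')
    plt.show()

show_samples(train_input_fn())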

Building the Test Dataset

Building the test set is comparatively simple: each image only needs to be resized and padded to the target size.

# Parse TFRECORD for validation: resize the long side and pad to a square
def _parse_test_function(example_proto):
    features = {
        "image": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "height": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "width": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "channels": tf.io.FixedLenFeature([1], tf.int64, default_value=[3]),
        "colorspace": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "img_format": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "label": tf.io.FixedLenFeature([1], tf.int64, default_value=[0]),
        "bbox_xmin": tf.io.VarLenFeature(tf.float32),
        "bbox_xmax": tf.io.VarLenFeature(tf.float32),
        "bbox_ymin": tf.io.VarLenFeature(tf.float32),
        "bbox_ymax": tf.io.VarLenFeature(tf.float32),
        "text": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "filename": tf.io.FixedLenFeature([], tf.string, default_value="")
    }
    parsed_features = tf.io.parse_single_example(example_proto, features)
    image_decoded = tf.image.decode_jpeg(parsed_features["image"], channels=3)
    image_decoded = tf.cast(image_decoded, dtype=tf.float32)
    shape = tf.shape(image_decoded)
    height, width = shape[0], shape[1]
    resized_height, resized_width = tf.cond(
        height < width,
        lambda: (tf.cast(tf.multiply(tf.cast(height, tf.float64), tf.divide(imageWidth, width)), tf.int32), imageWidth),
        lambda: (imageHeight, tf.cast(tf.multiply(tf.cast(width, tf.float64), tf.divide(imageHeight, height)), tf.int32)))
    padded_height = imageHeight - resized_height
    padded_width = imageWidth - resized_width
    image_resized = tf.image.resize(image_decoded, [resized_height, resized_width])
    image_padded = tf.image.pad_to_bounding_box(image_resized, padded_height//2, padded_width//2, imageHeight, imageWidth)
    # Normalize RGB
    image_valid = tf.subtract(image_padded, mean_RGB)
    image_valid = tf.divide(image_valid, std_RGB)
    image_valid = tf.transpose(image_valid, perm=[2, 0, 1])  # HWC -> CHW
    features = {'input_1': image_valid}
    labels = tf.one_hot(parsed_features["label"][0], depth=1000)
    return features, labels

def val_input_fn():
    dataset_valid = tf.data.TFRecordDataset(valid_files)
    dataset_valid = dataset_valid.map(_parse_test_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    dataset_valid = dataset_valid.take(100000)
    dataset_valid = dataset_valid.batch(batch_size)
    dataset_valid = dataset_valid.prefetch(batch_size)
    return dataset_valid

Training the Model

Now we can write the training code. The learning-rate schedule follows the darknet implementation: a warmup phase followed by polynomial decay with power 4 (darknet's poly policy). Darknet uses a batch size of 128 with an initial learning rate of 0.1; my graphics card is a 2080Ti with 11 GB of memory, where the largest batch size under mixed precision is 64, so I halve the initial learning rate to 0.05. The code is as follows:

import math
import time

initial_warmup_steps = 1000
initial_lr = 0.05
maximum_batches = 2400000
power = 4
START_EPOCH = 0
NUM_EPOCH = 1
STEPS_EPOCH = 20000
STEPS_OFFSET = 0

# The post trains with mixed precision on a 2080Ti; if desired, enable it with:
# tf.keras.mixed_precision.set_global_policy('mixed_float16')
train_data = train_input_fn()
val_data = val_input_fn()

with tf.device('/GPU:0'):
    model = CSPDarknet53_model()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.0001, momentum=0.9)
    # To resume from a previous model, uncomment the two lines below
    #tfa.register_all()
    #model = tf.keras.models.load_model('models/darknet53_custom_training_5000.h5')

    @tf.function
    def train_step(inputs, labels):
        with tf.GradientTape() as tape:
            predictions = model(inputs, training=True)
            pred_loss = tf.keras.losses.CategoricalCrossentropy(
                from_logits=True, label_smoothing=0.1)(labels, predictions[0])
            total_loss = pred_loss
        gradients = tape.gradient(total_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return total_loss

    for epoch in range(NUM_EPOCH):
        start_step = tf.keras.backend.get_value(optimizer.iterations) + STEPS_OFFSET
        steps = start_step
        loss_sum = 0
        start_time = time.time()
        for inputs, labels in train_data:
            if (steps - start_step) > STEPS_EPOCH:
                break
            loss_sum += train_step(inputs, labels)
            steps = tf.keras.backend.get_value(optimizer.iterations) + STEPS_OFFSET
            if steps <= initial_warmup_steps:
                # Quartic warmup from 0 to initial_lr
                lr = initial_lr * math.pow(steps/initial_warmup_steps, power)
                tf.keras.backend.set_value(optimizer.lr, lr)
            else:
                # Polynomial (power=4) decay, as in darknet's poly policy
                lr = initial_lr * math.pow((1. - steps/maximum_batches), power)
                tf.keras.backend.set_value(optimizer.lr, lr)
            if steps % 100 == 0:
                elapsed_time = time.time() - start_time
                print("Step:{}, Loss:{:4.2f}, LR:{:5f}, Time:{:3.1f}s".format(steps, loss_sum/100, lr, elapsed_time))
                loss_sum = 0
                start_time = time.time()
        model.save('models/CSPDarknet53_original_' + str(START_EPOCH + epoch) + '.h5')
        m1 = tf.keras.metrics.CategoricalAccuracy()
        m2 = tf.keras.metrics.TopKCategoricalAccuracy()  # k=5 by default
        for inputs, labels in val_data:
            val_predict_logits = model(inputs, training=False)[0]
            val_predict = tf.keras.activations.softmax(val_predict_logits)
            m1.update_state(labels, val_predict)
            m2.update_state(labels, val_predict)
        print("Top-1 Accuracy:%f, Top-5 Accuracy:%f" % (m1.result().numpy(), m2.result().numpy()))
        m1.reset_states()
        m2.reset_states()

After training for about 30 epochs, the model reaches 90% top-5 accuracy and 75% top-1 accuracy.

Training YOLO

With the ImageNet pretraining done, we can move on to building and training the YOLO model.

Building the YOLO Model

Again we can use netron.app to inspect darknet's YOLO v4 network structure, defined in the file yolov4.cfg. (A screenshot of part of the YOLO network appears here in the original post.)

The inputs of the YOLO model are the three route outputs of the CSPDarknet53_model used in the ImageNet pretraining above, and its outputs are detection results at three different scales. If the input image is 512*512, the three detection grids are 512/8=64, 512/16=32, and 512/32=16. The output tensor has shape [batch_size, 64*64+32*32+16*16, 3*(1+4+80)]. The 3*(1+4+80)=255 channels correspond to 3 anchor boxes of different sizes per detection scale, each box predicting (1+4+80) values: 1 for whether an object is present, 4 for the box center coordinates x, y plus width and height, and 80 for the COCO object classes. The code is as follows:

def YOLO_model():
    route1 = tf.keras.Input(shape=(256, None, None), name='input1')   # 256*H/8*W/8
    route2 = tf.keras.Input(shape=(512, None, None), name='input2')   # 512*H/16*W/16
    route3 = tf.keras.Input(shape=(1024, None, None), name='input3')  # 1024*H/32*W/32
    output1 = _conv(route1, 128, 1, 1, activation='leaky')    # 128*H/8*W/8
    output2 = _conv(route2, 256, 1, 1, activation='leaky')    # 256*H/16*W/16
    output3 = _conv(route3, 512, 1, 1, activation='leaky')    # 512*H/32*W/32
    output3 = _conv(output3, 1024, 3, 1, activation='leaky')  # 1024*H/32*W/32
    output3 = _conv(output3, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    # SPP block: pooling at three window sizes, then concatenation
    spp1 = l.MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same', data_format='channels_first')(output3)
    spp2 = l.MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same', data_format='channels_first')(output3)
    spp3 = l.MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same', data_format='channels_first')(output3)
    output3 = l.Concatenate(axis=1)([spp1, spp2, spp3, output3])  # 2048*H/32*W/32
    output3 = _conv(output3, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    output3 = _conv(output3, 1024, 3, 1, activation='leaky')  # 1024*H/32*W/32
    output3 = _conv(output3, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    # Upsample and merge with the medium-scale route
    output4 = _conv(output3, 256, 1, 1, activation='leaky')   # 256*H/32*W/32
    output4 = l.UpSampling2D((2, 2), "channels_first", 'nearest')(output4)  # 256*H/16*W/16
    output4 = l.Concatenate(axis=1)([output2, output4])       # 512*H/16*W/16
    output4 = _conv(output4, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    output4 = _conv(output4, 512, 3, 1, activation='leaky')   # 512*H/16*W/16
    output4 = _conv(output4, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    output4 = _conv(output4, 512, 3, 1, activation='leaky')   # 512*H/16*W/16
    output4 = _conv(output4, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    # Upsample and merge with the small-scale route
    output5 = _conv(output4, 128, 1, 1, activation='leaky')   # 128*H/16*W/16
    output5 = l.UpSampling2D((2, 2), "channels_first", 'nearest')(output5)  # 128*H/8*W/8
    output5 = l.Concatenate(axis=1)([output1, output5])       # 256*H/8*W/8
    output5 = _conv(output5, 128, 1, 1, activation='leaky')   # 128*H/8*W/8
    output5 = _conv(output5, 256, 3, 1, activation='leaky')   # 256*H/8*W/8
    output5 = _conv(output5, 128, 1, 1, activation='leaky')   # 128*H/8*W/8
    output5 = _conv(output5, 256, 3, 1, activation='leaky')   # 256*H/8*W/8
    output5 = _conv(output5, 128, 1, 1, activation='leaky')   # 128*H/8*W/8
    # Small-object detection head (stride 8)
    yolo_small = _conv(output5, 256, 3, 1, activation='leaky')  # 256*H/8*W/8
    yolo_small = _conv(yolo_small, 255, 1, 1, normalize=False, activation='linear')  # 255*H/8*W/8
    yolo_small = l.Activation('linear', dtype='float32', name='yolo_small')(yolo_small)  # 255*H/8*W/8
    yolo_small = l.Reshape((255, -1))(yolo_small)
    # Downsample and merge for the medium-object head (stride 16)
    output5 = _conv(output5, 256, 3, 2, activation='leaky')   # 256*H/16*W/16
    output6 = l.Concatenate(axis=1)([output4, output5])       # 512*H/16*W/16
    output6 = _conv(output6, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    output6 = _conv(output6, 512, 3, 1, activation='leaky')   # 512*H/16*W/16
    output6 = _conv(output6, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    output6 = _conv(output6, 512, 3, 1, activation='leaky')   # 512*H/16*W/16
    output6 = _conv(output6, 256, 1, 1, activation='leaky')   # 256*H/16*W/16
    yolo_medium = _conv(output6, 512, 3, 1, activation='leaky')  # 512*H/16*W/16
    yolo_medium = _conv(yolo_medium, 255, 1, 1, normalize=False, activation='linear')  # 255*H/16*W/16
    yolo_medium = l.Activation('linear', dtype='float32', name='yolo_medium')(yolo_medium)
    yolo_medium = l.Reshape((255, -1))(yolo_medium)
    # Downsample and merge for the big-object head (stride 32)
    output6 = _conv(output6, 512, 3, 2, activation='leaky')   # 512*H/32*W/32
    output6 = l.Concatenate(axis=1)([output3, output6])       # 1024*H/32*W/32
    output6 = _conv(output6, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    output6 = _conv(output6, 1024, 3, 1, activation='leaky')  # 1024*H/32*W/32
    output6 = _conv(output6, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    output6 = _conv(output6, 1024, 3, 1, activation='leaky')  # 1024*H/32*W/32
    output6 = _conv(output6, 512, 1, 1, activation='leaky')   # 512*H/32*W/32
    output6 = _conv(output6, 1024, 3, 1, activation='leaky')  # 1024*H/32*W/32
    yolo_big = _conv(output6, 255, 1, 1, normalize=False, activation='linear')  # 255*H/32*W/32
    yolo_big = l.Activation('linear', dtype='float32', name='yolo_big')(yolo_big)
    yolo_big = l.Reshape((255, -1))(yolo_big)
    # Concatenate the three scales and move channels last
    yolo = l.Concatenate(axis=-1)([yolo_small, yolo_medium, yolo_big])
    yolo = tf.transpose(yolo, perm=[0, 2, 1])
    yolo = l.Activation('linear', dtype='float32')(yolo)
    model = tf.keras.Model(inputs=[route1, route2, route3], outputs=yolo, name='yolo')
    return model
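As a wiring sketch, not from the original post, we can connect the backbone's route outputs to the YOLO head and confirm the prediction shape for a 512*512 input:

# Wiring sketch: backbone -> YOLO head, with a shape check.
backbone = CSPDarknet53_model()
yolo_head = YOLO_model()
image = tf.random.normal([1, 3, 512, 512])
_, route1, route2, route3 = backbone(image)
pred = yolo_head([route1, route2, route3])
print(pred.shape)  # (1, 64*64 + 32*32 + 16*16, 255) = (1, 5376, 255)
# The 255 channels can be viewed as 3 anchors x (1 + 4 + 80) values:
pred = tf.reshape(pred, [tf.shape(pred)[0], -1, 3, 85])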

Data Preprocessing

Training and prediction here use the COCO data; see my other post for how to prepare the COCO dataset.

One of the data-side enhancements in YOLO v4 is mosaic: four images are stitched together for training. This enriches the context seen by the network and is friendlier to single-GPU training, since large batch sizes are not required. A simplified sketch of mosaic with bounding boxes follows below.
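Since the detection pipeline is still in progress, the following is only my own simplified sketch, not darknet's exact cropping scheme: each image is resized into one quadrant around a random cut point, and its boxes, assumed to be normalized [x1, y1, x2, y2], are scaled and shifted to match.

# Simplified mosaic for detection (a sketch, not darknet's exact logic).
# images: list of 4 HWC float tensors; boxes_list: list of 4 [N, 4]
# tensors with normalized [x1, y1, x2, y2] coordinates.
def mosaic_detection(images, boxes_list, size=512, min_offset=0.2):
    px = tf.random.uniform([], int(min_offset*size), int((1-min_offset)*size), tf.int32)
    py = tf.random.uniform([], int(min_offset*size), int((1-min_offset)*size), tf.int32)
    fx = tf.cast(px, tf.float32) / size
    fy = tf.cast(py, tf.float32) / size
    # (pixel size, normalized offset and scale) for each quadrant
    quads = [((py, px),               (0., 0., fx, fy)),          # top-left
             ((py, size - px),        (fx, 0., 1.-fx, fy)),       # top-right
             ((size - py, px),        (0., fy, fx, 1.-fy)),       # bottom-left
             ((size - py, size - px), (fx, fy, 1.-fx, 1.-fy))]    # bottom-right
    patches, new_boxes = [], []
    for img, boxes, ((h, w), (ox, oy, sx, sy)) in zip(images, boxes_list, quads):
        patches.append(tf.image.resize(img, [h, w]))  # squeeze into quadrant
        scale = tf.stack([sx, sy, sx, sy])
        shift = tf.stack([ox, oy, ox, oy])
        new_boxes.append(boxes * scale + shift)       # move boxes with the image
    top = tf.concat([patches[0], patches[1]], axis=1)     # join along width
    bottom = tf.concat([patches[2], patches[3]], axis=1)
    return tf.concat([top, bottom], axis=0), tf.concat(new_boxes, axis=0)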

To be continued...
