
A Minimal TensorFlow Implementation of YOLO v7


YOLO v7 was released last year and achieved excellent performance, and the authors published the source code as a PyTorch implementation. In several of my earlier blog posts I analyzed that code in depth to understand the technical details and implementation of YOLO v7. Since I have always worked with TensorFlow, I wanted to try porting the code to TensorFlow.

Building the Dataset

Run the get_coco.sh script from the YOLO v7 source code to download the COCO dataset. The script is as follows:

#!/bin/bash
# COCO 2017 dataset http://cocodataset.org
# Download command: bash ./scripts/get_coco.sh
# Download/unzip labels
d='./' # unzip directory
url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
echo 'Downloading' $url$f ' ...'
curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
# Download/unzip images
d='./coco/images' # unzip directory
url=http://images.cocodataset.org/zips/
f1='train2017.zip' # 19G, 118k images
f2='val2017.zip' # 1G, 5k images
f3='test2017.zip' # 7G, 41k images (optional)
for f in $f1 $f2 $f3; do
  echo 'Downloading' $url$f '...'
  curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
done
wait # finish background tasks

Once the download finishes, the images and labels directories each contain the three subdirectories train2017, val2017 and test2017, holding the training, validation and test data respectively.

We can then build a training dataset on top of TensorFlow. The training images need augmentation, including Mosaic stitching, random copy-paste of objects, random geometric transforms, and color adjustment (an HSV sketch of the latter is shown below); the object labels in each image must be transformed accordingly. For details of how this works, see my earlier post 解读YOLO v7的代码(二)训练数据的准备-CSDN博客.
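As a concrete example of the color adjustment step, the original implementation perturbs images in HSV space. Below is a minimal sketch in the style of the YOLOv5/YOLOv7 augment_hsv function; the default gain values are the usual YOLO hyperparameters, not values taken from this port:

import cv2
import numpy as np

def augment_hsv(img, hgain=0.015, sgain=0.7, vgain=0.4):
    # Random gains around 1.0 for hue, saturation and value
    r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1
    hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
    dtype = img.dtype  # uint8
    x = np.arange(0, 256, dtype=np.int16)
    lut_hue = ((x * r[0]) % 180).astype(dtype)  # OpenCV hue range is 0-179
    lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
    lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
    img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
    cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)  # modify img in place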

Here I defined a Dataloader class that performs the augmentation on the training data. The processing is essentially the same as in the YOLOv7 source, with one small modification: after Mosaic stitching, if the random transform shrinks the image, an object's bounding box can end up extending beyond the image. In that case I clip the box based on the object's segments data so that it stays inside the image; a sketch of this clipping idea follows.
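Roughly, the idea is to clip the augmented polygon points to the image bounds and recompute a tight box from the clipped polygon. A sketch (the helper name and signature are mine, not from the repo):

import numpy as np

def box_from_clipped_segment(segment, img_size=640):
    # segment: (n, 2) array of polygon points in pixels after augmentation
    seg = np.clip(segment, 0, img_size - 1)    # keep all points inside the image
    x1, y1 = seg[:, 0].min(), seg[:, 1].min()  # tight xyxy box around the clipped polygon
    x2, y2 = seg[:, 0].max(), seg[:, 1].max()
    return np.array([x1, y1, x2, y2])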

The validation data needs no augmentation; we only scale the long side of each image to 640 and pad the remainder. The TensorFlow dataset is defined as follows:

# load_image, load_labels and the xywh helpers come from the repo's util module;
# img_size, val_label_path and val_batch_size come from params.
def map_val_fn(t: tf.Tensor):
    filename = str(t.numpy(), encoding='utf-8')
    imgid = int(filename[20:32])  # image id parsed from the fixed-length COCO filename
    # Load image, long side resized to img_size
    img, (h0, w0), (h, w) = load_image(filename)
    # augment_hsv(img, hgain=hsv_h, sgain=hsv_s, vgain=hsv_v)
    # Labels
    label_filename = val_label_path + filename.split('/')[-1].split('.')[0] + '.txt'
    labels, _ = load_labels(label_filename)
    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, 0, 0)  # normalized xywh to pixel xyxy format
    labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])             # convert xyxy back to xywh
    labels[:, 1:5] /= img_size                             # normalize xywh to 0-1
    img = img[:, :, ::-1].transpose(2, 0, 1)               # BGR to RGB, HWC to CHW
    img = img / 255.
    img_hw = tf.concat([h0, w0], axis=0)
    return img, labels, img_hw, imgid

dataset_val = tf.data.Dataset.list_files("coco/images/val2017/*.jpg", shuffle=False)
dataset_val = dataset_val.map(
    lambda x: tf.py_function(func=map_val_fn, inp=[x], Tout=[tf.float32, tf.float32, tf.int32, tf.int32]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset_val = dataset_val\
    .padded_batch(val_batch_size, padded_shapes=([3, img_size, img_size], [None, 5], [2], []), padding_values=(144/255., 0., 0, 0))\
    .prefetch(tf.data.experimental.AUTOTUNE)
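The load_image helper used in map_val_fn is not shown above. Since the padding itself is handled by padded_batch, it only has to resize the long side; a minimal sketch of what it might look like (the interface is an assumption, matched to the three values the map function unpacks):

import cv2

def load_image(path, img_size=640):
    img = cv2.imread(path)       # BGR, HWC
    h0, w0 = img.shape[:2]       # original size
    r = img_size / max(h0, w0)   # ratio so the longer side becomes img_size
    if r != 1:
        img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=cv2.INTER_LINEAR)
    h, w = img.shape[:2]
    return img, (h0, w0), (h, w)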

For the training dataset I originally planned a definition similar to the validation set above, just swapping the map function for the corresponding Dataloader method (see dataloader.py). But this turned out to be inefficient: the augmentation is complex and costs the CPU a lot of time, and although the map and prefetch of a TensorFlow dataset offer an AUTOTUNE parameter for parallel optimization, the result was not ideal; the GPU still sat waiting for the CPU to finish preparing data. So I wrote my own parallel processing routine using Python's multiprocessing module: while the GPU trains on 100 batches, the CPU prepares the next 100 batches in parallel, which improves performance substantially.

Concretely, a shared memory block is created and shared among several subprocesses. A random subset of training image filenames is drawn and distributed to the subprocesses; each one reads its images, applies the augmentation, processes the corresponding label files, and writes the result into its slice of the shared memory. Finally a separate subprocess merges the data in shared memory, and a dataset can be built directly from the merged data.

The relevant code is as follows:

import pickle
import time
import multiprocessing as mp
from multiprocessing import shared_memory
from random import sample

q = mp.Queue()  # queue for the subprocesses to report back (assumed setup; the repo defines this in train.py)

# Augment the images with the given IDs and write the result into shared memory
def augment_data(imgids, datasize, memory_name, offset, q):
    dataset = Dataloader(img_size, train_image_dir, train_label_dir, imgids, hyp)
    traindata = dataset.generateTrainData(datasize)
    traindata_obj = pickle.dumps(traindata, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    existing_shm.buf[offset:offset+len(traindata_obj)] = traindata_obj
    q.put((offset, offset+len(traindata_obj)))
    existing_shm.close()

# Merge the results of the augmentation subprocesses
def merge_subprocess(q, subprocess_num, memory_name):
    results = []
    while True:
        msg = q.get()
        if msg is not None:
            results.append(msg)
        if len(results) >= subprocess_num:
            break
        else:
            time.sleep(1)
    existing_shm = shared_memory.SharedMemory(name=memory_name)
    merge_data = []
    for result in results:
        merge_data.extend(pickle.loads(existing_shm.buf[result[0]:result[1]]))
    merge_data_obj = pickle.dumps(merge_data, protocol=pickle.HIGHEST_PROTOCOL)
    existing_shm.buf[:len(merge_data_obj)] = merge_data_obj
    existing_shm.close()
    q.put(len(merge_data_obj))

# Launch several subprocesses for the augmentation and one to merge their results
def prepare_traindata(memory_name):
    sample_imgid = sample(imgid_train, sample_len)  # randomly sample some training image filenames
    subprocess_list = []
    for i in range(subprocess_num):  # start the subprocesses that process images and labels
        subprocess_list.append(
            mp.Process(
                target=augment_data,
                args=(sample_imgid[i*imgid_num_process:(i+1)*imgid_num_process], data_size//subprocess_num, memory_name, i*shared_memory_size//subprocess_num, q, )
            )
        )
    for p in subprocess_list:
        p.start()
    # Start the subprocess that merges the results
    p0 = mp.Process(target=merge_subprocess, args=(q, subprocess_num, memory_name,))
    p0.start()
    return p0

image_cache = shared_memory.SharedMemory(name="dataset", create=True, size=shared_memory_size)  # create the shared memory
merge_proc = prepare_traindata("dataset")
# Wait for the merge subprocess to finish, get the data size from the queue and deserialize
merge_proc.join()
msg = q.get()
if msg > 0:
    traindata = pickle.loads(image_cache.buf[:msg])
else:
    print("Could not load training data.")
image_cache.close()
image_cache.unlink()

def traindata_gen():
    global traindata
    i = 0
    while i < len(traindata):
        yield traindata[i][0]/255., traindata[i][1]
        i += 1

# Build the dataset from the prepared data
dataset = tf.data.Dataset.from_generator(
    traindata_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=((3, img_size, img_size), (None, 5)))
dataset = dataset.padded_batch(batch_size, padded_shapes=([3, img_size, img_size], [None, 5]))
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
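To make the pipelining concrete, here is a sketch of how an epoch loop can overlap GPU training with CPU preparation, using the functions above (train_step and the epoch bookkeeping are placeholders; the actual loop lives in train.py):

for epoch in range(epochs):
    # Kick off the CPU workers to prepare the NEXT chunk of data ...
    image_cache = shared_memory.SharedMemory(name="dataset", create=True, size=shared_memory_size)
    merge_proc = prepare_traindata("dataset")
    # ... while the GPU trains on the data prepared previously
    for images, labels in dataset:
        train_step(images, labels)
    # Swap in the freshly prepared data for the next round
    merge_proc.join()
    msg = q.get()
    traindata = pickle.loads(image_cache.buf[:msg])
    image_cache.close()
    image_cache.unlink()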

Model Definition

Next we build the YOLO v7 model. For a walkthrough of the model structure, see my earlier post 解读YOLO v7的代码(一)模型结构研究_gzroy的博客-CSDN博客.

The file yolo.py defines the custom layers and assembles the model.

import tensorflow as tf
from tensorflow import keras
l = tf.keras.layers
from params import *

@tf.keras.utils.register_keras_serializable()
class YoloConv(keras.layers.Layer):
    def __init__(self, filters, kernel_size, strides, padding='same', bias=False, activation='swish', **kwargs):
        super(YoloConv, self).__init__(**kwargs)
        self.activation = activation
        self.filters = filters
        self.kernel_size = kernel_size
        self.strides = strides
        self.padding = padding
        self.bias = bias
        self.cv = l.Conv2D(filters=self.filters,
                           kernel_size=self.kernel_size,
                           strides=self.strides,
                           padding=self.padding,
                           data_format='channels_first',
                           use_bias=self.bias,
                           kernel_initializer='he_normal',
                           kernel_regularizer=tf.keras.regularizers.l2(l=weight_decay))
        self.bn = l.BatchNormalization(axis=1)
        self.swish = l.Activation('swish')

    def call(self, inputs, training):
        output = self.cv(inputs)
        output = self.bn(output, training)
        if self.activation == 'swish':
            output = self.swish(output)
        return output

    def get_config(self):
        config = super(YoloConv, self).get_config()
        config.update({
            "activation": self.activation,
            "filters": self.filters,
            "kernel_size": self.kernel_size,
            "strides": self.strides,
            "padding": self.padding,
            "bias": self.bias
        })
        return config

@tf.keras.utils.register_keras_serializable()
class Elan(keras.layers.Layer):
    def __init__(self, filters, **kwargs):
        super(Elan, self).__init__(**kwargs)
        self.filters = filters
        self.cv1 = YoloConv(self.filters, 1, 1)
        self.cv2 = YoloConv(self.filters, 1, 1)
        self.cv3 = YoloConv(self.filters, 3, 1)
        self.cv4 = YoloConv(self.filters, 3, 1)
        self.cv5 = YoloConv(self.filters, 3, 1)
        self.cv6 = YoloConv(self.filters, 3, 1)
        self.cv7 = YoloConv(self.filters*4, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv4(self.cv3(output2, training), training)
        output4 = self.cv6(self.cv5(output3, training), training)
        output = self.concat([output1, output2, output3, output4])
        output = self.cv7(output, training)
        return output

    def get_config(self):
        config = super(Elan, self).get_config()
        config.update({
            "filters": self.filters
        })
        return config

@tf.keras.utils.register_keras_serializable()
class MP(keras.layers.Layer):
    def __init__(self, filters, k=2):
        super(MP, self).__init__()
        self.filters = filters
        self.k = k
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters, 3, 2)
        self.pool = l.MaxPool2D(pool_size=self.k, strides=self.k, padding='same', data_format='channels_first')
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.pool(inputs)
        output1 = self.cv1(output1, training)
        output2 = self.cv2(inputs, training)
        output2 = self.cv3(output2, training)
        output = self.concat([output1, output2])
        return output

    def get_config(self):
        config = super(MP, self).get_config()
        config.update({
            "filters": self.filters,
            "k": self.k
        })
        return config

@tf.keras.utils.register_keras_serializable()
class SPPCSPC(keras.layers.Layer):
    def __init__(self, filters, e=0.5, k=(5, 9, 13)):
        super(SPPCSPC, self).__init__()
        self.filters = filters
        self.e = e
        self.k = k
        c_ = int(2 * self.filters * self.e)
        self.cv1 = YoloConv(c_, 1, 1)
        self.cv2 = YoloConv(c_, 1, 1)
        self.cv3 = YoloConv(c_, 3, 1)
        self.cv4 = YoloConv(c_, 1, 1)
        self.m = [l.MaxPool2D(pool_size=x, strides=1, padding='same', data_format='channels_first') for x in k]
        self.cv5 = YoloConv(c_, 1, 1)
        self.cv6 = YoloConv(c_, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv4(self.cv3(self.cv1(inputs, training), training), training)
        output2 = self.concat([output1] + [m(output1) for m in self.m])
        output2 = self.cv6(self.cv5(output2, training), training)
        output3 = self.cv2(inputs, training)
        output = self.cv7(self.concat([output2, output3]), training)
        return output

    def get_config(self):
        config = super(SPPCSPC, self).get_config()
        config.update({
            "filters": self.filters,
            "k": self.k,
            "e": self.e
        })
        return config

@tf.keras.utils.register_keras_serializable()
class Elan_A(keras.layers.Layer):
    def __init__(self, filters):
        super(Elan_A, self).__init__()
        self.filters = filters
        self.cv1 = YoloConv(filters, 1, 1)
        self.cv2 = YoloConv(filters, 1, 1)
        self.cv3 = YoloConv(filters//2, 3, 1)
        self.cv4 = YoloConv(filters//2, 3, 1)
        self.cv5 = YoloConv(filters//2, 3, 1)
        self.cv6 = YoloConv(filters//2, 3, 1)
        self.cv7 = YoloConv(filters, 1, 1)
        self.concat = l.Concatenate(axis=1)

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output3 = self.cv3(output2, training)
        output4 = self.cv4(output3, training)
        output5 = self.cv5(output4, training)
        output6 = self.cv6(output5, training)
        output7 = self.concat([output1, output2, output3, output4, output5, output6])
        output = self.cv7(output7, training)
        return output

    def get_config(self):
        config = super(Elan_A, self).get_config()
        config.update({
            "filters": self.filters,
        })
        return config

@tf.keras.utils.register_keras_serializable()
class RepConv(keras.layers.Layer):
    def __init__(self, filters):
        super(RepConv, self).__init__()
        self.filters = filters
        self.cv1 = YoloConv(filters, 3, 1, activation=None)
        self.cv2 = YoloConv(filters, 1, 1, activation=None)
        self.swish = l.Activation('swish')

    def call(self, inputs, training):
        output1 = self.cv1(inputs, training)
        output2 = self.cv2(inputs, training)
        output = self.swish(output1 + output2)
        return output

    def get_config(self):
        config = super(RepConv, self).get_config()
        config.update({
            "filters": self.filters,
        })
        return config

@tf.keras.utils.register_keras_serializable()
class IDetect(keras.layers.Layer):
    def __init__(self, shape, no, na, grids):
        super(IDetect, self).__init__()
        self.a = tf.Variable(tf.random.normal((1, shape, 1, 1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
        self.m = tf.Variable(tf.random.normal((1, no*na, 1, 1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
        self.cv = YoloConv(no*na, 1, 1, bias=True, activation=None)
        self.shape = shape
        self.no = no
        self.na = na
        self.grids = grids
        self.reshape = l.Reshape([self.na, self.no, self.grids*self.grids])
        self.permute = l.Permute([1, 3, 2])
        self.activation = l.Activation('linear', dtype='float32')

    def call(self, inputs, training):
        output = inputs + self.a       # learnable implicit shift of the input
        output = self.cv(output, training)
        output = self.m * output       # learnable implicit scaling of the output
        output = self.reshape(output)
        output = self.permute(output)  # [batch, na, grids*grids, no]
        output = self.activation(output)
        return output

    def get_config(self):
        config = super(IDetect, self).get_config()
        config.update({
            "no": self.no,
            "na": self.na,
            "grids": self.grids,
            "shape": self.shape
        })
        return config

def create_model():
    inputs = keras.Input(shape=(3, img_size, img_size))
    x = YoloConv(32, 3, 1)(inputs)  #[32, img_size, img_size]
    x = YoloConv(64, 3, 2)(x)       #[64, img_size/2, img_size/2]
    x = YoloConv(64, 3, 1)(x)       #[64, img_size/2, img_size/2]
    x = YoloConv(128, 3, 2)(x)      #[128, img_size/4, img_size/4]
    x = Elan(64)(x)                 #11
    x = MP(128)(x)                  #16
    route1 = Elan(128)(x)           #24
    x = MP(256)(route1)             #29
    route2 = Elan(256)(x)           #37
    x = MP(512)(route2)             #42
    x = Elan(256)(x)                #50
    route3 = SPPCSPC(512)(x)        #51
    x = YoloConv(256, 1, 1)(route3)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(256, 1, 1)(route2)])
    route4 = Elan_A(256)(x)         #63
    x = YoloConv(128, 1, 1)(route4)
    x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
    x = l.Concatenate(axis=1)([x, YoloConv(128, 1, 1)(route1)])
    route5 = Elan_A(128)(x)         #75, Connect to Detector 1
    x = MP(128)(route5)
    x = l.Concatenate(axis=1)([x, route4])
    route6 = Elan_A(256)(x)         #88, Connect to Detector 2
    x = MP(256)(route6)
    x = l.Concatenate(axis=1)([x, route3])
    route7 = Elan_A(512)(x)         #101, Connect to Detector 3
    detect1 = RepConv(256)(route5)
    detect2 = RepConv(512)(route6)
    detect3 = RepConv(1024)(route7)
    output1 = IDetect(256, 85, 3, 80)(detect1)
    output2 = IDetect(512, 85, 3, 40)(detect2)
    output3 = IDetect(1024, 85, 3, 20)(detect3)
    output = l.Concatenate(axis=-2)([output1, output2, output3])
    output = l.Activation('linear', dtype='float32')(output)
    model = keras.Model(inputs=inputs, outputs=output, name="yolov7_model")
    return model
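A quick sanity check of the assembled model: with img_size = 640 the three detection levels contribute 80*80 + 40*40 + 20*20 = 8400 grid cells each with 3 anchors, so the concatenated output should have shape (batch, 3, 8400, 85):

model = create_model()
dummy = tf.zeros([1, 3, 640, 640])
out = model(dummy, training=False)
print(out.shape)  # expected: (1, 3, 8400, 85)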

Defining the Loss Function

For a walkthrough of how YOLOv7 defines its loss, see my post 解读YOLO v7的代码(三)损失函数_gzroy的博客-CSDN博客.

The loss lives in loss.py. The TensorFlow rewrite follows the YOLOv7 code closely, and the functions are wrapped with tf.function to speed up the computation:

import tensorflow as tf
import math
from test1 import batch_size, na, nl, img_size, stride, balance
from test1 import loss_box, loss_obj, loss_cls
from test1 import batch_no_constant, anchor_no_constant, anchors_reshape, anchor_t, anchors_constant, layer_no_constant
from test1 import val_batch_no_constant, val_layer_no_constant
from util import *
from params import *

#In param:
#  p - predictions of the model, list of three detection levels.
#  labels - the labels of the objects, dimension [batch, boxnum, 5(class, xywh)]
#Out param:
#  results - list of the suggested positive samples for the three detection levels.
#            dimension for each element: [sample_number, 5(batch_no, anch_no, x, y, class)]
#  anch - list of the anchor wh ratios for the positive samples
#         dimension for each element: [sample_number, anchor_w, anchor_h]
@tf.function(
    input_signature=(
        [tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)]
    )
)
def tf_find_3_positive(labels):
    batch_no = tf.zeros_like(labels)[..., 0:1] + batch_no_constant
    targets = tf.concat((batch_no, labels), axis=-1)       #targets dim [batch,box_num,6]
    targets = tf.reshape(targets, [batch_size, 1, -1, 6])  #targets dim [batch,1,box_num,6]
    targets = tf.tile(targets, [1, na, 1, 1])
    anchor_no = anchor_no_constant + tf.reshape(tf.zeros_like(batch_no), [batch_size, 1, -1, 1])
    targets = tf.concat([targets, anchor_no], axis=-1)     #targets dim [batch,na,box_num,7(batch_no, cls, xywh, anchor_no)]
    g = 0.5  # bias
    offsets = tf.expand_dims(tf.constant([[0., 0.], [-1., 0.], [0., -1.], [1., 0.], [0., 1.]]), axis=0)  #offsets dim [1,5,2]
    gain = tf.constant([[1., 1., 80., 80., 80., 80., 1.], [1., 1., 40., 40., 40., 40., 1.], [1., 1., 20., 20., 20., 20., 1.]])
    results = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    for i in tf.range(nl):
        t = targets * tf.gather(gain, i)
        r = t[..., 4:6] / tf.gather(anchors_reshape, i)  # wh ratio against the anchors
        r_reciprocal = tf.math.reciprocal_no_nan(r)      # 1/r
        r_max = tf.reduce_max(tf.math.maximum(r, r_reciprocal), axis=-1)
        mask_t = tf.logical_and(r_max < anchor_t, r_max > 0)
        t = t[mask_t]
        # Offsets
        gxy = t[:, 2:4]  # grid xy
        gxi = tf.gather(gain, i)[2:4] - gxy  # inverse
        mask_xy = tf.concat([
            tf.ones([tf.shape(t)[0], 1], dtype=tf.bool),
            ((gxy % 1. < g) & (gxy > 1.)),
            ((gxi % 1. < g) & (gxi > 1.))
        ], axis=1)
        t = tf.repeat(tf.expand_dims(t, axis=1), 5, axis=1)[mask_xy]
        offsets_xy = (tf.expand_dims(tf.zeros_like(gxy, dtype=tf.float32), axis=1) + offsets)[mask_xy]
        xy = t[..., 2:4] + offsets_xy
        from_which_layer = tf.ones_like(t[..., 0:1]) * tf.dtypes.cast(i, tf.float32)
        results = results.write(i, tf.dtypes.cast(tf.concat([t[..., 0:1], t[..., -1:], xy[..., 1:2], xy[..., 0:1], t[..., 1:2], from_which_layer], axis=-1), tf.int32))
        anch = anch.write(i, tf.gather(tf.gather(anchors_constant, i), tf.dtypes.cast(t[..., -1], tf.int32)))
    return results.concat(), anch.concat()

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
    ])
)
def box_iou(box1, box2):
    area1 = (box1[:, 2]-box1[:, 0])*(box1[:, 3]-box1[:, 1])
    area2 = (box2[:, 2]-box2[:, 0])*(box2[:, 3]-box2[:, 1])
    intersect_wh = tf.math.minimum(box1[:, None, 2:], box2[:, 2:]) - tf.math.maximum(box1[:, None, :2], box2[:, :2])
    intersect_wh = tf.clip_by_value(intersect_wh, clip_value_min=0, clip_value_max=img_size)
    intersect_area = intersect_wh[..., 0]*intersect_wh[..., 1]
    iou = intersect_area/(area1[:, None]+area2-intersect_area)
    return iou

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
    ])
)
def bbox_ciou(box1, box2):
    eps = 1e-7
    b1_x1, b1_x2 = box1[:, 0]-box1[:, 2]/2, box1[:, 0]+box1[:, 2]/2
    b1_y1, b1_y2 = box1[:, 1]-box1[:, 3]/2, box1[:, 1]+box1[:, 3]/2
    b2_x1, b2_x2 = box2[:, 0]-box2[:, 2]/2, box2[:, 0]+box2[:, 2]/2
    b2_y1, b2_y2 = box2[:, 1]-box2[:, 3]/2, box2[:, 1]+box2[:, 3]/2
    # Intersection area
    inter = tf.clip_by_value(
        tf.math.minimum(b1_x2, b2_x2) - tf.math.maximum(b1_x1, b2_x1),
        clip_value_min=0,
        clip_value_max=tf.float32.max) * tf.clip_by_value(
        tf.math.minimum(b1_y2, b2_y2) - tf.math.maximum(b1_y1, b2_y1),
        clip_value_min=0,
        clip_value_max=tf.float32.max)
    # Union area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    cw = tf.math.maximum(b1_x2, b2_x2) - tf.math.minimum(b1_x1, b2_x1)  # convex (smallest enclosing box) width
    ch = tf.math.maximum(b1_y2, b2_y2) - tf.math.minimum(b1_y1, b2_y1)  # convex height
    c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
            (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared
    v = (4 / math.pi ** 2) * tf.math.pow(tf.math.atan(w2 / (h2 + eps)) - tf.math.atan(w1 / (h1 + eps)), 2)
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
    ])
)
def tf_build_targets(p, labels):
    results, anch = tf_find_3_positive(labels)
    grids = tf.dtypes.cast(img_size/stride, tf.int32)
    pxyxys = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    p_obj = tf.TensorArray(tf.float32, size=nl, dynamic_size=True, element_shape=[None, 1])
    p_cls = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    all_idx = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    from_which_layer = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
    all_anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    matching_idxs = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
    matching_targets = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_anchs = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
    matching_layers = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
    for i in tf.range(nl):
        idx_mask = results[..., -1] == i
        idx = tf.boolean_mask(results, idx_mask)
        layer_mask = layer_no_constant[..., 0] == i
        grid_no = tf.gather(grids, i)
        pl = tf.boolean_mask(p, layer_mask)
        pl = tf.reshape(pl, [batch_size, na, grid_no, grid_no, -1])
        pi = tf.gather_nd(pl, idx[..., 0:4])
        anchors_p = tf.boolean_mask(anch, idx_mask)
        p_obj = p_obj.write(i, pi[..., 4:5])
        p_cls = p_cls.write(i, pi[..., 5:])
        gij = tf.dtypes.cast(tf.concat([idx[..., 3:4], idx[..., 2:3]], axis=-1), tf.float32)
        pxy = (tf.math.sigmoid(pi[..., :2])*2-0.5+gij)*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pwh = (tf.math.sigmoid(pi[..., 2:4])*2)**2*anchors_p*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        pxywh = tf.concat([pxy, pwh], axis=-1)
        pxyxy = xywh2xyxy(pxywh)
        pxyxys = pxyxys.write(i, pxyxy)
        all_idx = all_idx.write(i, idx[..., 0:4])
        from_which_layer = from_which_layer.write(i, idx[..., -1:])
        all_anch = all_anch.write(i, tf.boolean_mask(anch, idx_mask))
    pxyxys = pxyxys.concat()
    p_obj = p_obj.concat()
    p_cls = p_cls.concat()
    all_idx = all_idx.concat()
    from_which_layer = from_which_layer.concat()
    all_anch = all_anch.concat()
    for i in tf.range(batch_size):
        batch_mask = all_idx[..., 0] == i
        if tf.math.reduce_sum(tf.dtypes.cast(batch_mask, tf.int32)) > 0:
            pxyxy_i = tf.boolean_mask(pxyxys, batch_mask)
            target_mask = labels[i][..., 3] > 0
            target = tf.boolean_mask(labels[i], target_mask)
            txywh = target[..., 1:] * img_size
            txyxy = xywh2xyxy(txywh)
            pair_wise_iou = box_iou(txyxy, pxyxy_i)
            pair_wise_iou_loss = -tf.math.log(pair_wise_iou + 1e-8)
            top_k, _ = tf.math.top_k(pair_wise_iou, tf.math.minimum(10, tf.shape(pair_wise_iou)[1]))
            dynamic_ks = tf.clip_by_value(
                tf.dtypes.cast(tf.math.reduce_sum(top_k, axis=-1), tf.int32),
                clip_value_min=1,
                clip_value_max=10)
            gt_cls_per_image = tf.tile(
                tf.expand_dims(
                    tf.one_hot(
                        tf.dtypes.cast(target[..., 0], tf.int32), nc),
                    axis=1),
                [1, tf.shape(pxyxy_i)[0], 1])
            num_gt = tf.shape(target)[0]
            cls_preds_ = (
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_cls, batch_mask), 0), [num_gt, 1, 1])) *
                tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_obj, batch_mask), 0), [num_gt, 1, 1])))  #dimension [labels_number, positive_targets_number, 80]
            y = tf.math.sqrt(cls_preds_)
            pair_wise_cls_loss = tf.math.reduce_sum(
                tf.nn.sigmoid_cross_entropy_with_logits(
                    labels=gt_cls_per_image,
                    logits=tf.math.log(y/(1-y))),
                axis=-1)
            cost = (
                pair_wise_cls_loss
                + 3.0 * pair_wise_iou_loss
            )
            matching_matrix = tf.zeros_like(cost)  #dimension [labels_number, positive_targets_number]
            matching_idx = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
            for gt_idx in tf.range(num_gt):
                _, pos_idx = tf.math.top_k(
                    -cost[gt_idx], k=dynamic_ks[gt_idx], sorted=True)
                X, Y = tf.meshgrid(gt_idx, pos_idx)
                matching_idx = matching_idx.write(gt_idx, tf.dtypes.cast(tf.concat([X, Y], axis=-1), tf.int64))
            matching_idx = matching_idx.concat()
            matching_matrix = tf.sparse.to_dense(
                tf.sparse.reorder(
                    tf.sparse.SparseTensor(
                        indices=tf.dtypes.cast(matching_idx, tf.int64),
                        values=tf.ones(tf.shape(matching_idx)[0]),
                        dense_shape=tf.dtypes.cast(tf.shape(cost), tf.int64))
                )
            )
            anchor_matching_gt = tf.reduce_sum(matching_matrix, axis=0)  #dimension [positive_targets_number]
            mask_1 = anchor_matching_gt > 1  # one positive sample matched to several ground truths
            if tf.reduce_sum(tf.dtypes.cast(mask_1, tf.int32)) > 0:
                # Keep only the lowest-cost ground truth for each conflicting sample.
                # For example, with 100 samples and 10 ground truths, if sample #5 matches
                # ground truths #2 and #3 with costs 10 and 20, only #2 is kept for it.
                # mask_1 dimension: [positive_targets_number]
                # tf.boolean_mask(cost, mask_1, axis=1) dimension: [ground_truth_number, conflicting_samples_number]
                cost_argmin = tf.math.argmin(
                    tf.boolean_mask(cost, mask_1, axis=1), axis=0)  # in the example above, cost_argmin is [2]
                m = tf.dtypes.cast(mask_1, tf.float32)
                _, target_indices = tf.math.top_k(
                    m,
                    k=tf.dtypes.cast(tf.math.reduce_sum(m), tf.int32))  # in the example above, target_indices is [5]
                # So set element [2,5] of matching_matrix to 1 and the other elements of [:,5] to 0
                target_matching_gt_indices = tf.concat(
                    [tf.reshape(tf.dtypes.cast(cost_argmin, tf.int32), [-1, 1]), tf.reshape(target_indices, [-1, 1])],
                    axis=1)
                matching_matrix = tf.multiply(
                    matching_matrix,
                    tf.repeat(tf.reshape(tf.dtypes.cast(anchor_matching_gt <= 1, tf.float32), [1, -1]), tf.shape(cost)[0], axis=0))
                target_value = tf.sparse.to_dense(
                    tf.sparse.reorder(
                        tf.sparse.SparseTensor(
                            indices=tf.dtypes.cast(target_matching_gt_indices, tf.int64),
                            values=tf.ones(tf.shape(target_matching_gt_indices)[0]),
                            dense_shape=tf.dtypes.cast(tf.shape(matching_matrix), tf.int64)
                        )
                    )
                )
                matching_matrix = tf.add(matching_matrix, target_value)
            fg_mask_inboxes = tf.math.reduce_sum(matching_matrix, axis=0) > 0.  # mask of the samples that will be used for prediction
            if tf.shape(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1))[0] > 0:
                matched_gt_inds = tf.math.argmax(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1), axis=0)  # ground truth index for each kept sample
                all_idx_i = tf.boolean_mask(tf.boolean_mask(all_idx, batch_mask), fg_mask_inboxes)
                from_which_layer_i = tf.boolean_mask(tf.boolean_mask(from_which_layer, batch_mask), fg_mask_inboxes)
                all_anch_i = tf.boolean_mask(tf.boolean_mask(all_anch, batch_mask), fg_mask_inboxes)
                matching_idxs = matching_idxs.write(i, all_idx_i)
                matching_layers = matching_layers.write(i, from_which_layer_i)
                matching_anchs = matching_anchs.write(i, all_anch_i)
                matching_targets = matching_targets.write(i, tf.gather(target, matched_gt_inds))
            else:
                matching_idxs = matching_idxs.write(i, tf.constant([[-1, -1, -1, -1]], dtype=tf.int32))
                matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
                matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
                matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
        else:
            matching_idxs = matching_idxs.write(i, tf.constant([[-1, -1, -1, -1]], dtype=tf.int32))
            matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
            matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
            matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
    matching_idxs = matching_idxs.concat()
    matching_layers = matching_layers.concat()
    matching_anchs = matching_anchs.concat()
    matching_targets = matching_targets.concat()
    filter_mask = matching_idxs[:, 0] != -1
    matching_idxs = tf.boolean_mask(matching_idxs, filter_mask)
    matching_layers = tf.boolean_mask(matching_layers, filter_mask)
    matching_anchs = tf.boolean_mask(matching_anchs, filter_mask)
    matching_targets = tf.boolean_mask(matching_targets, filter_mask)
    return matching_idxs, matching_layers, matching_anchs, matching_targets

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
    ])
)
def tf_loss_func(p, labels):
    matching_idxs, matching_layers, matching_anchs, matching_targets = tf_build_targets(p, labels)
    lcls, lbox, lobj = tf.zeros(1), tf.zeros(1), tf.zeros(1)
    grids = img_size//stride
    for i in tf.range(nl):
        layer_mask = layer_no_constant[..., 0] == i
        grid = tf.gather(grids, i)
        pi = tf.reshape(tf.boolean_mask(p, layer_mask), [batch_size, na, grid, grid, -1])
        matching_layer_mask = matching_layers[:, 0] == i
        if tf.reduce_sum(tf.dtypes.cast(matching_layer_mask, tf.int32)) == 0:
            continue
        m_idxs = tf.boolean_mask(matching_idxs, matching_layer_mask)
        if tf.shape(m_idxs)[0] == 0:
            continue
        m_targets = tf.boolean_mask(matching_targets, matching_layer_mask)
        m_anchs = tf.boolean_mask(matching_anchs, matching_layer_mask)
        ps = tf.gather_nd(pi, m_idxs)
        pxy = tf.math.sigmoid(ps[:, :2])*2-0.5
        pwh = (tf.math.sigmoid(ps[:, 2:4])*2)**2*m_anchs
        pbox = tf.concat([pxy, pwh], axis=-1)
        selected_tbox = m_targets[:, 1:]
        selected_tbox = tf.multiply(selected_tbox, tf.dtypes.cast(grid, tf.float32))
        tbox_grid = tf.concat([
            tf.dtypes.cast(m_idxs[:, 3:4], tf.float32),
            tf.dtypes.cast(m_idxs[:, 2:3], tf.float32),
            tf.zeros((tf.shape(m_idxs)[0], 2))],
            axis=-1)
        selected_tbox = tf.subtract(selected_tbox, tbox_grid)
        iou = bbox_ciou(pbox, selected_tbox)
        lbox += tf.math.reduce_mean(1.0 - iou)  # iou loss
        # Objectness
        tobj = tf.sparse.to_dense(
            tf.sparse.reorder(
                tf.sparse.SparseTensor(
                    indices=tf.dtypes.cast(m_idxs, tf.int64),
                    values=(1.0 - gr) + gr * tf.clip_by_value(tf.stop_gradient(iou), clip_value_min=0, clip_value_max=tf.float32.max),
                    dense_shape=tf.dtypes.cast(tf.shape(pi[..., 0]), tf.int64)
                )
            ), validate_indices=False
        )
        # Classification
        tcls = tf.one_hot(
            indices=tf.dtypes.cast(m_targets[:, 0], tf.int32),
            depth=80,
            dtype=tf.float32
        )
        lcls += tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels=tcls,
                logits=ps[:, 5:]
            )
        )
        obji = tf.math.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(
                labels=tobj,
                logits=pi[..., 4]
            )
        )
        lobj += obji * tf.gather(balance, i)
    lbox *= loss_box
    lobj *= loss_obj
    lcls *= loss_cls
    loss = (lbox + lobj + lcls) * batch_size
    return loss

@tf.function(
    input_signature=([
        tf.TensorSpec(shape=[None, na, 8400, 85], dtype=tf.float32),
        tf.TensorSpec(shape=[None, None, 5], dtype=tf.float32),
        tf.TensorSpec(shape=[None, 2], dtype=tf.int32),
        tf.TensorSpec(shape=[None], dtype=tf.int32),
    ])
)
def tf_predict_func(predictions, labels, imgs_hw, imgs_id):
    grids = img_size // stride
    batch_size = tf.shape(predictions)[0]
    confidence_threshold = 0.2
    probability_threshold = 0.8
    all_predict_result = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
    boxes_result = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
    imgs_info = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    for i in tf.range(nl):
        grid = tf.gather(grids, i)
        grid_x, grid_y = tf.meshgrid(tf.range(grid, dtype=tf.float32), tf.range(grid, dtype=tf.float32))
        grid_x = tf.reshape(grid_x, [-1, 1])
        grid_y = tf.reshape(grid_y, [-1, 1])
        grid_xy = tf.concat([grid_x, grid_y], axis=-1)
        grid_xy = tf.reshape(grid_xy, [1, 1, -1, 2])
        layer_mask = val_layer_no_constant[..., 0] == i
        predict_layer = tf.boolean_mask(predictions, layer_mask)
        predict_layer = tf.reshape(predict_layer, [batch_size, na, -1, 85])
        predict_conf = tf.math.sigmoid(predict_layer[..., 4:5])
        predict_xy = (tf.math.sigmoid(predict_layer[..., :2])*2-0.5 + \
            tf.dtypes.cast(grid_xy, tf.float32))*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_wh = (tf.math.sigmoid(predict_layer[..., 2:4])*2)**2*\
            tf.reshape(tf.gather(anchors_constant, i), [1, na, 1, 2])*\
            tf.dtypes.cast(tf.gather(stride, i), tf.float32)
        predict_xywh = tf.concat([predict_xy, predict_wh], axis=-1)
        predict_xyxy = xywh2xyxy(predict_xywh)
        predict_cls = tf.reshape(tf.argmax(predict_layer[..., 5:], axis=-1), [batch_size, na, -1, 1])
        predict_cls = tf.dtypes.cast(predict_cls, tf.float32)
        predict_proba = tf.nn.sigmoid(
            tf.reduce_max(
                predict_layer[..., 5:], axis=-1, keepdims=True
            )
        )
        batch_no = tf.expand_dims(tf.tile(tf.gather(val_batch_no_constant, tf.range(batch_size)), [1, na, grid*grid]), -1)
        predict_result = tf.concat([batch_no, predict_conf, predict_xyxy, predict_cls, predict_proba], axis=-1)
        mask = tf.math.logical_and(
            predict_result[..., 1] >= confidence_threshold,
            predict_result[..., -1] >= probability_threshold
        )
        predict_result = tf.boolean_mask(predict_result, mask)
        if tf.shape(predict_result)[0] > 0:
            all_predict_result = all_predict_result.write(i, predict_result)
        else:
            all_predict_result = all_predict_result.write(i, tf.zeros(shape=[1, 8]))
    all_predict_result = all_predict_result.concat()
    for i in tf.range(batch_size):
        batch_mask = tf.math.logical_and(
            all_predict_result[..., 0] == tf.dtypes.cast(i, tf.float32),
            all_predict_result[..., 1] > 0
        )
        predict_true_box = tf.boolean_mask(all_predict_result, batch_mask)
        if tf.shape(predict_true_box)[0] == 0:
            continue
        original_hw = tf.dtypes.cast(tf.gather(imgs_hw, i), tf.float32)
        ratio = tf.dtypes.cast(tf.reduce_max(original_hw/img_size), tf.float32)
        predict_classes, _ = tf.unique(predict_true_box[:, 6])
        for j in tf.range(tf.shape(predict_classes)[0]):
            class_mask = tf.math.equal(predict_true_box[:, 6], tf.gather(predict_classes, j))
            predict_true_box_class = tf.boolean_mask(predict_true_box, class_mask)
            predict_true_box_xy = predict_true_box_class[:, 2:6]
            predict_true_box_score = predict_true_box_class[:, 7]*predict_true_box_class[:, 1]
            selected_indices = tf.image.non_max_suppression(
                predict_true_box_xy,
                predict_true_box_score,
                100,
                iou_threshold=0.2
            )
            # Shape [box_num, 7]
            selected_boxes = tf.gather(predict_true_box_class, selected_indices)
            boxes_xyxy = selected_boxes[:, 2:6]*ratio  # scale back to the original image size
            boxes_x1 = tf.clip_by_value(boxes_xyxy[:, 0:1], 0., original_hw[1])
            boxes_x2 = tf.clip_by_value(boxes_xyxy[:, 2:3], 0., original_hw[1])
            boxes_y1 = tf.clip_by_value(boxes_xyxy[:, 1:2], 0., original_hw[0])
            boxes_y2 = tf.clip_by_value(boxes_xyxy[:, 3:4], 0., original_hw[0])
            boxes_w = boxes_x2 - boxes_x1
            boxes_h = boxes_y2 - boxes_y1
            boxes = tf.concat([selected_boxes[:, 0:2], boxes_x1, boxes_y1, boxes_w, boxes_h, selected_boxes[:, 6:8]], axis=-1)
            boxes_result = boxes_result.write(boxes_result.size(), boxes)
        img_id = tf.gather(imgs_id, i)
        imgs_info = imgs_info.write(imgs_info.size(), tf.reshape(tf.stack([i, img_id]), [-1, 2]))
    if boxes_result.size() == 0:
        boxes_result = boxes_result.write(0, tf.zeros(shape=[1, 8]))
    if imgs_info.size() == 0:
        imgs_info = imgs_info.write(0, tf.dtypes.cast(tf.zeros(shape=[1, 2]), tf.int32))
    return boxes_result.concat(), imgs_info.concat()

Training and Validation

Finally we train and validate the model. Training follows the YOLOv7 recipe, and validation uses the pycocotools package to compute mAP; see train.py for the details. A sketch of a basic training step follows.
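For reference, a minimal training step built around tf_loss_func might look like this; the optimizer settings are illustrative only, and the mixed-precision loss scaling used in the actual training is omitted for brevity:

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.937, nesterov=True)

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)  # shape [batch, na, 8400, 85]
        loss = tf_loss_func(predictions, labels)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss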

Because the model trains on 640x640 images, GPU memory demands are high. On my local 2080 Ti with 11 GB of VRAM, even with mixed precision enabled I could only set a batch size of 8, and the training results were not good. So I rented a 32 GB V100 on the autodl platform (2.28 CNY per hour) and set the batch size to 32; batch size seems to have a considerable effect on training quality. After 20-odd epochs, each taking a bit over an hour and roughly one day in total, the results were as follows:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.270
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.411
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.289
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.162
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.268
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.476
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.528
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.338
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.661
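The table above is produced with pycocotools. Assuming the detections from tf_predict_func have been collected into the standard COCO result format (a list of dicts with image_id, category_id, bbox in xywh and score), the evaluation is essentially:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('coco/annotations/instances_val2017.json')  # ground truth annotations
coco_dt = coco_gt.loadRes(predictions)                     # detections collected from the model
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR table shown above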

Here are the prediction results for some images from the validation set.

According to the description in the YOLOv7 paper, mAP (all) can reach 60% after 300 epochs of training, and continued training would improve accuracy further. Given the time and resource limits, I stopped training at this point for now.

Finally, all of my source code is available in the GitHub repository: GitHub - gzroy/yolov7_tf2: Yolov7 implementation on tensorflow 2.x
