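YOLO v7 was released last year and achieved very strong performance. The authors also published source code implemented in PyTorch. In several earlier posts I analyzed that code in depth to understand the technical details and implementation of YOLO v7. Since I have always worked with TensorFlow, I wanted to try porting the code to TensorFlow.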
Run the get_coco.sh script from the YOLO v7 source tree to download the COCO dataset. The script is as follows:
- #!/bin/bash
- # COCO 2017 dataset http://cocodataset.org
- # Download command: bash ./scripts/get_coco.sh
-
- # Download/unzip labels
- d='./' # unzip directory
- url=https://github.com/ultralytics/yolov5/releases/download/v1.0/
- f='coco2017labels-segments.zip' # or 'coco2017labels.zip', 68 MB
- echo 'Downloading' $url$f ' ...'
- curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
-
- # Download/unzip images
- d='./coco/images' # unzip directory
- url=http://images.cocodataset.org/zips/
- f1='train2017.zip' # 19G, 118k images
- f2='val2017.zip' # 1G, 5k images
- f3='test2017.zip' # 7G, 41k images (optional)
- for f in $f1 $f2 $f3; do
- echo 'Downloading' $url$f '...'
- curl -L $url$f -o $f && unzip -q $f -d $d && rm $f & # download, unzip, remove in background
- done
- wait # finish background tasks
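Once the download finishes, the images and labels directories each contain three subdirectories, train2017, val2017 and test2017, holding the training, validation and test data.
Next we build a training dataset in TensorFlow. The training images need augmentation, including Mosaic stitching, random copy-paste, random geometric transforms and color adjustment, and the object labels in each image must be transformed accordingly. For how this works, see my earlier post, "Reading the YOLO v7 Code (2): Preparing the Training Data" (CSDN blog).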
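Here I define a Dataloader class that applies these augmentations to the training set. The processing is essentially the same as in the YOLOv7 source, with one small change: after Mosaic stitching, if the random transform scales the image down, an object's bounding box can extend beyond the image. I clip the box using the object's segments data so that it stays inside the image.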
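The Dataloader code itself is not shown in this post, so the following is only a minimal sketch of one way such clipping could work, assuming a segment is a list of (x, y) polygon points in pixels. The function name `segment_to_clipped_box` is hypothetical and is not taken from dataloader.py.

```python
import numpy as np

def segment_to_clipped_box(segment, width, height):
    """Build an (x1, y1, x2, y2) box from polygon points, clipped to the image."""
    seg = np.asarray(segment, dtype=np.float32)
    # Clamp every polygon point to the image rectangle first
    x = np.clip(seg[:, 0], 0, width)
    y = np.clip(seg[:, 1], 0, height)
    # The box is the extent of the clamped points, so it cannot leave the image
    return np.array([x.min(), y.min(), x.max(), y.max()], dtype=np.float32)

# A polygon that sticks out of a 640x640 image on the left and at the bottom
polygon = [(-20.0, 50.0), (100.0, 30.0), (120.0, 700.0)]
box = segment_to_clipped_box(polygon, width=640, height=640)
```

Deriving the box from the polygon, rather than clipping a box that was transformed directly, tends to give a tighter fit after scaling or rotation.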
The validation set needs no augmentation: the long side of each image is simply scaled to 640 and the remaining area is padded. The TensorFlow dataset is defined as follows:
- def map_val_fn(t: tf.Tensor):
-     filename = str(t.numpy(), encoding='utf-8')
-     imgid = int(filename[20:32])
-     # Load image
-     img, (h0, w0), (h, w) = load_image(filename)
-     #augment_hsv(img, hgain=hsv_h, sgain=hsv_s, vgain=hsv_v)
-
-     # Labels
-     label_filename = val_label_path + filename.split('/')[-1].split('.')[0] + '.txt'
-     labels, _ = load_labels(label_filename)
-     labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, 0, 0) # normalized xywh to pixel xyxy format
-     labels[:, 1:5] = xyxy2xywh(labels[:, 1:5]) # convert xyxy to xywh
-     labels[:, 1:5] /= img_size # normalized height 0-1
-
-     img = img[:, :, ::-1].transpose(2,0,1)
-     img = img/255.
-
-     img_hw = tf.concat([h0, w0], axis=0)
-     return img, labels, img_hw, imgid
-
- dataset_val = tf.data.Dataset.list_files("coco/images/val2017/*.jpg", shuffle=False)
- dataset_val = dataset_val.map(
-     lambda x: tf.py_function(func=map_val_fn, inp=[x], Tout=[tf.float32, tf.float32, tf.int32, tf.int32]),
-     num_parallel_calls=tf.data.experimental.AUTOTUNE)
- dataset_val = dataset_val\
-     .padded_batch(val_batch_size, padded_shapes=([3, img_size, img_size], [None, 5], [2], []), padding_values=(144/255., 0., 0, 0))\
-     .prefetch(tf.data.experimental.AUTOTUNE)
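The `load_image` helper is not shown in this post. As a rough illustration of "scale the long side to 640 and pad the rest", here is a NumPy-only sketch. The name `letterbox`, the nearest-neighbour resize and the default `pad_value` are all assumptions made for this example; a real pipeline would more likely resize with OpenCV.

```python
import numpy as np

def letterbox(img, new_size=640, pad_value=114):
    """Scale the long side of an HWC image to new_size and pad the remainder."""
    h0, w0 = img.shape[:2]
    r = new_size / max(h0, w0)                      # scale ratio for the long side
    h, w = int(round(h0 * r)), int(round(w0 * r))   # resized height and width
    # Nearest-neighbour resize using plain index arrays
    rows = np.minimum((np.arange(h) / r).astype(int), h0 - 1)
    cols = np.minimum((np.arange(w) / r).astype(int), w0 - 1)
    resized = img[rows][:, cols]
    # Place the resized image top-left and fill the rest with pad_value
    canvas = np.full((new_size, new_size, img.shape[2]), pad_value, dtype=img.dtype)
    canvas[:h, :w] = resized
    return canvas, (h0, w0), (h, w)

img = np.zeros((480, 320, 3), dtype=np.uint8)   # a dummy portrait image
out, (h0, w0), (h, w) = letterbox(img)
```

Padding at the bottom and right matches what `padded_batch` does, since it pads each dimension at the end.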
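For the training dataset I originally planned to define it the same way as the validation set above, just swapping the map function for the corresponding function in Dataloader (see dataloader.py for the code). In practice this turned out to be inefficient: the augmentation is fairly complex and costs a lot of CPU time. Although the map and prefetch methods of a TensorFlow dataset accept an AUTOTUNE setting for parallelism, the result was not good enough and the GPU still ended up waiting for the CPU to finish preparing data. So I wrote my own parallel routine with Python's multiprocessing: while the GPU trains on 100 batches, the CPU prepares the next 100 batches in parallel, which improves performance considerably.
Concretely, I create a shared memory block for the child processes, randomly sample a subset of the training image filenames, and split them among several child processes. Each child reads its images, applies the image processing, processes the matching label files, and writes the result to its own region of the shared memory. Finally, a separate child process merges the data in shared memory, after which a dataset can be built directly from the merged data.
The relevant code is as follows: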
- # Augment the images for the given image IDs and write the result to shared memory
- def augment_data(imgids, datasize, memory_name, offset, q):
-     dataset = Dataloader(img_size, train_image_dir, train_label_dir, imgids, hyp)
-     traindata = dataset.generateTrainData(datasize)
-     traindata_obj = pickle.dumps(traindata, protocol=pickle.HIGHEST_PROTOCOL)
-     existing_shm = shared_memory.SharedMemory(name=memory_name)
-     existing_shm.buf[offset:offset+len(traindata_obj)] = traindata_obj
-     q.put((offset, offset+len(traindata_obj)))
-     existing_shm.close()
-
- # Merge the results of the augmentation child processes
- def merge_subprocess(q, subprocess_num, memory_name):
-     results = []
-     while(True):
-         msg = q.get()
-         if msg is not None:
-             results.append(msg)
-             if len(results)>=subprocess_num:
-                 break
-         else:
-             time.sleep(1)
-     existing_shm = shared_memory.SharedMemory(name=memory_name)
-     merge_data = []
-     for result in results:
-         merge_data.extend(pickle.loads(existing_shm.buf[result[0]:result[1]]))
-     merge_data_obj = pickle.dumps(merge_data, protocol=pickle.HIGHEST_PROTOCOL)
-     existing_shm.buf[:len(merge_data_obj)] = merge_data_obj
-     existing_shm.close()
-     q.put(len(merge_data_obj))
-
-
- # Start several child processes for augmentation, then merge their results
- def prepare_traindata(memory_name):
-     sample_imgid = sample(imgid_train, sample_len) # randomly pick a subset of training image filenames
-     subprocess_list = []
-     for i in range(subprocess_num): # start the child processes that handle images and labels
-         subprocess_list.append(
-             mp.Process(
-                 target=augment_data,
-                 args=(sample_imgid[i*imgid_num_process:(i+1)*imgid_num_process], data_size//subprocess_num, memory_name, i*shared_memory_size//subprocess_num, q, )
-             )
-         )
-     for p in subprocess_list:
-         p.start()
-     # Start the child process that merges the results
-     p0 = mp.Process(target=merge_subprocess, args=(q, subprocess_num, memory_name,))
-     p0.start()
-     return p0
-
-
- image_cache = shared_memory.SharedMemory(name="dataset", create=True, size=shared_memory_size) # create the shared memory block
-
- merge_proc = prepare_traindata("dataset")
-
- # Wait for the merge process to finish, read the data size from the queue, then deserialize
- merge_proc.join()
- msg = q.get()
- if msg>0:
-     traindata = pickle.loads(image_cache.buf[:msg])
- else:
-     print("Could not load training data.")
-
- # Release the shared memory exactly once (a second unlink() would raise FileNotFoundError)
- image_cache.close()
- image_cache.unlink()
-
- def traindata_gen():
-     global traindata
-     i = 0
-     while i<len(traindata):
-         yield traindata[i][0]/255., traindata[i][1]
-         i += 1
-
- # Build the dataset
- dataset = tf.data.Dataset.from_generator(
-     traindata_gen,
-     output_types=(tf.float32, tf.float32),
-     output_shapes=((3, img_size, img_size), (None, 5)))
- dataset = dataset.padded_batch(batch_size, padded_shapes=([3, img_size, img_size], [None, 5]))
- dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
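The training loop that overlaps "train on 100 batches" with "prepare the next 100 batches" is not shown here. The sketch below illustrates that double-buffering pattern only. It uses a background thread from `concurrent.futures` as a lightweight stand-in for the multiprocessing and shared-memory setup above, and `prepare_chunk`, `train_on_chunk` and `run` are hypothetical placeholders rather than functions from the real code.

```python
from concurrent.futures import ThreadPoolExecutor

def prepare_chunk(chunk_id, n_batches=100):
    """Stand-in for the augmentation step: returns a list of fake batches."""
    return [(chunk_id, b) for b in range(n_batches)]

def train_on_chunk(chunk):
    """Stand-in for the training step: just counts the batches it consumed."""
    return len(chunk)

def run(num_chunks=3):
    consumed = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(prepare_chunk, 0)      # prepare the first chunk
        for chunk_id in range(num_chunks):
            chunk = pending.result()                 # wait for the prepared data
            if chunk_id + 1 < num_chunks:
                # Start preparing the next chunk before training on this one
                pending = pool.submit(prepare_chunk, chunk_id + 1)
            consumed += train_on_chunk(chunk)        # runs while the next chunk is prepared
    return consumed

total = run()
```

With real augmentation work, separate processes (as in the code above) are the better fit, because CPU-bound Python code in threads is limited by the GIL.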
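Next, build the YOLO v7 model. For a walkthrough of the model structure, see my earlier post, "Reading the YOLO v7 Code (1): A Study of the Model Structure" (gzroy's blog, CSDN).
Define a yolo.py file that contains the model's custom layers and assembles the model.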
- import tensorflow as tf
- from tensorflow import keras
- l=tf.keras.layers
- from params import *
-
- @tf.keras.utils.register_keras_serializable()
- class YoloConv(keras.layers.Layer):
- def __init__(self, filters, kernel_size, strides, padding='same', bias=False, activation='swish', **kwargs):
- super(YoloConv, self).__init__(**kwargs)
- self.activation = activation
- self.filters = filters
- self.kernel_size = kernel_size
- self.strides = strides
- self.padding = padding
- self.bias = bias
- self.cv = l.Conv2D(filters=self.filters,
- kernel_size=self.kernel_size,
- strides=self.strides,
- padding=self.padding,
- data_format='channels_first',
- use_bias=self.bias,
- kernel_initializer='he_normal',
- kernel_regularizer=tf.keras.regularizers.l2(l=weight_decay))
- self.bn = l.BatchNormalization(axis=1)
- self.swish = l.Activation('swish')
-
- def call(self, inputs, training):
- output = self.cv(inputs)
- output = self.bn(output, training)
- if self.activation=='swish':
- output = self.swish(output)
- else:
- output = output
- return output
-
- def get_config(self):
- config = super(YoloConv, self).get_config()
- config.update({
- "activation": self.activation,
- "filters": self.filters,
- "kernel_size": self.kernel_size,
- "strides": self.strides,
- "padding": self.padding,
- "bias": self.bias
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class Elan(keras.layers.Layer):
- def __init__(self, filters, **kwargs):
- super(Elan, self).__init__(**kwargs)
- self.filters = filters
- self.cv1 = YoloConv(self.filters, 1, 1)
- self.cv2 = YoloConv(self.filters, 1, 1)
- self.cv3 = YoloConv(self.filters, 3, 1)
- self.cv4 = YoloConv(self.filters, 3, 1)
- self.cv5 = YoloConv(self.filters, 3, 1)
- self.cv6 = YoloConv(self.filters, 3, 1)
- self.cv7 = YoloConv(self.filters*4, 1, 1)
- self.concat = l.Concatenate(axis=1)
-
- def call(self, inputs, training):
- output1 = self.cv1(inputs, training)
- output2 = self.cv2(inputs, training)
- output3 = self.cv4(self.cv3(output2, training), training)
- output4 = self.cv6(self.cv5(output3, training), training)
- output = self.concat([output1, output2, output3, output4])
- output = self.cv7(output, training)
- return output
-
- def get_config(self):
- config = super(Elan, self).get_config()
- config.update({
- "filters": self.filters
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class MP(keras.layers.Layer):
- def __init__(self, filters, k=2, **kwargs):
- super(MP, self).__init__(**kwargs) # accept name/dtype etc. so get_config() can round-trip
- self.filters = filters
- self.k = k
- self.cv1 = YoloConv(filters, 1, 1)
- self.cv2 = YoloConv(filters, 1, 1)
- self.cv3 = YoloConv(filters, 3, 2)
- self.pool = l.MaxPool2D(pool_size=self.k, strides=self.k, padding='same', data_format='channels_first')
- self.concat = l.Concatenate(axis=1)
-
- def call(self, inputs, training):
- output1 = self.pool(inputs)
- output1 = self.cv1(output1, training)
- output2 = self.cv2(inputs, training)
- output2 = self.cv3(output2, training)
- output = self.concat([output1, output2])
- return output
-
- def get_config(self):
- config = super(MP, self).get_config()
- config.update({
- "filters": self.filters,
- "k": self.k
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class SPPCSPC(keras.layers.Layer):
- def __init__(self, filters, e=0.5, k=(5,9,13), **kwargs):
- super(SPPCSPC, self).__init__(**kwargs) # accept name/dtype etc. so get_config() can round-trip
- self.filters = filters
- self.e = e
- self.k = k
- c_ = int(2 * self.filters * self.e)
- self.cv1 = YoloConv(c_, 1, 1)
- self.cv2 = YoloConv(c_, 1, 1)
- self.cv3 = YoloConv(c_, 3, 1)
- self.cv4 = YoloConv(c_, 1, 1)
- self.m = [l.MaxPool2D(pool_size=x, strides=1, padding='same', data_format='channels_first') for x in k]
- self.cv5 = YoloConv(c_, 1, 1)
- self.cv6 = YoloConv(c_, 3, 1)
- self.cv7 = YoloConv(filters, 1, 1)
- self.concat = l.Concatenate(axis=1)
-
- def call(self, inputs, training):
- output1 = self.cv4(self.cv3(self.cv1(inputs, training), training), training)
- output2 = self.concat([output1] + [m(output1) for m in self.m])
- output2 = self.cv6(self.cv5(output2, training), training)
- output3 = self.cv2(inputs, training)
- output = self.cv7(self.concat([output2, output3]), training)
- return output
-
- def get_config(self):
- config = super(SPPCSPC, self).get_config()
- config.update({
- "filters": self.filters,
- "k": self.k,
- "e": self.e
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class Elan_A(keras.layers.Layer):
- def __init__(self, filters, **kwargs):
- super(Elan_A, self).__init__(**kwargs) # accept name/dtype etc. so get_config() can round-trip
- self.filters = filters
- self.cv1 = YoloConv(filters, 1, 1)
- self.cv2 = YoloConv(filters, 1, 1)
- self.cv3 = YoloConv(filters//2, 3, 1)
- self.cv4 = YoloConv(filters//2, 3, 1)
- self.cv5 = YoloConv(filters//2, 3, 1)
- self.cv6 = YoloConv(filters//2, 3, 1)
- self.cv7 = YoloConv(filters, 1, 1)
- self.concat = l.Concatenate(axis=1)
-
- def call(self, inputs, training):
- output1 = self.cv1(inputs, training)
- output2 = self.cv2(inputs, training)
- output3 = self.cv3(output2, training)
- output4 = self.cv4(output3, training)
- output5 = self.cv5(output4, training)
- output6 = self.cv6(output5, training)
- output7 = self.concat([output1, output2, output3, output4, output5, output6])
- output = self.cv7(output7, training)
- return output
-
- def get_config(self):
- config = super(Elan_A, self).get_config()
- config.update({
- "filters": self.filters,
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class RepConv(keras.layers.Layer):
- def __init__(self, filters, **kwargs):
- super(RepConv, self).__init__(**kwargs) # accept name/dtype etc. so get_config() can round-trip
- self.filters = filters
- self.cv1 = YoloConv(filters, 3, 1, activation=None)
- self.cv2 = YoloConv(filters, 1, 1, activation=None)
- self.swish = l.Activation('swish')
-
- def call(self, inputs, training):
- output1 = self.cv1(inputs, training)
- output2 = self.cv2(inputs, training)
- output = self.swish(output1+output2)
- return output
-
- def get_config(self):
- config = super(RepConv, self).get_config()
- config.update({
- "filters": self.filters,
- })
- return config
-
- @tf.keras.utils.register_keras_serializable()
- class IDetect(keras.layers.Layer):
- def __init__(self, shape, no, na, grids, **kwargs):
- super(IDetect, self).__init__(**kwargs) # accept name/dtype etc. so get_config() can round-trip
- #self.a = tf.random.normal((1,shape,1,1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16)
- self.a = tf.Variable(tf.random.normal((1,shape,1,1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
- self.m = tf.Variable(tf.random.normal((1,no*na,1,1), mean=0.0, stddev=0.02, dtype=tf.dtypes.float16))
- #self.a = keras.initializers.RandomNormal(mean=0., stddev=0.02)(shape=(1,shape,1,1))
- #self.m = keras.initializers.RandomNormal(mean=0., stddev=0.02)(shape=(1,no*na,1,1))
- self.cv = YoloConv(no*na, 1, 1, bias=True, activation=None)
- self.shape = shape
- self.no = no
- self.na = na
- self.grids = grids
- self.reshape = l.Reshape([self.na, self.no, self.grids*self.grids])
- #self.permute = l.Permute([1,3,4,2])
- self.permute = l.Permute([1,3,2])
- self.activation = l.Activation('linear', dtype='float32')
-
- def call(self, inputs, training):
- #output = l.Add()([inputs, self.a])
- output = inputs + self.a
- output = self.cv(output, training)
- output = self.m * output
- #output = self.cv(inputs)
- #output = tf.reshape(output, [-1, self.na, self.no, self.grids, self.grids])
- output = self.reshape(output)
- #output = tf.transpose(output, perm=[0,1,3,4,2])
- output = self.permute(output)
- output = self.activation(output)
- return output
-
- def get_config(self):
- config = super(IDetect, self).get_config()
- config.update({
- "no": self.no,
- "na": self.na,
- "grids": self.grids,
- "shape": self.shape
- })
- return config
-
- def create_model():
- inputs = keras.Input(shape=(3, img_size, img_size))
- x = YoloConv(32, 3, 1)(inputs) #[32, img_size, img_size]
- x = YoloConv(64, 3, 2)(x) #[64, img_size/2, img_size/2]
- x = YoloConv(64, 3, 1)(x) #[64, img_size/2, img_size/2]
- x = YoloConv(128, 3, 2)(x) #[128, img_size/4, img_size/4]
- x = Elan(64)(x) #11
- x = MP(128)(x) #16
- route1 = Elan(128)(x) #24
- x = MP(256)(route1) #29
- route2 = Elan(256)(x) #37
- x = MP(512)(route2) #42
- x = Elan(256)(x) #50
- route3 = SPPCSPC(512)(x) #51
- x = YoloConv(256, 1, 1)(route3)
- x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
- x = l.Concatenate(axis=1)([x, YoloConv(256, 1, 1)(route2)])
- route4 = Elan_A(256)(x) #63
- x = YoloConv(128, 1, 1)(route4)
- x = l.UpSampling2D(size=(2, 2), data_format='channels_first', interpolation='nearest')(x)
- x = l.Concatenate(axis=1)([x, YoloConv(128, 1, 1)(route1)])
- route5 = Elan_A(128)(x) #75, Connect to Detector 1
- x = MP(128)(route5)
- x = l.Concatenate(axis=1)([x, route4])
- route6 = Elan_A(256)(x) #88, Connect to Detector 2
- x = MP(256)(route6)
- x = l.Concatenate(axis=1)([x, route3])
- route7 = Elan_A(512)(x) #101, Connect to Detector 3
- detect1 = RepConv(256)(route5)
- detect2 = RepConv(512)(route6)
- detect3 = RepConv(1024)(route7)
- output1 = IDetect(256, 85, 3, 80)(detect1)
- output2 = IDetect(512, 85, 3, 40)(detect2)
- output3 = IDetect(1024, 85, 3, 20)(detect3)
- output = l.Concatenate(axis=-2)([output1, output2, output3])
- output = l.Activation('linear', dtype='float32')(output)
- model = keras.Model(inputs=inputs, outputs=output, name="yolov7_model")
- return model
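As a quick sanity check on the model's output shape, the arithmetic below works out how many grid cells the three detection heads contribute. It assumes `img_size` is 640 and the strides are 8, 16 and 32, which matches the grid sizes 80, 40 and 20 passed to IDetect above. The variable names are only for this example.

```python
img_size = 640          # assumed input resolution
strides = [8, 16, 32]   # assumed strides of the three detection levels
na, no = 3, 85          # anchors per cell, outputs per anchor (80 classes + 5)

grids = [img_size // s for s in strides]   # grid size per level
cells = [g * g for g in grids]             # grid cells per level
total_cells = sum(cells)                   # cells after concatenating the levels
output_shape = (na, total_cells, no)       # per-image output shape of the model
```

Each IDetect head reshapes to [na, no, grid*grid] and permutes to [na, grid*grid, no], so concatenating the three heads on axis -2 gives one tensor per image with all levels laid out side by side.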
For how YOLOv7 defines its loss, see my other post, "Reading the YOLO v7 Code (3): The Loss Function" (gzroy's blog, CSDN).
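For readers who want a quick reference for the box-regression term, the CIoU measure can be sketched in plain NumPy as below. This is an illustrative version for boxes in (center x, center y, width, height) format and the name `ciou_xywh` is made up for this example. It is not the TensorFlow implementation used in loss.py.

```python
import math
import numpy as np

def ciou_xywh(box1, box2, eps=1e-7):
    """Complete IoU between paired boxes given as rows of (cx, cy, w, h)."""
    b1 = np.asarray(box1, dtype=np.float64)
    b2 = np.asarray(box2, dtype=np.float64)
    # Convert centers and sizes to corner coordinates
    b1_x1, b1_x2 = b1[:, 0] - b1[:, 2] / 2, b1[:, 0] + b1[:, 2] / 2
    b1_y1, b1_y2 = b1[:, 1] - b1[:, 3] / 2, b1[:, 1] + b1[:, 3] / 2
    b2_x1, b2_x2 = b2[:, 0] - b2[:, 2] / 2, b2[:, 0] + b2[:, 2] / 2
    b2_y1, b2_y2 = b2[:, 1] - b2[:, 3] / 2, b2[:, 1] + b2[:, 3] / 2
    # Intersection and union areas
    inter_w = np.clip(np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1), 0, None)
    inter_h = np.clip(np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1), 0, None)
    inter = inter_w * inter_h
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared diagonal of the smallest enclosing box
    cw = np.maximum(b1_x2, b2_x2) - np.minimum(b1_x1, b2_x1)
    ch = np.maximum(b1_y2, b2_y2) - np.minimum(b1_y1, b2_y1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between the box centers
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
            (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4
    # Aspect-ratio consistency term and its weight
    v = (4 / math.pi ** 2) * (np.arctan(w2 / h2) - np.arctan(w1 / h1)) ** 2
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)

same = ciou_xywh([[0.5, 0.5, 0.2, 0.2]], [[0.5, 0.5, 0.2, 0.2]])  # identical boxes
far = ciou_xywh([[0.1, 0.1, 0.1, 0.1]], [[0.9, 0.9, 0.1, 0.1]])   # disjoint boxes
```

Identical boxes score close to 1, while disjoint boxes score below 0 because the center-distance penalty still applies when the IoU is zero. This is what gives non-overlapping predictions a useful gradient.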
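The concrete definition is in loss.py. I followed the approach of the YOLOv7 code when rewriting it for TensorFlow, and wrapped the functions in tf.function to speed up computation. The code is as follows: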
- import tensorflow as tf
- import math
- from test1 import batch_size, na, nl, img_size, stride, balance
- from test1 import loss_box, loss_obj, loss_cls
- from test1 import batch_no_constant, anchor_no_constant, anchors_reshape, anchor_t, anchors_constant, layer_no_constant
- from test1 import val_batch_no_constant, val_layer_no_constant
- from util import *
- from params import *
-
- #In param:
- # p - predictions of the model, list of three detection level.
- # labels - the label of the object, dimension [batch, boxnum, 5(class, xywh)]
- #Out param:
- # results - list of the suggest positive samples for three detection level.
- # dimension for each element: [sample_number, 5(batch_no, anch_no, x, y, class)]
- # anch - list of the anchor wh ratio for the positive samples
- # dimension for each element: [sample_number, anchor_w, anchor_h]
- @tf.function(
- input_signature=(
- [tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)]
- )
- )
- def tf_find_3_positive(labels):
- batch_no = tf.zeros_like(labels)[...,0:1] + batch_no_constant
- targets = tf.concat((batch_no, labels), axis=-1) #targets dim [batch,box_num,6]
- targets = tf.reshape(targets, [batch_size, 1, -1, 6]) #targets dim [batch,1,box_num,6]
- targets = tf.tile(targets, [1,na,1,1])
- anchor_no = anchor_no_constant + tf.reshape(tf.zeros_like(batch_no), [batch_size, 1, -1, 1])
- targets = tf.concat([targets,anchor_no], axis=-1) #targets dim [batch,na,box_num,7(batch_no, cls, xywh, anchor_no)]
-
- g = 0.5 # bias
- offsets = tf.expand_dims(tf.constant([[0.,0.], [-1.,0.], [0.,-1.], [1.,0.], [0.,1.]]), axis=0) #offset dim [1,5,2]
-
- gain = tf.constant([[1.,1.,80.,80.,80.,80.,1.], [1.,1.,40.,40.,40.,40.,1.], [1.,1.,20.,20.,20.,20.,1.]])
-
- results = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
- anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
-
- for i in tf.range(nl):
- t = targets * tf.gather(gain, i)
- r = t[..., 4:6] / tf.gather(anchors_reshape, i)
- r_reciprocal = tf.math.reciprocal_no_nan(r) #1/r
- r_max = tf.reduce_max(tf.math.maximum(r, r_reciprocal), axis=-1)
- mask_t = tf.logical_and(r_max<anchor_t, r_max>0)
- t = t[mask_t]
- # Offsets
- gxy = t[:, 2:4] # grid xy
- #gxi = gain[[2, 3]] - gxy # inverse
- gxi = tf.gather(gain, i)[2:4] - gxy
- mask_xy = tf.concat([
- tf.ones([tf.shape(t)[0], 1], dtype=tf.bool),
- ((gxy % 1. < g) & (gxy > 1.)),
- ((gxi % 1. < g) & (gxi > 1.))
- ], axis=1)
- t = tf.repeat(tf.expand_dims(t, axis=1), 5, axis=1)[mask_xy]
- offsets_xy = (tf.expand_dims(tf.zeros_like(gxy, dtype=tf.float32), axis=1) + offsets)[mask_xy]
- xy = t[...,2:4] + offsets_xy
- from_which_layer = tf.ones_like(t[...,0:1]) * tf.dtypes.cast(i, tf.float32)
- results = results.write(i, tf.dtypes.cast(tf.concat([t[...,0:1], t[...,-1:], xy[...,1:2], xy[...,0:1], t[...,1:2], from_which_layer], axis=-1), tf.int32))
- anch = anch.write(i, tf.gather(tf.gather(anchors_constant, i), tf.dtypes.cast(t[...,-1], tf.int32)))
- return results.concat(), anch.concat()
-
- @tf.function(
- input_signature=([
- tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
- tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
- ])
- )
- def box_iou(box1, box2):
- area1 = (box1[:,2]-box1[:,0])*(box1[:,3]-box1[:,1])
- area2 = (box2[:,2]-box2[:,0])*(box2[:,3]-box2[:,1])
-
- intersect_wh = tf.math.minimum(box1[:,None,2:], box2[:,2:]) - tf.math.maximum(box1[:,None,:2], box2[:,:2])
- intersect_wh = tf.clip_by_value(intersect_wh, clip_value_min=0, clip_value_max=img_size)
- intersect_area = intersect_wh[...,0]*intersect_wh[...,1]
-
- iou = intersect_area/(area1[:,None]+area2-intersect_area)
- return iou
-
- @tf.function(
- input_signature=([
- tf.TensorSpec(shape=[None, 4], dtype=tf.float32),
- tf.TensorSpec(shape=[None, 4], dtype=tf.float32)
- ])
- )
- def bbox_ciou(box1, box2):
- eps=1e-7
- b1_x1, b1_x2 = box1[:,0]-box1[:,2]/2, box1[:,0]+box1[:,2]/2
- b1_y1, b1_y2 = box1[:,1]-box1[:,3]/2, box1[:,1]+box1[:,3]/2
- b2_x1, b2_x2 = box2[:,0]-box2[:,2]/2, box2[:,0]+box2[:,2]/2
- b2_y1, b2_y2 = box2[:,1]-box2[:,3]/2, box2[:,1]+box2[:,3]/2
-
- # Intersection area
- inter = tf.clip_by_value(
- tf.math.minimum(b1_x2, b2_x2) - tf.math.maximum(b1_x1, b2_x1),
- clip_value_min=0,
- clip_value_max=tf.float32.max) * tf.clip_by_value(
- tf.math.minimum(b1_y2, b2_y2) - tf.math.maximum(b1_y1, b2_y1),
- clip_value_min=0,
- clip_value_max=tf.float32.max)
-
- # Union Area
- w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
- w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
- union = w1 * h1 + w2 * h2 - inter + eps
-
- iou = inter / union
-
- cw = tf.math.maximum(b1_x2, b2_x2) - tf.math.minimum(b1_x1, b2_x1) # convex (smallest enclosing box) width
- ch = tf.math.maximum(b1_y2, b2_y2) - tf.math.minimum(b1_y1, b2_y1) # convex height
-
- c2 = cw ** 2 + ch ** 2 + eps # convex diagonal squared
- rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
- (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4 # center distance squared
-
- v = (4 / math.pi ** 2) * tf.math.pow(tf.math.atan(w2 / (h2 + eps)) - tf.math.atan(w1 / (h1 + eps)), 2)
- alpha = v / (v - iou + (1 + eps))
- return iou - (rho2 / c2 + v * alpha)
-
- @tf.function(
- input_signature=([
- tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
- tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
- ])
- )
- def tf_build_targets(p, labels):
- results, anch = tf_find_3_positive(labels)
-
- #stride = tf.constant([8., 16., 32.])
- grids = tf.dtypes.cast(img_size/stride, tf.int32)
-
- pxyxys = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
- p_obj = tf.TensorArray(tf.float32, size=nl, dynamic_size=True, element_shape=[None, 1])
- p_cls = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
- all_idx = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
- from_which_layer = tf.TensorArray(tf.int32, size=nl, dynamic_size=False)
- all_anch = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
-
- matching_idxs = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
- matching_targets = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
- matching_anchs = tf.TensorArray(tf.float32, size=batch_size, dynamic_size=False)
- matching_layers = tf.TensorArray(tf.int32, size=batch_size, dynamic_size=False)
-
- for i in tf.range(nl):
- idx_mask = results[...,-1]==i
- idx = tf.boolean_mask(results, idx_mask)
- layer_mask = layer_no_constant[...,0]==i
- grid_no = tf.gather(grids, i)
- pl = tf.boolean_mask(p, layer_mask)
- pl = tf.reshape(pl, [batch_size, na, grid_no, grid_no, -1])
- pi = tf.gather_nd(pl, idx[...,0:4])
- anchors_p = tf.boolean_mask(anch, idx_mask)
- p_obj = p_obj.write(i, pi[...,4:5])
- p_cls = p_cls.write(i, pi[...,5:])
- gij = tf.dtypes.cast(tf.concat([idx[...,3:4], idx[...,2:3]], axis=-1), tf.float32)
- pxy = (tf.math.sigmoid(pi[...,:2])*2-0.5+gij)*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
- pwh = (tf.math.sigmoid(pi[...,2:4])*2)**2*anchors_p*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
- pxywh = tf.concat([pxy, pwh], axis=-1)
- pxyxy = xywh2xyxy(pxywh)
- pxyxys = pxyxys.write(i, pxyxy)
- all_idx = all_idx.write(i, idx[...,0:4])
- from_which_layer = from_which_layer.write(i, idx[..., -1:])
- all_anch = all_anch.write(i, tf.boolean_mask(anch, idx_mask))
-
- pxyxys = pxyxys.concat()
- p_obj = p_obj.concat()
- p_cls = p_cls.concat()
- all_idx = all_idx.concat()
- from_which_layer = from_which_layer.concat()
- all_anch = all_anch.concat()
-
- for i in tf.range(batch_size):
- batch_mask = all_idx[...,0]==i
- if tf.math.reduce_sum(tf.dtypes.cast(batch_mask, tf.int32)) > 0:
- pxyxy_i = tf.boolean_mask(pxyxys, batch_mask)
- target_mask = labels[i][...,3]>0
- target = tf.boolean_mask(labels[i], target_mask)
- txywh = target[...,1:] * img_size
- txyxy = xywh2xyxy(txywh)
- pair_wise_iou = box_iou(txyxy, pxyxy_i)
- pair_wise_iou_loss = -tf.math.log(pair_wise_iou + 1e-8)
-
- top_k, _ = tf.math.top_k(pair_wise_iou, tf.math.minimum(10, tf.shape(pair_wise_iou)[1]))
- dynamic_ks = tf.clip_by_value(
- tf.dtypes.cast(tf.math.reduce_sum(top_k, axis=-1), tf.int32),
- clip_value_min=1,
- clip_value_max=10)
-
- gt_cls_per_image = tf.tile(
- tf.expand_dims(
- tf.one_hot(
- tf.dtypes.cast(target[...,0], tf.int32), nc),
- axis = 1),
- [1,tf.shape(pxyxy_i)[0],1])
-
- num_gt = tf.shape(target)[0]
- cls_preds_ = (
- tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_cls, batch_mask), 0), [num_gt, 1, 1])) *
- tf.math.sigmoid(tf.tile(tf.expand_dims(tf.boolean_mask(p_obj, batch_mask), 0), [num_gt, 1, 1]))) #dimension [labels_number, positive_targets_number, 80]
- y = tf.math.sqrt(cls_preds_)
- pair_wise_cls_loss = tf.math.reduce_sum(
- tf.nn.sigmoid_cross_entropy_with_logits(
- labels = gt_cls_per_image,
- logits = tf.math.log(y/(1-y))),
- axis = -1)
-
- cost = (
- pair_wise_cls_loss
- + 3.0 * pair_wise_iou_loss
- )
-
- matching_matrix = tf.zeros_like(cost) #dimension [labels_number, positive_targets_number]
-
- matching_idx = tf.TensorArray(tf.int64, size=0, dynamic_size=True)
- for gt_idx in tf.range(num_gt):
- _, pos_idx = tf.math.top_k(
- -cost[gt_idx], k=dynamic_ks[gt_idx], sorted=True)
- X,Y = tf.meshgrid(gt_idx, pos_idx)
- matching_idx = matching_idx.write(gt_idx, tf.dtypes.cast(tf.concat([X,Y], axis=-1), tf.int64))
-
- matching_idx = matching_idx.concat()
- '''
- matching_matrix = tf.scatter_nd(
- matching_idx,
- tf.ones(tf.shape(matching_idx)[0]),
- tf.dtypes.cast(tf.shape(cost), tf.int64))
- '''
- matching_matrix = tf.sparse.to_dense(
- tf.sparse.reorder(
- tf.sparse.SparseTensor(
- indices=tf.dtypes.cast(matching_idx, tf.int64),
- values=tf.ones(tf.shape(matching_idx)[0]),
- dense_shape=tf.dtypes.cast(tf.shape(cost), tf.int64))
- )
- )
-
- anchor_matching_gt = tf.reduce_sum(matching_matrix, axis=0) #dimension [positive_targets_number]
- mask_1 = anchor_matching_gt>1 #it means one target match to several ground truths
-
- if tf.reduce_sum(tf.dtypes.cast(mask_1, tf.int32)) > 0: #There is at least one positive target that predict several ground truth
- #Get the lowest cost among the several ground truths matched to the target
- #For example, there are 100 targets and 10 ground truths.
- #The #5 target matches the #2 and #3 ground truths, with costs 10 for #2 and 20 for #3
- #Then it will select the #2 ground truth for the #5 target.
- #mask_1 dimension [positive_targets_number]
- #tf.boolean_mask(cost, mask_1, axis=1), dimension [ground_truth_number, targets_predicting_several_GT_number]
- cost_argmin = tf.math.argmin(
- tf.boolean_mask(cost, mask_1, axis=1), axis=0) #in above example, the cost_argmin is [2]
- m = tf.dtypes.cast(mask_1, tf.float32)
- _, target_indices = tf.math.top_k(
- m,
- k=tf.dtypes.cast(tf.math.reduce_sum(m), tf.int32)) #in above example, the target_indices is [5]
- #So will set the index [2,5] of matching_matrix to 1, and set the other elements of [:,5] to 0
- target_matching_gt_indices = tf.concat(
- [tf.reshape(tf.dtypes.cast(cost_argmin, tf.int32), [-1,1]), tf.reshape(target_indices, [-1,1])],
- axis=1)
- matching_matrix = tf.multiply(
- matching_matrix,
- tf.repeat(tf.reshape(tf.dtypes.cast(anchor_matching_gt<=1, tf.float32), [1,-1]), tf.shape(cost)[0], axis=0))
- target_value = tf.sparse.to_dense(
- tf.sparse.reorder(
- tf.sparse.SparseTensor(
- indices=tf.dtypes.cast(target_matching_gt_indices, tf.int64),
- values=tf.ones(tf.shape(target_matching_gt_indices)[0]),
- dense_shape=tf.dtypes.cast(tf.shape(matching_matrix), tf.int64)
- )
- )
- )
- matching_matrix = tf.add(matching_matrix, target_value)
-
- fg_mask_inboxes = tf.math.reduce_sum(matching_matrix, axis=0)>0. #The mask for the targets that will use to predict
- if tf.shape(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1))[0]>0:
- matched_gt_inds = tf.math.argmax(tf.boolean_mask(matching_matrix, fg_mask_inboxes, axis=1), axis=0) #Get the related gt number for the target
-
- all_idx_i = tf.boolean_mask(tf.boolean_mask(all_idx, batch_mask), fg_mask_inboxes)
- from_which_layer_i = tf.boolean_mask(tf.boolean_mask(from_which_layer, batch_mask), fg_mask_inboxes)
- all_anch_i = tf.boolean_mask(tf.boolean_mask(all_anch, batch_mask), fg_mask_inboxes)
-
- matching_idxs = matching_idxs.write(i, all_idx_i)
- matching_layers = matching_layers.write(i, from_which_layer_i)
- matching_anchs = matching_anchs.write(i, all_anch_i )
- matching_targets = matching_targets.write(i, tf.gather(target, matched_gt_inds))
- else:
- matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
- matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
- matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
- matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
-
- else:
- matching_idxs = matching_idxs.write(i, tf.constant([[-1,-1,-1,-1]], dtype=tf.int32))
- matching_layers = matching_layers.write(i, tf.constant([[-1]], dtype=tf.int32))
- matching_anchs = matching_anchs.write(i, tf.constant([[-1, -1]], dtype=tf.float32))
- matching_targets = matching_targets.write(i, tf.constant([[-1, -1, -1, -1, -1]], dtype=tf.float32))
-
- matching_idxs = matching_idxs.concat()
- matching_layers = matching_layers.concat()
- matching_anchs = matching_anchs.concat()
- matching_targets = matching_targets.concat()
- filter_mask = matching_idxs[:,0]!=-1
- matching_idxs = tf.boolean_mask(matching_idxs, filter_mask)
- matching_layers = tf.boolean_mask(matching_layers, filter_mask)
- matching_anchs = tf.boolean_mask(matching_anchs, filter_mask)
- matching_targets = tf.boolean_mask(matching_targets, filter_mask)
-
- #return pxyxys, all_idx, matching_idx, matching_matrix, all_idx_i, cost, pair_wise_iou, from_which_layer_i
- return matching_idxs, matching_layers, matching_anchs, matching_targets
-
- @tf.function(
- input_signature=([
- tf.TensorSpec(shape=[batch_size, na, None, 85], dtype=tf.float32),
- tf.TensorSpec(shape=[batch_size, None, 5], dtype=tf.float32)
- ])
- )
- def tf_loss_func(p, labels):
- matching_idxs, matching_layers, matching_anchs, matching_targets = tf_build_targets(p, labels)
- lcls, lbox, lobj = tf.zeros(1), tf.zeros(1), tf.zeros(1)
-
- grids = img_size//stride
- for i in tf.range(nl):
- layer_mask = layer_no_constant[...,0]==i
- grid = tf.gather(grids, i)
- pi = tf.reshape(tf.boolean_mask(p, layer_mask), [batch_size, na, grid, grid, -1])
- matching_layer_mask = matching_layers[:,0]==i
- if tf.reduce_sum(tf.dtypes.cast(matching_layer_mask, tf.int32))==0:
- continue
- m_idxs = tf.boolean_mask(matching_idxs, matching_layer_mask)
- if tf.shape(m_idxs)[0]==0:
- continue
- m_targets = tf.boolean_mask(matching_targets, matching_layer_mask)
- m_anchs = tf.boolean_mask(matching_anchs, matching_layer_mask)
- ps = tf.gather_nd(pi, m_idxs)
- pxy = tf.math.sigmoid(ps[:,:2])*2-0.5
- pwh = (tf.math.sigmoid(ps[:,2:4])*2)**2*m_anchs
- pbox = tf.concat([pxy,pwh], axis=-1)
- #selected_tbox = tf.gather_nd(labels, matching_targets[i])[:, 1:]
- selected_tbox = m_targets[:, 1:]
- selected_tbox = tf.multiply(selected_tbox, tf.dtypes.cast(grid, tf.float32))
- tbox_grid = tf.concat([
- tf.dtypes.cast(m_idxs[:,3:4], tf.float32),
- tf.dtypes.cast(m_idxs[:,2:3], tf.float32),
- tf.zeros((tf.shape(m_idxs)[0],2))],
- axis=-1)
- selected_tbox = tf.subtract(selected_tbox, tbox_grid)
- iou = bbox_ciou(pbox, selected_tbox)
- lbox += tf.math.reduce_mean(1.0 - iou) # iou loss
-
- # Objectness
- tobj = tf.sparse.to_dense(
- tf.sparse.reorder(
- tf.sparse.SparseTensor(
- indices = tf.dtypes.cast(m_idxs, tf.int64),
- values = (1.0 - gr) + gr * tf.clip_by_value(tf.stop_gradient(iou), clip_value_min=0, clip_value_max=tf.float32.max),
- dense_shape = tf.dtypes.cast(tf.shape(pi[..., 0]), tf.int64)
- )
- ), validate_indices=False
- )
-
- # Classification
-
- tcls = tf.one_hot(
- indices = tf.dtypes.cast(m_targets[:,0], tf.int32),
- depth = 80,
- dtype = tf.float32
- )
-
- lcls += tf.math.reduce_mean(
- tf.nn.sigmoid_cross_entropy_with_logits(
- labels = tcls,
- logits = ps[:, 5:]
- )
- )
- '''
- lcls += tf.math.reduce_mean(
- tf.nn.sparse_softmax_cross_entropy_with_logits(
- labels = tf.dtypes.cast(m_targets[:,0], tf.int32),
- logits = ps[:, 5:]
- )
- )
- '''
- obji = tf.math.reduce_mean(
- tf.nn.sigmoid_cross_entropy_with_logits(
- labels = tobj,
- logits = pi[..., 4]
- )
- )
-
- lobj += obji * tf.gather(balance, i)
-
- lbox *= loss_box
- lobj *= loss_obj
- lcls *= loss_cls
-
- loss = (lbox + lobj + lcls) * batch_size
-
- return loss
-
- @tf.function(
-     input_signature=([
-         tf.TensorSpec(shape=[None, na, 8400, 85], dtype=tf.float32),
-         tf.TensorSpec(shape=[None, None, 5], dtype=tf.float32),
-         tf.TensorSpec(shape=[None, 2], dtype=tf.int32),
-         tf.TensorSpec(shape=[None], dtype=tf.int32),
-     ])
- )
- def tf_predict_func(predictions, labels, imgs_hw, imgs_id):
-     grids = img_size // stride
-     batch_size = tf.shape(predictions)[0]
-     confidence_threshold = 0.2
-     probability_threshold = 0.8
-     all_predict_result = tf.TensorArray(tf.float32, size=nl, dynamic_size=False)
-     boxes_result = tf.TensorArray(tf.float32, size=0, dynamic_size=True)
-     imgs_info = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
-     # Decode each detection layer into boxes in network-input pixels
-     for i in tf.range(nl):
-         grid = tf.gather(grids, i)
-         grid_x, grid_y = tf.meshgrid(tf.range(grid, dtype=tf.float32), tf.range(grid, dtype=tf.float32))
-         grid_x = tf.reshape(grid_x, [-1, 1])
-         grid_y = tf.reshape(grid_y, [-1, 1])
-         #grid_xy = tf.concat([grid_y, grid_x], axis=-1)
-         grid_xy = tf.concat([grid_x, grid_y], axis=-1)
-         grid_xy = tf.reshape(grid_xy, [1,1,-1,2])
-         layer_mask = val_layer_no_constant[...,0]==i
-         #grid = tf.gather(grids, i)
-         predict_layer = tf.boolean_mask(predictions, layer_mask)
-         predict_layer = tf.reshape(predict_layer, [batch_size, na, -1, 85])
-         predict_conf = tf.math.sigmoid(predict_layer[...,4:5])
-         predict_xy = (tf.math.sigmoid(predict_layer[...,:2])*2-0.5 + \
-             tf.dtypes.cast(grid_xy,tf.float32))*tf.dtypes.cast(tf.gather(stride, i), tf.float32)
-         predict_wh = (tf.math.sigmoid(predict_layer[...,2:4])*2)**2*\
-             tf.reshape(tf.gather(anchors_constant,i), [1,na,1,2])*\
-             tf.dtypes.cast(tf.gather(stride, i), tf.float32)
-         predict_xywh = tf.concat([predict_xy, predict_wh], axis=-1)
-         predict_xyxy = xywh2xyxy(predict_xywh)
-         predict_cls = tf.reshape(tf.argmax(predict_layer[...,5:], axis=-1), [batch_size, na, -1, 1])
-         predict_cls = tf.dtypes.cast(predict_cls, tf.float32)
-         predict_proba = tf.nn.sigmoid(
-             tf.reduce_max(
-                 predict_layer[...,5:], axis=-1, keepdims=True
-             )
-         )
-         batch_no = tf.expand_dims(tf.tile(tf.gather(val_batch_no_constant, tf.range(batch_size)), [1,na,grid*grid]), -1)
-         # Columns: batch_no, objectness, 4 box coordinates, class id, class probability
-         predict_result = tf.concat([batch_no, predict_conf, predict_xyxy, predict_cls, predict_proba], axis=-1)
-         mask = tf.math.logical_and(
-             predict_result[...,1]>=confidence_threshold,
-             predict_result[...,-1]>=probability_threshold
-         )
-         predict_result = tf.boolean_mask(predict_result, mask)
-         #tf.print(tf.shape(predict_result))
-         if tf.shape(predict_result)[0] > 0:
-             all_predict_result = all_predict_result.write(i, predict_result)
-             #tf.print(tf.shape(predict_result))
-         else:
-             # Write an all-zero placeholder row so that every layer has an entry
-             all_predict_result = all_predict_result.write(i, tf.zeros(shape=[1,8]))
-     all_predict_result = all_predict_result.concat()
-     #return all_predict_result
-
-     # Per image: class-wise NMS, then rescale the kept boxes to the original image
-     for i in tf.range(batch_size):
-         batch_mask = tf.math.logical_and(
-             all_predict_result[...,0]==tf.dtypes.cast(i, tf.float32),
-             all_predict_result[...,1]>0
-         )
-         predict_true_box = tf.boolean_mask(all_predict_result, batch_mask)
-         if tf.shape(predict_true_box)[0]==0:
-             continue
-         original_hw = tf.dtypes.cast(tf.gather(imgs_hw, i), tf.float32)
-         ratio = tf.dtypes.cast(tf.reduce_max(original_hw/img_size), tf.float32)
-         predict_classes, _ = tf.unique(predict_true_box[:,6])
-         #predict_classes_list = tf.unstack(predict_classes)
-         #for class_id in predict_classes_list:
-         for j in tf.range(tf.shape(predict_classes)[0]):
-             #class_mask = tf.math.equal(predict_true_box[:, 6], class_id)
-             class_mask = tf.math.equal(predict_true_box[:, 6], tf.gather(predict_classes, j))
-             predict_true_box_class = tf.boolean_mask(predict_true_box, class_mask)
-             predict_true_box_xy = predict_true_box_class[:, 2:6]
-             predict_true_box_score = predict_true_box_class[:, 7]*predict_true_box_class[:, 1]
-             #predict_true_box_score = predict_true_box_class[:, 1]
-             selected_indices = tf.image.non_max_suppression(
-                 predict_true_box_xy,
-                 predict_true_box_score,
-                 100,
-                 iou_threshold=0.2
-                 #score_threshold=confidence_threshold
-             )
-             #Shape [box_num, 8]
-             selected_boxes = tf.gather(predict_true_box_class, selected_indices)
-             #boxes_result = boxes_result.write(boxes_result.size(), selected_boxes)
-             boxes_xyxy = selected_boxes[:,2:6]*ratio
-             boxes_x1 = tf.clip_by_value(boxes_xyxy[:,0:1], 0., original_hw[1])
-             boxes_x2 = tf.clip_by_value(boxes_xyxy[:,2:3], 0., original_hw[1])
-             boxes_y1 = tf.clip_by_value(boxes_xyxy[:,1:2], 0., original_hw[0])
-             boxes_y2 = tf.clip_by_value(boxes_xyxy[:,3:4], 0., original_hw[0])
-             boxes_w = boxes_x2 - boxes_x1
-             boxes_h = boxes_y2 - boxes_y1
-             # Store boxes as top-left x, y plus width, height
-             boxes = tf.concat([selected_boxes[:,0:2], boxes_x1, boxes_y1, boxes_w, boxes_h, selected_boxes[:,6:8]], axis=-1)
-             boxes_result = boxes_result.write(boxes_result.size(), boxes)
-         img_id = tf.gather(imgs_id, i)
-         imgs_info = imgs_info.write(imgs_info.size(), tf.reshape(tf.stack([i, img_id]), [-1,2]))
-     if boxes_result.size()==0:
-         boxes_result = boxes_result.write(0, tf.zeros(shape=[1,8]))
-     if imgs_info.size()==0:
-         imgs_info = imgs_info.write(0, tf.dtypes.cast(tf.zeros(shape=[1,2]), tf.int32))
-
-     return boxes_result.concat(), imgs_info.concat()
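Two pieces of arithmetic in tf_predict_func are easy to lose among the tensor operations: decoding a raw prediction into a box, and mapping a kept box back to the original image. The plain-Python sketch below reproduces both for a single box. It is only an illustration: `decode_box` and `rescale_box` are hypothetical helper names that do not exist in the repository, the anchor is assumed to be expressed in grid units (the code above multiplies it by the stride), and the rescaling assumes the long side was resized to `img_size` with padding added only after the image content, which is what multiplying by a single ratio without an offset implies.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(raw_xywh, cell_xy, anchor_wh, stride):
    """Decode one raw prediction into (cx, cy, w, h) in network-input pixels."""
    tx, ty, tw, th = raw_xywh
    # The centre offset lies in (-0.5, 1.5) around the grid cell
    cx = (sigmoid(tx) * 2 - 0.5 + cell_xy[0]) * stride
    cy = (sigmoid(ty) * 2 - 0.5 + cell_xy[1]) * stride
    # Width and height lie in (0, 4) times the anchor size
    w = (sigmoid(tw) * 2) ** 2 * anchor_wh[0] * stride
    h = (sigmoid(th) * 2) ** 2 * anchor_wh[1] * stride
    return cx, cy, w, h


def rescale_box(box_xyxy, original_hw, img_size=640):
    """Map (x1, y1, x2, y2) from network-input pixels to the original image.

    Returns (x, y, w, h) with (x, y) the top-left corner, clipped to the image.
    """
    h0, w0 = original_hw
    ratio = max(h0 / img_size, w0 / img_size)  # undo the long-side resize
    x1, y1, x2, y2 = (v * ratio for v in box_xyxy)
    x1 = min(max(x1, 0.0), w0)
    x2 = min(max(x2, 0.0), w0)
    y1 = min(max(y1, 0.0), h0)
    y2 = min(max(y2, 0.0), h0)
    return (x1, y1, x2 - x1, y2 - y1)


if __name__ == "__main__":
    # Zero logits give a box centred in its cell with exactly the anchor size
    print(decode_box((0.0, 0.0, 0.0, 0.0), (3, 5), (1.5, 2.0), 8))
    # A box that spills over the border of a 480x640 image gets clipped
    print(rescale_box((10, 20, 700, 500), (480, 640)))
```

With all-zero logits the sigmoid is 0.5, so the decoded centre sits in the middle of the cell and the size equals the anchor, which is a quick sanity check for the formulas.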
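The final step is to train and validate the model. Training follows the same procedure as the original YOLOv7 implementation, and validation uses pycocotools to compute mAP. See train.py for the details.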
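pycocotools scores detections supplied as a list of records with the keys `image_id`, `category_id`, `bbox` (`[x, y, width, height]`) and `score`. The sketch below shows one way the two outputs of tf_predict_func could be turned into that list. It is not taken from train.py: `to_coco_results` is a hypothetical helper, the score is assumed to be objectness times class probability (the same product used for NMS above), and `class_to_category` stands for whatever mapping from the 0-based class index to COCO category ids the pipeline uses.

```python
import json


def to_coco_results(boxes, imgs_info, class_to_category):
    """Convert prediction rows into COCO detection-result records.

    boxes: rows of [batch_no, conf, x, y, w, h, cls, proba]
    imgs_info: rows of [batch_no, image_id]
    class_to_category: dict from 0-based class index to COCO category id
    """
    batch_to_image = {int(b): int(img_id) for b, img_id in imgs_info}
    results = []
    for batch_no, conf, x, y, w, h, cls, proba in boxes:
        if conf <= 0:
            continue  # all-zero placeholder row written when nothing was detected
        results.append({
            "image_id": batch_to_image[int(batch_no)],
            "category_id": class_to_category[int(cls)],
            "bbox": [round(float(v), 2) for v in (x, y, w, h)],
            "score": round(float(conf * proba), 4),
        })
    return results


if __name__ == "__main__":
    # Illustrative values only: one real detection and one placeholder row
    boxes = [
        [0, 0.9, 10.0, 20.0, 30.0, 40.0, 2.0, 0.95],
        [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    imgs_info = [[0, 139]]
    print(json.dumps(to_coco_results(boxes, imgs_info, {2: 3})))
```

A list like this, saved as JSON, can be loaded with the `loadRes` method of a pycocotools `COCO` ground-truth object and scored with `COCOeval(..., 'bbox')` through `evaluate()`, `accumulate()` and `summarize()`.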
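Because the model trains on 640×640 images, it needs a lot of GPU memory. On my local 2080 Ti with 11 GB of memory, even with mixed precision enabled, I could only use a batch size of 8, and the training results were not very good. So I rented a V100 GPU with 32 GB of memory on the autodl platform (2.28 yuan per hour) and set the batch size to 32. Batch size appears to have a fairly large effect on how well the model trains. In the end I trained for a little over 20 epochs, each taking slightly more than an hour, for about one day in total. The results are as follows: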
- Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.270
- Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.411
- Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.289
- Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.162
- Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.302
- Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.334
- Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.268
- Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.476
- Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.528
- Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.338
- Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.576
- Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.661
Below are the predictions for some images from the validation set.
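According to the YOLOv7 paper, mAP (all) can reach 60% after 300 epochs of training. Training longer should improve the accuracy further, but given limited time and resources I stopped here for now.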
Finally, all of my source code is in the GitHub repository gzroy/yolov7_tf2: Yolov7 implementation on tensorflow 2.x