
tensorflow2.0-keras: Notes on Reproducing the LaneNet Lane Detection Model

        There are many write-ups on reproducing the LaneNet model. The code for the original paper, Towards End-to-End Lane Detection: an Instance Segmentation Approach, is based on TensorFlow 1.x. This post reproduces part of LaneNet in Keras with a TensorFlow 2.0 backend. The implementation details differ from the paper in places, but the goals are the same. Suggestions and corrections are welcome.


一、A Brief Introduction to the LaneNet Paper

1. The paper's approach and goals

        The paper proposes an end-to-end lane detection algorithm consisting of two networks, LaneNet and H-Net. LaneNet is a multi-task model that combines semantic segmentation with a per-pixel vector embedding in order to separate individual lane instances. H-Net is a network of convolutional and fully connected layers that predicts a transformation matrix H, used to regress a curve through the pixels belonging to the same lane. The figure in the paper shows the overall architecture; each part is described below.

1.1 The LaneNet architecture

         LaneNet is a multi-task model with two decoder branches. The segmentation branch performs semantic segmentation of the input image, a binary classification of each pixel as lane or background. The embedding branch maps each pixel to an embedding vector; the paper sets the embedding dimension to 4, and the trained embeddings are used for clustering. Finally the outputs of the two branches are combined and clustered with the Mean-Shift algorithm to obtain the instance segmentation result.
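As a toy illustration of this clustering step (this is not the author's code; binary_mask and embeddings below are random stand-ins for the two branch outputs), the embedding vectors of the lane pixels could be grouped with scikit-learn's MeanShift:

import numpy as np
from sklearn.cluster import MeanShift

# hypothetical inputs: a (H, W) binary lane mask and a (H, W, 4) embedding map
binary_mask = np.random.rand(256, 256) > 0.99
embeddings = np.random.rand(256, 256, 4).astype(np.float32)

lane_coords = np.argwhere(binary_mask)       # pixel positions flagged as lane
lane_embeddings = embeddings[binary_mask]    # their embedding vectors, shape (N, 4)

ms = MeanShift(bandwidth=1.5, bin_seeding=True)  # the bandwidth plays a role similar to delta_v
labels = ms.fit_predict(lane_embeddings)         # one integer lane id per pixel
for lane_id in np.unique(labels):
    print('lane', lane_id, 'has', np.sum(labels == lane_id), 'pixels')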

 1.2 The H-Net architecture

The output of LaneNet is a set of pixels per lane, through which a curve still has to be regressed. The traditional approach projects the image into a bird's-eye view and fits a second- or third-order polynomial. In that approach the transformation matrix H is computed once and shared by all images, which introduces errors whenever the ground plane changes (hills, slopes).

         To address this, the paper trains a neural network, H-Net, that predicts the transformation matrix H for each image. The network's input is the image and its output is H:

$$H = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & f & 1 \end{bmatrix}$$

The zeros enforce that horizontal lines stay horizontal under the transformation. As the matrix shows, H has only 6 free parameters, so H-Net outputs a 6-dimensional vector. H-Net consists of 6 plain convolutional layers followed by one fully connected layer.
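This post does not reproduce H-Net, but for reference, a minimal Keras sketch following the paper's description (three blocks of 3x3 convolutions with 16/32/64 filters, BN and ReLU after each conv, max pooling between blocks, then a 1024-unit fully connected layer and a linear 6-way output; the 128x64 input resolution is also from the paper) might look like this. The names and exact layer grouping are an interpretation, not released code:

from tensorflow.keras import layers, models

def build_hnet(input_shape=(64, 128, 3)):
    # three conv blocks of 16/32/64 filters; each conv is followed by BN + ReLU
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):
        for _ in range(2):
            x = layers.Conv2D(filters, 3, padding='same')(x)
            x = layers.BatchNormalization()(x)
            x = layers.ReLU()(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    coeffs = layers.Dense(6)(x)   # the six free entries of H
    return models.Model(inputs, coeffs)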

1.3 Curve fitting

Curve fitting re-predicts each x coordinate from the corresponding y coordinate:

  •  For a lane consisting of N pixels, each pixel $p_i = [x_i, y_i, 1]^T$ is first transformed with the H-Net prediction: $p_i' = H p_i$.
  •  A third-order polynomial $f(y') = \alpha y'^3 + \beta y'^2 + \gamma y' + \delta$ is then fitted by least squares:

$$w = (Y^T Y)^{-1} Y^T x'$$

where $w = [\alpha, \beta, \gamma, \delta]^T$ and each row of $Y$ is $[y_i'^3,\ y_i'^2,\ y_i',\ 1]$.
  •  The fitted parameters predict $x_i'^* = f(y_i')$, giving the point $p_i'^* = [x_i'^*, y_i', 1]^T$.
  •  Finally each point is projected back into the original image: $p_i^* = H^{-1} p_i'^*$.
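A minimal numpy sketch of this fit-and-reproject step follows; the matrix H and the pixel coordinates are made-up illustrative values, and np.polyfit stands in for the closed-form least-squares solution above:

import numpy as np

H = np.array([[1.0, 0.1, 5.0],
              [0.0, 1.2, 3.0],
              [0.0, 0.01, 1.0]])            # made-up stand-in for an H-Net output
xs = np.array([100., 104., 110., 118.])     # lane pixel coordinates in the image
ys = np.array([200., 220., 240., 260.])

pts = np.stack([xs, ys, np.ones_like(xs)])  # homogeneous coordinates, shape (3, N)
proj = H @ pts
proj /= proj[2]                             # p' = H p, normalised

w = np.polyfit(proj[1], proj[0], deg=3)     # least-squares third-order fit x' = f(y')
x_star = np.polyval(w, proj[1])             # re-predict x' from y'

back = np.linalg.inv(H) @ np.stack([x_star, proj[1], np.ones_like(x_star)])
back /= back[2]                             # p* = H^{-1} p'*
print(np.round(back[0], 2))                 # fitted x coordinates in the original image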

Note:

Many articles explain the LaneNet paper in depth; the above only summarises its core ideas.

Readers who want a closer look at the original paper may refer to:

https://www.jianshu.com/p/c6d38d648509

https://www.cnblogs.com/xuanyuyt/p/11523192.html

二、Implementing LaneNet on tensorflow2.0-keras

         This section focuses on implementation details the paper does not cover. Only the LaneNet part of the pipeline is reproduced here (a reproduction of the approach rather than a line-for-line port). The tensorflow-keras implementation covers dataset preparation, data preprocessing, network construction, loss functions, and learning-rate scheduling. The backbone does not follow the ENet used in the paper; instead a U-Net encoder-decoder is used.

 Note: the code in this post cannot be used as-is; adapt it to your own setup first.

2.1 Dataset preparation

        The paper uses the TuSimple dataset, which consists of three files: train_set.zip, test_set.zip and test_label.json. TuSimple is annotated with lane points, so the data has to be converted before it can be fed to the network.

Each line of the label file is a JSON object in the following format:
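An abbreviated sample line is shown below; lanes holds one list of x coordinates per lane, sampled at the shared h_samples rows, and -2 marks rows where that lane has no point (the ellipses stand for the remaining values):

{"lanes": [[-2, -2, 632, 625, 617, 609, ...], [-2, -2, 719, 734, 748, 762, ...]],
 "h_samples": [240, 250, 260, 270, ...],
 "raw_file": "clips/0313-1/6040/20.jpg"}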

 

2.1.1 Converting the point annotations into line masks

import cv2
import json
import numpy as np
import os

base_path = "F:\\train_set\\"
file = open(base_path + 'label_data_0313.json', 'r')
image_num = 0
for line in file.readlines():
    data = json.loads(line)
    image = cv2.imread(os.path.join(base_path, data['raw_file']))
    binaryimage = np.zeros((image.shape[0], image.shape[1], 1), np.uint8)
    instanceimage = binaryimage.copy()
    arr_width = data['lanes']        # one list of x coordinates per lane
    arr_height = data['h_samples']   # shared y coordinates
    width_num = len(arr_width)       # number of lanes
    height_num = len(arr_height)     # number of sampled rows
    for i in range(height_num):
        # gray value of the first lane; each further lane is 50 brighter
        # (50, 100, 150, 200), matching the label encoding in section 2.3
        lane_hist = 50
        for j in range(width_num):
            # x == -2 marks missing points, so only connect consecutive positive
            # coordinates; drawing starts from the second sampled row
            if i > 0 and arr_width[j][i - 1] > 0 and arr_width[j][i] > 0:
                binaryimage[int(arr_height[i]), int(arr_width[j][i])] = 255
                instanceimage[int(arr_height[i]), int(arr_width[j][i])] = lane_hist
                # connect the two points with a 10-pixel-wide line segment
                cv2.line(binaryimage, (int(arr_width[j][i - 1]), int(arr_height[i - 1])),
                         (int(arr_width[j][i]), int(arr_height[i])), 255, 10)
                cv2.line(instanceimage, (int(arr_width[j][i - 1]), int(arr_height[i - 1])),
                         (int(arr_width[j][i]), int(arr_height[i])), lane_hist, 10)
            lane_hist += 50
    # the nested output directories must already exist
    string1 = "H:\\unet-keras\\TUsimple\\label_instance\\" + data['raw_file'][6:] + str(image_num) + ".png"
    string2 = "F:\\train_set\\png1\\" + str(image_num) + ".png"
    string3 = "F:\\train_set\\image\\" + str(image_num) + ".jpg"
    cv2.imwrite(string1, binaryimage)
    cv2.imwrite(string2, instanceimage)
    cv2.imwrite(string3, image)
    image_num = image_num + 1
file.close()

2.1.2 Building the data-path list file

        Training reads its data through a txt file of paths, built to match what the network needs. Because the network has a single input and two outputs, each record is one line of space-separated paths: the ground-truth road frame(s) of a clip followed by the lane instance label and the binary segmentation label.

 Code to generate the training list:

import os
import re

base_path = "H:/unet-keras/TUsimple/Data/"
img_path = "H:/unet-keras/TUsimple/Data/image"
label_path = 'H:/unet-keras/TUsimple/Data/label_instance'
seg_path = 'H:/unet-keras/TUsimple/Data/label_binary'

with open(base_path + 'train_seg_instance.txt', 'w') as f:
    images = os.listdir(img_path)
    for dir in images:  # date directories, e.g. 0313-1, 0313-2
        dir1 = img_path + '/' + dir
        Dir = os.listdir(dir1)
        Dir.sort(key=lambda i: int(re.match(r'(\d+)', i).group()))
        label_dir1 = label_path + '/' + dir      # e.g. .../label_instance/0313-1
        seg_label_dir1 = seg_path + '/' + dir
        for filenames in Dir:  # clip directories, e.g. 60..., 20...
            filenames1 = dir1 + '/' + filenames
            Filenames = os.listdir(filenames1)
            Filenames.sort(key=lambda i: int(re.match(r'(\d+)', i).group()))
            a = []
            for filename in Filenames[15:]:      # keep only the last frames of each clip
                out_path = filenames1 + '/' + filename  # e.g. .../image/0313-1/60/16.jpg
                a.append(out_path)
            # one label pair per clip
            a.append(label_dir1 + '/' + filenames + '.png')
            a.append(seg_label_dir1 + '/' + filenames + '.png')
            f.write(' '.join(a) + '\n')

 The list can then be split 8:2 into a training set and a test set:

from random import shuffle

# shuffle the path list and split it 8:2 into train / test files
with open(r"H:\unet-keras\TUsimple\Data\train_seg_instance.txt", "r", encoding='UTF-8') as f, \
     open(r"H:\unet-keras\TUsimple\Data\train_instance_tusimple0.8.txt", "w", encoding='UTF-8') as f1, \
     open(r"H:\unet-keras\TUsimple\Data\train_instance_tusimple0.2.txt", "w", encoding='UTF-8') as f2:
    train = f.readlines()
    shuffle(train)
    split = 0.8
    train_num = int(len(train) * split)
    for line in train[:train_num]:
        f1.write(line)
    for line in train[train_num:]:
        f2.write(line)

 2.2 Data preprocessing: letterbox padding

        Preprocessing prepares the network input. A U-Net is used as the encoder-decoder here, with an input size of (256, 256, 3), i.e. (H, W, 3), while the source images are 256x128x3. To avoid distorting the images, they are letterboxed with white bars:

def letterbox_image(self, image, label, instance_label, size):
    '''resize image with unchanged aspect ratio using padding'''
    label = Image.fromarray(np.array(label))
    instance_label = Image.fromarray(np.array(instance_label))
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (252, 252, 252))   # near-white padding bars
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    # nearest-neighbour resizing keeps the label values intact
    label = label.resize((nw, nh), Image.NEAREST)
    instance_label = instance_label.resize((nw, nh), Image.NEAREST)
    new_label = Image.new('L', size, (252))
    new_label.paste(label, ((w - nw) // 2, (h - nh) // 2))
    new_instance_label = Image.new('L', size, (252))
    new_instance_label.paste(instance_label, ((w - nw) // 2, (h - nh) // 2))
    return new_image, new_label, new_instance_label

 

 On the left is the ground-truth image, on the right the binary segmentation label, i.e. LaneNet's input (Input) and the target of the binary segmentation branch (y_true). The instance segmentation label is letterboxed in the same way, but note that the embedding branch does not output an image: it outputs a multi-dimensional vector per pixel.

2.3 Label processing

        As the letterboxing above shows, label handling differs substantially between the two training tasks. The binary segmentation labels are one-hot encoded, e.g. [0, 1] for a background pixel and [1, 0] for a lane pixel. Every entry is either 0 or 1, which has the advantage that the classes carry no correlation with each other.

png = np.array(png)
# convert to one-hot form
seg_labels = np.zeros_like(png)
seg_labels[png <= 127.5] = 1   # dark (background) pixels -> class 1; bright (lane) pixels stay class 0
seg_labels = np.eye(self.num_classes + 1)[seg_labels.reshape([-1])]
# the white padding bars are included here too; the loss function removes them later
seg_labels = seg_labels.reshape((int(self.image_size[1]), int(self.image_size[0]), 2 + 1))
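As a quick numeric illustration of the np.eye indexing trick used above:

import numpy as np
labels = np.array([0, 1, 1, 0])
print(np.eye(3)[labels])
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 1. 0.]
#  [1. 0. 0.]]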

         The embedding branch represents each pixel with a multi-dimensional vector that captures the relations between pixels, so that they can later be clustered; the details appear in the loss-function section, and only the label handling is explained here. For data where relations between classes matter, one-hot encoding is no longer suitable; label encoding is used instead, e.g. [1, 2, 3, 4]: all pixels with value 1 form one class, and likewise for 2, 3 and 4. The advantage is that pixels of the same lane can be grouped directly into one cluster.
        With this in mind the data gets special treatment: the dataset-conversion code above assigns each lane a gray value in steps of 50, so the labels can be encoded as follows:

instance_png = np.array(instance_png)
instance_label = np.zeros_like(instance_png)
# everything that is not one of the four lane gray values (background, padding) becomes class 4
instance_label[instance_png < 255] = 4
instance_label[instance_png == 200] = 3
instance_label[instance_png == 150] = 2
instance_label[instance_png == 100] = 1
instance_label[instance_png == 50] = 0

 2.4 Building the network

        LaneNet is a multi-task network, so a U-Net serves as the base here: a single input (the road image, (256, 256, 3)) and two outputs (the binary segmentation map, (256, 256, 2), and the embedding map, (256, 256, 5)). Knowing the components, the network is straightforward to build:

from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
import tensorflow.keras.backend as K


class UNET_double(object):
    def __init__(self, input_shape, category_classes, color_classes):
        # default input_shape = (width, height, channel)
        self.input_shape = input_shape
        self.category_classes = category_classes  # binary-segmentation classes (2)
        self.color_classes = color_classes        # embedding dimension (5)

    def vgg(self, inputs):
        # VGG-style encoder; each block is Conv -> BN -> ReLU, followed by max pooling
        bn_axis = 3
        # Block 1
        x = Conv2D(64, (3, 3), padding='same')(inputs)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(64, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        feat1 = x                                  # 256,256,64 skip connection
        x = MaxPooling2D((2, 2), strides=(2, 2))(x)
        # Block 2
        x = Conv2D(128, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(128, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        feat2 = x                                  # 128,128,128 skip connection
        x = MaxPooling2D((2, 2), strides=(2, 2))(x)
        # Block 3
        x = Conv2D(256, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(256, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(256, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        feat3 = x                                  # 64,64,256 skip connection
        x = MaxPooling2D((2, 2), strides=(2, 2))(x)
        # Block 4
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        feat4 = x                                  # 32,32,512 skip connection
        x = MaxPooling2D((2, 2), strides=(2, 2))(x)
        # Block 5
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        x = Conv2D(512, (3, 3), padding='same')(x)
        x = BatchNormalization(axis=bn_axis)(x)
        x = Activation('relu')(x)
        return x, feat1, feat2, feat3, feat4

    def binary(self, inputs):
        # decoder of the binary segmentation branch
        x, feat1, feat2, feat3, feat4 = inputs
        channels = [64, 128, 256, 512]
        P5_up = UpSampling2D(size=(2, 2))(x)
        P4 = Concatenate(axis=3)([feat4, P5_up])   # 32,32,512 + 32,32,512
        P4 = Conv2D(channels[3], 3, padding='same', kernel_initializer='he_normal')(P4)
        P4 = Activation('relu')(P4)
        P4 = BatchNormalization()(P4)
        P4 = Conv2D(channels[3], 3, padding='same', kernel_initializer='he_normal')(P4)
        P4 = Activation('relu')(P4)
        P4 = BatchNormalization()(P4)
        P4_up = UpSampling2D(size=(2, 2))(P4)      # 64,64,512
        P3 = Concatenate(axis=3)([feat3, P4_up])
        P3 = Conv2D(channels[2], 3, padding='same', kernel_initializer='he_normal')(P3)
        P3 = Activation('relu')(P3)
        P3 = BatchNormalization()(P3)
        P3 = Conv2D(channels[2], 3, padding='same', kernel_initializer='he_normal')(P3)
        P3 = Activation('relu')(P3)
        P3 = BatchNormalization()(P3)
        P3_up = UpSampling2D(size=(2, 2))(P3)
        P2 = Concatenate(axis=3)([feat2, P3_up])
        P2 = Conv2D(channels[1], 3, padding='same', kernel_initializer='he_normal')(P2)
        P2 = Activation('relu')(P2)
        P2 = BatchNormalization()(P2)
        P2 = Conv2D(channels[1], 3, padding='same', kernel_initializer='he_normal')(P2)
        P2 = Activation('relu')(P2)
        P2 = BatchNormalization()(P2)
        P2_up = UpSampling2D(size=(2, 2))(P2)
        P1 = Concatenate(axis=3)([feat1, P2_up])
        P1 = Conv2D(channels[0], 3, padding='same', kernel_initializer='he_normal')(P1)
        P1 = Activation('relu')(P1)
        P1 = BatchNormalization()(P1)
        P1 = Conv2D(channels[0], 3, padding='same', kernel_initializer='he_normal')(P1)
        P1 = Activation('relu')(P1)
        P1 = BatchNormalization()(P1)
        # softmax classifier
        out1 = Conv2D(self.category_classes, 1, activation="softmax", name='category_output')(P1)
        return out1

    def instance(self, inputs):
        # decoder of the embedding branch; same structure, but a linear 1x1 output
        x, feat1, feat2, feat3, feat4 = inputs
        channels = [64, 128, 256, 512]
        P5_up = UpSampling2D(size=(2, 2))(x)
        P4 = Concatenate(axis=3)([feat4, P5_up])   # 32,32,512 + 32,32,512
        P4 = Conv2D(channels[3], 3, padding='same', kernel_initializer='he_normal')(P4)
        P4 = Activation('relu')(P4)
        P4 = BatchNormalization()(P4)
        P4 = Conv2D(channels[3], 3, padding='same', kernel_initializer='he_normal')(P4)
        P4 = Activation('relu')(P4)
        P4 = BatchNormalization()(P4)
        P4_up = UpSampling2D(size=(2, 2))(P4)      # 64,64,512
        P3 = Concatenate(axis=3)([feat3, P4_up])
        P3 = Conv2D(channels[2], 3, padding='same', kernel_initializer='he_normal')(P3)
        P3 = Activation('relu')(P3)
        P3 = BatchNormalization()(P3)
        P3 = Conv2D(channels[2], 3, padding='same', kernel_initializer='he_normal')(P3)
        P3 = Activation('relu')(P3)
        P3 = BatchNormalization()(P3)
        P3_up = UpSampling2D(size=(2, 2))(P3)
        P2 = Concatenate(axis=3)([feat2, P3_up])
        P2 = Conv2D(channels[1], 3, padding='same', kernel_initializer='he_normal')(P2)
        P2 = Activation('relu')(P2)
        P2 = BatchNormalization()(P2)
        P2 = Conv2D(channels[1], 3, padding='same', kernel_initializer='he_normal')(P2)
        P2 = Activation('relu')(P2)
        P2 = BatchNormalization()(P2)
        P2_up = UpSampling2D(size=(2, 2))(P2)
        P1 = Concatenate(axis=3)([feat1, P2_up])
        P1 = Conv2D(channels[0], 3, padding='same', kernel_initializer='he_normal')(P1)
        P1 = Activation('relu')(P1)
        P1 = BatchNormalization()(P1)
        P1 = Conv2D(channels[0], 3, padding='same', kernel_initializer='he_normal')(P1)
        P1 = Activation('relu')(P1)
        P1 = BatchNormalization()(P1)
        out2 = Conv2D(self.color_classes, 1, name='color_output')(P1)  # (256,256,5), no activation
        return out2

    def build_model(self):
        inputs = Input(shape=self.input_shape)
        feats = self.vgg(inputs)
        binary = self.binary(feats)
        instance = self.instance(feats)
        model = Model(inputs=inputs, outputs=[binary, instance])
        # model.summary()
        return model

# UNET_double(input_shape=(256, 256, 3), category_classes=2, color_classes=5).build_model()

 2.5 Loss functions

        The two tasks need two loss functions. The binary segmentation branch uses a Dice loss: lanes versus background is an extremely imbalanced classification problem, and this loss handles it well without any class-weight hyperparameters.

import tensorflow as tf
import tensorflow.keras.backend as K

def dice_loss(beta=1, smooth=1e-5):
    def _dice_loss_with_CE(y_true, y_pred):
        # y_pred: (batch, 256, 256, 2); y_true: (batch, 256, 256, 3),
        # where the last channel of y_true marks the padding bars
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        # cross-entropy term; the padding channel of y_true is dropped
        CE_loss = - y_true[..., :-1] * K.log(y_pred)
        CE_loss = K.mean(K.sum(CE_loss, axis=-1))
        # soft Dice (F-beta) term
        tp = K.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        fp = K.sum(y_pred, axis=[0, 1, 2]) - tp
        fn = K.sum(y_true[..., :-1], axis=[0, 1, 2]) - tp
        score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
        score = tf.reduce_mean(score)
        dice = 1 - score
        return CE_loss + dice
    return _dice_loss_with_CE

        The pixel-embedding branch keeps the loss function from the paper:

 To tell which lane a lane pixel belongs to, the embedding branch assigns each pixel an embedding vector, and the loss is designed so that embeddings of pixels on the same lane end up close together while embeddings of pixels on different lanes end up far apart.
Variance loss (L_var): when a pixel embedding x_i is farther than δ_v from the mean vector μ_c of its lane, the model is updated to pull x_i towards μ_c.
Distance loss (L_dist): when the mean vectors μ_ca and μ_cb of two different lanes are closer than δ_d, the model is updated to push μ_ca and μ_cb apart.
In short, L_var pulls pixel embeddings towards their lane's mean vector μ_c, while L_dist pushes the cluster centres away from each other.
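Written out (following the discriminative-loss formulation the paper builds on; the code below uses the L1 norm for $\lVert\cdot\rVert$), the terms are:

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\big[\lVert \mu_c - x_i \rVert - \delta_v\big]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\sum_{\substack{c_B=1 \\ c_B \ne c_A}}^{C}\big[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\big]_+^2$$

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert, \qquad L = \lambda_{var} L_{var} + \lambda_{dist} L_{dist} + \lambda_{reg} L_{reg}$$

where $C$ is the number of instances (clusters), $N_c$ the number of pixels of instance $c$, $\mu_c$ the mean embedding of instance $c$, and $[x]_+ = \max(x, 0)$; the weights $\lambda$ correspond to param_var, param_dist and param_reg in the code.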

tf.config.experimental_run_functions_eagerly(True)  # run eagerly while debugging so tensors can be inspected

def discriminative_loss_single(prediction,
                               correct_label,
                               feature_dim,
                               label_shape,
                               delta_v,
                               delta_d,
                               param_var,
                               param_dist,
                               param_reg):
    """
    Discriminative loss for a single image.
    :param prediction: network output (embedding map)
    :param correct_label: instance label
    :param feature_dim: embedding dimension of the prediction
    :param label_shape: shape of the label
    :param delta_v: cut-off variance distance (0.5)
    :param delta_d: cut-off cluster distance (3.0)
    :param param_var: weight of the intra-cluster variance term (1.0)
    :param param_dist: weight of the inter-cluster distance term (1.0)
    :param param_reg: weight of the regularisation term (0.001)
    """
    correct_label = tf.reshape(
        correct_label, [label_shape[1] * label_shape[0]])            # flatten to (h*w,)
    reshaped_pred = tf.reshape(
        prediction, [label_shape[1] * label_shape[0], feature_dim])  # flatten to (h*w, feature_dim)
    # count the instances
    unique_labels, unique_id, counts = tf.unique_with_counts(correct_label)
    # unique_with_counts example:
    #   x = [1, 1, 2, 4, 4, 4, 7, 8, 8]
    #   y, idx, count = unique_with_counts(x)
    #   y     => [1, 2, 4, 7, 8]
    #   idx   => [0, 0, 1, 2, 2, 2, 3, 4, 4]
    #   count => [2, 1, 3, 1, 2]
    counts = tf.cast(counts, tf.float32)   # counts[k] = number of pixels in instance k
    num_instances = tf.size(unique_labels)
    # mean embedding vector of each instance
    segmented_sum = tf.math.unsorted_segment_sum(
        reshaped_pred, unique_id, num_instances)   # sum the embeddings of each instance
    # unsorted_segment_sum example:
    #   a = [[1 2 3], [4 5 6], [7 8 9]]
    #   tf.math.unsorted_segment_sum(data=a, segment_ids=[0, 1, 0], num_segments=2)
    #   => [[8 10 12], [4 5 6]]
    mu = tf.divide(segmented_sum, tf.reshape(counts, (-1, 1)))  # per-instance mean embedding
    mu_expand = tf.gather(mu, unique_id)   # broadcast each mean back to its pixels
    # variance term: pull pixels towards their instance mean
    distance = tf.norm(tf.subtract(mu_expand, reshaped_pred), axis=1, ord=1)
    distance = tf.subtract(distance, delta_v)
    distance = tf.clip_by_value(distance, 0., distance)
    distance = tf.square(distance)
    l_var = tf.math.unsorted_segment_sum(distance, unique_id, num_instances)
    l_var = tf.divide(l_var, counts)
    l_var = tf.reduce_sum(l_var)
    l_var = tf.divide(l_var, tf.cast(num_instances, tf.float32))
    # distance term: push the means of different instances apart
    mu_interleaved_rep = tf.tile(mu, [num_instances, 1])
    mu_band_rep = tf.tile(mu, [1, num_instances])
    mu_band_rep = tf.reshape(mu_band_rep, (num_instances * num_instances, feature_dim))
    mu_diff = tf.subtract(mu_band_rep, mu_interleaved_rep)
    # drop the zero rows that pair each mean with itself
    intermediate_tensor = tf.reduce_sum(tf.abs(mu_diff), axis=1)
    zero_vector = tf.zeros(1, dtype=tf.float32)
    bool_mask = tf.not_equal(intermediate_tensor, zero_vector)
    mu_diff_bool = tf.boolean_mask(mu_diff, bool_mask)
    mu_norm = tf.norm(mu_diff_bool, axis=1, ord=1)
    mu_norm = tf.subtract(2. * delta_d, mu_norm)
    mu_norm = tf.clip_by_value(mu_norm, 0., mu_norm)
    mu_norm = tf.square(mu_norm)
    l_dist = tf.reduce_mean(mu_norm)
    # small regularisation term keeping the means close to the origin
    l_reg = tf.reduce_mean(tf.norm(mu, axis=1, ord=1))
    param_scale = 1.
    l_var = param_var * l_var
    l_dist = param_dist * l_dist
    l_reg = param_reg * l_reg
    loss = param_scale * (l_var + l_dist + l_reg)
    return loss, l_var, l_dist, l_reg
def instance_loss(feature_dim=5, image_shape=(256, 256),
                  delta_v=0.5, delta_d=3.0, param_var=1.0, param_dist=1.0, param_reg=0.001):
    """
    Batch-wise discriminative loss (defaults: 0.5, 3.0, 1.0, 1.0, 0.001).
    :return: a Keras loss function returning the mean discriminative loss over the batch
    """
    def loss(y_true, y_pred):
        def cond(label, batch, out_loss, out_var, out_dist, out_reg, i):
            return tf.less(i, tf.shape(batch)[0])

        def body(label, batch, out_loss, out_var, out_dist, out_reg, i):
            disc_loss, l_var, l_dist, l_reg = discriminative_loss_single(
                y_pred[i], y_true[i], feature_dim, image_shape,
                delta_v, delta_d, param_var, param_dist, param_reg)
            out_loss = out_loss.write(i, disc_loss)
            out_var = out_var.write(i, l_var)
            out_dist = out_dist.write(i, l_dist)
            out_reg = out_reg.write(i, l_reg)
            return label, batch, out_loss, out_var, out_dist, out_reg, i + 1

        # TensorArray is a data structure that supports dynamic writing
        output_ta_loss = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        output_ta_var = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        output_ta_dist = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        output_ta_reg = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
        _, _, out_loss_op, out_var_op, out_dist_op, out_reg_op, _ = tf.while_loop(
            cond, body,
            [y_true, y_pred, output_ta_loss, output_ta_var, output_ta_dist, output_ta_reg, 0])
        disc_loss = tf.reduce_mean(out_loss_op.stack())
        l_var = tf.reduce_mean(out_var_op.stack())
        l_dist = tf.reduce_mean(out_dist_op.stack())
        l_reg = tf.reduce_mean(out_reg_op.stack())
        # a Keras loss must return a single tensor, so only the combined
        # discriminative loss is returned; l_var / l_dist / l_reg can be
        # logged separately if needed
        return disc_loss
    return loss

 Because background pixels vastly outnumber lane pixels, this branch also faces imbalanced data; the discriminative loss copes with it by normalising within each instance, so that every cluster contributes equally to L_var regardless of its pixel count, and again no class weights are needed.

2.6 Learning-rate schedules

Two learning-rate decay strategies are used.
Plateau-based decay:

from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

# halve the learning rate when the binary-segmentation loss plateaus for 3 epochs
reduce_lr = ReduceLROnPlateau(monitor='category_output_loss', factor=0.5, patience=3, verbose=1)
# optional early stopping: when the loss stops improving, the model is essentially trained
early_stopping = EarlyStopping(monitor='category_output_loss', min_delta=0, patience=10, verbose=1)

Cosine-annealing decay with warm-up:

import numpy as np

def cosine_decay_with_warmup(global_step,
                             learning_rate_base,
                             total_steps,
                             warmup_learning_rate=0.0,
                             warmup_steps=0,
                             hold_base_rate_steps=0,
                             min_learn_rate=0,
                             ):
    """
    Parameters:
        global_step: the current step (T_cur in the cosine-annealing formula).
        learning_rate_base: the base learning rate; once warm-up has raised the
            rate to this value, cosine decay begins.
        total_steps: total number of training steps, i.e. epochs * sample_count / batch_size
            (sample_count is the number of samples, epochs the number of passes).
        warmup_learning_rate: initial value of the linear warm-up phase.
        warmup_steps: number of steps the warm-up phase lasts.
        hold_base_rate_steps: optional; after warm-up, hold the learning rate constant
            for this many steps before decay starts.
    """
    if total_steps < warmup_steps:
        raise ValueError('total_steps must be larger or equal to warmup_steps.')
    # cosine annealing, simplified by assuming a minimum learning rate of 0
    learning_rate = 0.5 * learning_rate_base * (1 + np.cos(
        np.pi * (global_step - warmup_steps - hold_base_rate_steps) /
        float(total_steps - warmup_steps - hold_base_rate_steps)))
    # if hold_base_rate_steps > 0, keep the rate constant for a while after warm-up
    if hold_base_rate_steps > 0:
        learning_rate = np.where(global_step > warmup_steps + hold_base_rate_steps,
                                 learning_rate, learning_rate_base)
    if warmup_steps > 0:
        if learning_rate_base < warmup_learning_rate:
            raise ValueError('learning_rate_base must be larger or equal to warmup_learning_rate.')
        # linear warm-up
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        warmup_rate = slope * global_step + warmup_learning_rate
        # use the linear warm-up rate while still warming up, cosine decay afterwards
        learning_rate = np.where(global_step < warmup_steps, warmup_rate, learning_rate)
    learning_rate = max(learning_rate, min_learn_rate)
    return learning_rate
from tensorflow import keras
import tensorflow.keras.backend as K

class WarmUpCosineDecayScheduler(keras.callbacks.Callback):
    """Callback that schedules the learning rate with warm-up and cosine annealing."""

    def __init__(self,
                 learning_rate_base,
                 total_steps,
                 global_step_init=0,
                 warmup_learning_rate=0.0,
                 warmup_steps=0,
                 hold_base_rate_steps=0,
                 min_learn_rate=0,
                 # interval_epoch marks the low points between annealing cycles,
                 # as fractions of the total number of steps
                 interval_epoch=[1],
                 verbose=0):
        super(WarmUpCosineDecayScheduler, self).__init__()
        self.learning_rate_base = learning_rate_base        # base learning rate
        self.warmup_learning_rate = warmup_learning_rate    # warm-up start rate
        self.verbose = verbose
        self.min_learn_rate = min_learn_rate
        # learning_rates records the rate after every update, handy for plotting
        self.learning_rates = []
        self.interval_epoch = interval_epoch
        self.global_step_for_interval = global_step_init    # global step across all cycles
        self.warmup_steps_for_interval = warmup_steps       # total warm-up steps
        self.hold_steps_for_interval = hold_base_rate_steps # total hold steps
        self.total_steps_for_interval = total_steps         # total training steps
        self.interval_index = 0
        # compute the gaps between consecutive low points
        self.interval_reset = [self.interval_epoch[0]]
        for i in range(len(self.interval_epoch) - 1):
            self.interval_reset.append(self.interval_epoch[i + 1] - self.interval_epoch[i])
        self.interval_reset.append(1 - self.interval_epoch[-1])

    def on_batch_end(self, batch, logs=None):
        # advance the step counters and record the current learning rate
        self.global_step = self.global_step + 1
        self.global_step_for_interval = self.global_step_for_interval + 1
        lr = K.get_value(self.model.optimizer.lr)
        self.learning_rates.append(lr)

    def on_batch_begin(self, batch, logs=None):
        # whenever a low point is reached, reset the per-cycle parameters
        if self.global_step_for_interval in [0] + [int(i * self.total_steps_for_interval)
                                                   for i in self.interval_epoch]:
            self.total_steps = self.total_steps_for_interval * self.interval_reset[self.interval_index]
            self.warmup_steps = self.warmup_steps_for_interval * self.interval_reset[self.interval_index]
            self.hold_base_rate_steps = self.hold_steps_for_interval * self.interval_reset[self.interval_index]
            self.global_step = 0
            self.interval_index += 1
        lr = cosine_decay_with_warmup(global_step=self.global_step,
                                      learning_rate_base=self.learning_rate_base,
                                      total_steps=self.total_steps,
                                      warmup_learning_rate=self.warmup_learning_rate,
                                      warmup_steps=self.warmup_steps,
                                      hold_base_rate_steps=self.hold_base_rate_steps,
                                      min_learn_rate=self.min_learn_rate)
        K.set_value(self.model.optimizer.lr, lr)
        if self.verbose > 0:
            print('\nBatch %05d: setting learning rate to %s.' % (self.global_step + 1, lr))
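A minimal usage sketch of the callback; sample_count, batch_size and epochs below are placeholders for your own values:

# total steps over the whole run; roughly 10% of them spent on linear warm-up
sample_count, batch_size, epochs = 2000, 16, 100
total_steps = epochs * sample_count // batch_size
warm_up_lr = WarmUpCosineDecayScheduler(
    learning_rate_base=0.01,
    total_steps=total_steps,
    warmup_learning_rate=1e-5,
    warmup_steps=total_steps // 10,
    min_learn_rate=1e-6,
    interval_epoch=[0.5, 1],   # two annealing cycles, restarting at the halfway point
    verbose=0)
# then pass warm_up_lr in the callbacks list when calling model.fit / fit_generator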

2.7 Compiling the model

from tensorflow.keras.optimizers import Adam

# dice_loss and instance_loss are defined in section 2.5;
# f_score, recall and precision are custom metric factories (a sketch follows below)
losses = {'category_output': dice_loss(),
          'color_output': instance_loss(feature_dim=5, image_shape=(256, 256),
                                        delta_v=0.5, delta_d=3.0,
                                        param_var=1.0, param_dist=1.0, param_reg=0.001)}
model.compile(loss=losses,
              optimizer=Adam(lr=learning_rate_base),
              loss_weights=(0.5, 0.5),
              metrics=[f_score(), 'acc', recall(), precision()])
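The f_score, recall and precision factories are not shown in the post. A minimal sketch of what they might look like, assuming per-batch computation with a 0.5 threshold and targets shaped like the binary branch's labels (one extra padding channel), is:

import tensorflow.keras.backend as K

def precision(threshold=0.5):
    # fraction of predicted positives that are true positives
    def _precision(y_true, y_pred):
        y_true = y_true[..., :y_pred.shape[-1]]   # drop the extra padding channel
        y_pred_pos = K.cast(y_pred > threshold, 'float32')
        tp = K.sum(y_true * y_pred_pos)
        return tp / (K.sum(y_pred_pos) + K.epsilon())
    return _precision

def recall(threshold=0.5):
    # fraction of true positives that were found
    def _recall(y_true, y_pred):
        y_true = y_true[..., :y_pred.shape[-1]]
        y_pred_pos = K.cast(y_pred > threshold, 'float32')
        tp = K.sum(y_true * y_pred_pos)
        return tp / (K.sum(y_true) + K.epsilon())
    return _recall

def f_score(threshold=0.5, beta=1):
    # weighted harmonic mean of precision and recall
    def _f_score(y_true, y_pred):
        p = precision(threshold)(y_true, y_pred)
        r = recall(threshold)(y_true, y_pred)
        return (1 + beta ** 2) * p * r / (beta ** 2 * p + r + K.epsilon())
    return _f_score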

2.8 Hyperparameters and training

        Learning rate: 0.01; epochs: 100; batch size: 16.

# model.fit also accepts generators in newer TF2 releases; fit_generator is deprecated
model.fit_generator(generator=gen,
                    steps_per_epoch=max(1, len(train_lines) // batch_size),
                    epochs=epochs, verbose=2,
                    callbacks=[checkpoint_period, tensorboard, early_stopping, reduce_lr])

2.9 Post-processing

        After training, the network outputs still need post-processing to produce the final images. Only the outputs of the two LaneNet branches are post-processed and shown here; H-Net was not reproduced (H-Net is a feature-extraction network whose input is the ground-truth road image and whose output is a coordinate transformation matrix, used to handle lane position shifts on uphill and downhill roads).

import copy
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from Model.model import UNET_double as lanenet
from dataload import postprocess
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

def minmax_scale(input_arr):
    # scale an array linearly into [0, 255] for visualisation
    min_val = np.min(input_arr)
    max_val = np.max(input_arr)
    output_arr = (input_arr - min_val) * 255.0 / (max_val - min_val)
    return output_arr

def letterbox_image(image, size):
    image = image.convert("RGB")
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image, nw, nh

def test_lanenet(image_path, weights_path, input_size=(256, 256, 3)):
    model = lanenet(input_shape=input_size, category_classes=2, color_classes=5).build_model()
    model.load_weights(weights_path)
    image = Image.open(image_path)
    old_img = copy.deepcopy(image)
    image_vis = old_img
    img, nw, nh = letterbox_image(image, (input_size[0], input_size[1]))
    image_vis1 = img
    img = [np.array(img) / 255]
    image = np.asarray(img)
    binary_seg_image, instance_seg_image = model.predict(image)
    # post-process the binary segmentation output
    binary_seg_image = binary_seg_image.argmax(axis=-1).reshape([input_size[0], input_size[1]])
    # crop away the padding bars
    pr = binary_seg_image[int((input_size[0] - nh) // 2):int((input_size[0] - nh) // 2 + nh),
                          int((input_size[1] - nw) // 2):int((input_size[1] - nw) // 2 + nw)]
    seg_img = np.zeros((np.shape(pr)[0], np.shape(pr)[1], 3))
    colors = [(255, 255, 255), (0, 0, 0)]   # class 0 = lane (white), class 1 = background (black)
    for c in range(2):
        seg_img[:, :, 0] += ((pr[:, :] == c) * (colors[c][0])).astype('uint8')
        seg_img[:, :, 1] += ((pr[:, :] == c) * (colors[c][1])).astype('uint8')
        seg_img[:, :, 2] += ((pr[:, :] == c) * (colors[c][2])).astype('uint8')
    seg_img = np.array(Image.fromarray(np.uint8(seg_img)).resize((256, 128)))
    # cluster the embeddings and draw the lane instances
    postprocessor = postprocess.LaneNetPostProcessor()
    postprocess_result = postprocessor.postprocess(
        binary_seg_result=binary_seg_image,
        instance_seg_result=instance_seg_image[0],
        source_image=image_vis1
    )
    mask_image = postprocess_result['mask_image']
    # rescale each embedding channel to [0, 255] for visualisation
    for i in range(5):
        instance_seg_image[0][:, :, i] = minmax_scale(instance_seg_image[0][:, :, i])
    embedding_image = np.array(instance_seg_image[0], np.uint8)
    # crop away the padding bars
    embedding_image = embedding_image[int((input_size[0] - nh) // 2):int((input_size[0] - nh) // 2 + nh),
                                      int((input_size[1] - nw) // 2):int((input_size[1] - nw) // 2 + nw)]
    plt.figure('mask_image')
    plt.imshow(mask_image)
    plt.figure('src_image')
    plt.imshow(image_vis)
    plt.figure('instance_image')
    plt.imshow(embedding_image[:, :, (3, 1, 0)])   # show three of the five embedding channels as RGB
    plt.figure('seg_image')
    plt.imshow(seg_img)
    plt.show()
    return

if __name__ == '__main__':
    image_path = r'D:\dataset\trainset\image\origin\0601\1494452579506899721\9.jpg'
    weights_path = r'H:\unet-keras\多任务训练\aspp_convlstm_cos\lanenet_loss0.14.h5'
    test_lanenet(image_path, weights_path)

Final outputs:

 

Summary

        When I first started this reproduction I was completely lost, because I had not carefully digested the paper. To reproduce a paper, first be clear about its goal, get familiar with its overall framework, and then study and understand each component in turn.
