
YOLOv4: Lane Line Detection and Vehicle Distance Estimation

1. Introduction

I have recently been looking at Huawei's CANN framework and found some interesting open-source algorithms (all code in this article comes from Huawei's open-source git releases). Huawei also recently released the AI PRO development board, and I wanted to run this on a board I already have (I didn't feel like setting up a new environment; it's tedious and tiring). Since the code ships with an ONNX model, I'll start with ONNX inference; an RKNN version may follow later, who knows. I won't go into every detail here — it works out of the box.

2. Code Preparation

The code is a video-processing program: it runs object detection with a YOLOv4 model, combines the detections with lane detection, and writes out the processed video.

2.1 Main Steps

  1. Import the necessary libraries: common computer-vision libraries such as OpenCV and numpy, plus the custom LaneFinder module.
  2. Define constants and global variables: class labels, model input/output sizes, number of classes, anchors, and so on.
  3. Define the preprocessing function preprocess: scale and pad each input frame to the model's input size and normalize it.
  4. Define helper functions: computing the overlap of two boxes, computing IoU, applying non-maximum suppression (NMS), and so on.
  5. Define the output-decoding function decode_bbox: convert the model's output feature maps into box coordinates and class probabilities.
  6. Define the post-processing function post_process: apply NMS to the decoded results and convert the detections into a readable format.
  7. Define further helpers: converting label indices into readable names, processing frames, and so on.
  8. Define the main function main: read video frames, run the detection and lane-finding steps above, and write the results to the output video file (a condensed sketch of this loop follows the list).
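
Before reading the full listing in section 2.4, it helps to see the per-frame loop condensed to its essentials. This is only a sketch; every name in it (preprocess, post_process, preprocess_frame, sess, outputs_name, input_name) comes from the script in section 2.4:

# Condensed sketch of the per-frame pipeline; the full script is in section 2.4.
cap = cv.VideoCapture(video_path)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    data, orig = preprocess(frame)                      # letterbox to 608x608, NCHW, /255
    feats = sess.run(outputs_name, {input_name: data})  # three YOLO feature maps
    dets = post_process(feats, orig)                    # decode boxes + per-class NMS
    frame_with_lane = preprocess_frame(frame)           # overlay the detected lane
    # ...then draw boxes and per-object distances and write the frame to the output video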

2.2 JSON Configuration

The configuration file contains the camera calibration matrix, distortion coefficients, perspective transform matrix, and a few other parameters (a reconstructed sample is sketched after the list below).

  1. cam_matrix (camera matrix): the 3x3 camera intrinsic matrix describing the camera's internal parameters, including the focal lengths (fx, fy) and the principal point (cx, cy). In this configuration the focal lengths are 1156.94047 and 1152.13881, and the principal point is (665.948814, 388.784788).

  2. dist_coeffs (distortion coefficients): the camera's lens-distortion coefficients, typically radial and tangential terms. There are five here: [-0.237638057, -0.0854041989, -0.000790999421, -0.000115882426, 0.105726054].

  3. perspective_transform (perspective transform matrix): the 3x3 matrix used to warp the image into a bird's-eye (top-down) view; it encodes the scaling, rotation, and translation of the transform.

  4. pixels_per_meter (pixels per meter): how many pixels correspond to one meter in the bird's-eye view: 46.56770571051312 px/m horizontally and 33.06512376601635 px/m vertically.

  5. WARPED_SIZE (bird's-eye view size): the image size after the perspective warp, 500 pixels wide by 600 pixels high.

  6. ORIGINAL_SIZE (original image size): the size of the original image, 1280 pixels wide by 720 pixels high.
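
Putting these values together, configure.json holds a list with a single object (the main script reads fileJson[0]). Below is a reconstructed sample written as a Python literal. The cam_matrix layout is the standard intrinsic form [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]; the article does not list the perspective_transform values, so an identity matrix stands in as a placeholder:

config = [{
    "cam_matrix": [[1156.94047, 0.0, 665.948814],
                   [0.0, 1152.13881, 388.784788],
                   [0.0, 0.0, 1.0]],
    "dist_coeffs": [-0.237638057, -0.0854041989, -0.000790999421,
                    -0.000115882426, 0.105726054],
    "perspective_transform": [[1.0, 0.0, 0.0],  # placeholder, not the real matrix
                              [0.0, 1.0, 0.0],
                              [0.0, 0.0, 1.0]],
    "pixels_per_meter": [46.56770571051312, 33.06512376601635],
    "WARPED_SIZE": [500, 600],
    "ORIGINAL_SIZE": [1280, 720],
}]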

2.3 Lane Line Detection

LaneFinder.py implements the lane-detection algorithm. Its functions do the following:

  1. get_center_shift(coeffs, img_size, pixels_per_meter): computes the lateral offset of the lane center.
  2. get_curvature(coeffs, img_size, pixels_per_meter): computes the curvature of a lane line (both polynomial helpers are sketched below).
  3. LaneLineFinder: a class that detects a single lane line.
  4. LaneFinder: a class that detects the whole lane; it covers detection of both the left and right lane lines.
  5. undistort(img): corrects lens distortion in the image.
  6. warp(img): applies the perspective transform so the lane lines appear parallel in the image.
  7. unwarp(img): inverts the perspective transform, returning the image to the original viewpoint.
  8. equalize_lines(alpha): balances the detected left and right lane lines so they keep a consistent separation.
  9. find_lane(img, distorted=True, reset=False): finds the lane lines in an image; the steps include undistortion, perspective warping, color filtering, and line detection.
  10. draw_lane_weighted(img, thickness=5, alpha=0.8, beta=1, gamma=0): draws the detected lane on the original image and annotates it with curvature and vehicle-position information.
  11. process_image(img, reset=False): processes an input image and returns it with the detected lane lines drawn in.
  12. set_img_size(img_size): sets the image size.

Together, these functions form a lane-detection algorithm that finds the lane lines in road images and estimates the vehicle's position and the curvature of its path.
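
The bodies of the two polynomial helpers are not shown in this article, so the following is only a sketch of what they typically compute, assuming a quadratic fit x(y) = A*y^2 + B*y + C in the bird's-eye view with coefficients already scaled to meters; the actual LaneFinder.py may differ in detail:

import numpy as np

def get_curvature(coeffs, img_size, pixels_per_meter):
    # Radius of curvature R = (1 + (2*A*y + B)^2)^(3/2) / |2*A|,
    # evaluated at the bottom of the image (the point closest to the vehicle).
    y = img_size[1] / pixels_per_meter[1]
    return (1 + (2 * coeffs[0] * y + coeffs[1]) ** 2) ** 1.5 / np.abs(2 * coeffs[0])

def get_center_shift(coeffs, img_size, pixels_per_meter):
    # Lane-line x-position at the bottom of the image minus the image center,
    # both in meters; the sign gives the direction of the offset.
    y = img_size[1] / pixels_per_meter[1]
    return np.polyval(coeffs, y) - (img_size[0] / 2) / pixels_per_meter[0]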

2.4 Main Script

import sys
import os
import json
import numpy as np
import cv2 as cv
from PIL import Image
import LaneFinder
import onnxruntime as rt

labels = ["person",
          "bicycle", "car", "motorbike", "aeroplane",
          "bus", "train", "truck", "boat", "traffic light",
          "fire hydrant", "stop sign", "parking meter", "bench",
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
          "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
          "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
          "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
          "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog",
          "pizza", "donut", "cake", "chair", "sofa", "potted plant", "bed", "dining table",
          "toilet", "TV monitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
          "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
          "scissors", "teddy bear", "hair drier", "toothbrush"]

OUTPUT_DIR = '../out/'
MODEL_WIDTH = 608
MODEL_HEIGHT = 608
class_num = 80
stride_list = [32, 16, 8]
anchors_3 = np.array([[12, 16], [19, 36], [40, 28]]) / stride_list[2]
anchors_2 = np.array([[36, 75], [76, 55], [72, 146]]) / stride_list[1]
anchors_1 = np.array([[142, 110], [192, 243], [459, 401]]) / stride_list[0]
anchor_list = [anchors_1, anchors_2, anchors_3]
iou_threshold = 0.3
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255), (255, 0, 255), (255, 255, 0)]


def preprocess(frame):
    """Letterbox the frame to the model input size and normalize to [0, 1]."""
    image = Image.fromarray(cv.cvtColor(frame, cv.COLOR_BGR2RGB))
    img_h = image.size[1]
    img_w = image.size[0]
    net_h = MODEL_HEIGHT
    net_w = MODEL_WIDTH
    scale = min(float(net_w) / float(img_w), float(net_h) / float(img_h))
    new_w = int(img_w * scale)
    new_h = int(img_h * scale)
    shift_x = (net_w - new_w) // 2
    shift_y = (net_h - new_h) // 2
    shift_x_ratio = (net_w - new_w) / 2.0 / net_w
    shift_y_ratio = (net_h - new_h) / 2.0 / net_h
    image_ = image.resize((new_w, new_h))
    new_image = np.zeros((net_h, net_w, 3), np.uint8)
    new_image[shift_y: new_h + shift_y, shift_x: new_w + shift_x, :] = np.array(image_)
    new_image = new_image.astype(np.float32)
    new_image = new_image / 255
    print('new_image.shape', new_image.shape)
    new_image = new_image.transpose(2, 0, 1).copy().reshape(1, 3, 608, 608)
    return new_image, image


def overlap(x1, x2, x3, x4):
    left = max(x1, x3)
    right = min(x2, x4)
    return right - left


def cal_iou(box, truth):
    w = overlap(box[0], box[2], truth[0], truth[2])
    h = overlap(box[1], box[3], truth[1], truth[3])
    if w <= 0 or h <= 0:
        return 0
    inter_area = w * h
    union_area = (box[2] - box[0]) * (box[3] - box[1]) + (truth[2] - truth[0]) * (truth[3] - truth[1]) - inter_area
    return inter_area * 1.0 / union_area


def apply_nms(all_boxes, thres):
    """Greedy per-class NMS: keep the highest-scoring box, drop overlaps above thres."""
    res = []
    for cls in range(class_num):
        cls_bboxes = all_boxes[cls]
        sorted_boxes = sorted(cls_bboxes, key=lambda d: d[5])[::-1]
        p = dict()
        for i in range(len(sorted_boxes)):
            if i in p:
                continue
            truth = sorted_boxes[i]
            for j in range(i + 1, len(sorted_boxes)):
                if j in p:
                    continue
                box = sorted_boxes[j]
                iou = cal_iou(box, truth)
                if iou >= thres:
                    p[j] = 1
        for i in range(len(sorted_boxes)):
            if i not in p:
                res.append(sorted_boxes[i])
    return res


def _sigmoid(x):
    return 1.0 / (1 + np.exp(-x))


def decode_bbox(conv_output, anchors, img_w, img_h, x_scale, y_scale, shift_x_ratio, shift_y_ratio):
    """Convert one YOLO feature map into boxes in original-image coordinates."""
    print('conv_output.shape', conv_output.shape)
    _, _, h, w = conv_output.shape
    conv_output = conv_output.transpose(0, 2, 3, 1)
    pred = conv_output.reshape((h * w, 3, 5 + class_num))
    pred[..., 4:] = _sigmoid(pred[..., 4:])
    pred[..., 0] = (_sigmoid(pred[..., 0]) + np.tile(range(w), (3, h)).transpose((1, 0))) / w
    pred[..., 1] = (_sigmoid(pred[..., 1]) + np.tile(np.repeat(range(h), w), (3, 1)).transpose((1, 0))) / h
    pred[..., 2] = np.exp(pred[..., 2]) * anchors[:, 0:1].transpose((1, 0)) / w
    pred[..., 3] = np.exp(pred[..., 3]) * anchors[:, 1:2].transpose((1, 0)) / h
    bbox = np.zeros((h * w, 3, 4))
    bbox[..., 0] = np.maximum((pred[..., 0] - pred[..., 2] / 2.0 - shift_x_ratio) * x_scale * img_w, 0)  # x_min
    bbox[..., 1] = np.maximum((pred[..., 1] - pred[..., 3] / 2.0 - shift_y_ratio) * y_scale * img_h, 0)  # y_min
    bbox[..., 2] = np.minimum((pred[..., 0] + pred[..., 2] / 2.0 - shift_x_ratio) * x_scale * img_w, img_w)  # x_max
    bbox[..., 3] = np.minimum((pred[..., 1] + pred[..., 3] / 2.0 - shift_y_ratio) * y_scale * img_h, img_h)  # y_max
    pred[..., :4] = bbox
    pred = pred.reshape((-1, 5 + class_num))
    pred[:, 4] = pred[:, 4] * pred[:, 5:].max(1)
    pred[:, 5] = np.argmax(pred[:, 5:], axis=-1)
    pred = pred[pred[:, 4] >= 0.2]
    print('pred[:, 5]', pred[:, 5])
    print('pred[:, 5] shape', pred[:, 5].shape)
    all_boxes = [[] for ix in range(class_num)]
    for ix in range(pred.shape[0]):
        box = [int(pred[ix, iy]) for iy in range(4)]
        box.append(int(pred[ix, 5]))
        box.append(pred[ix, 4])
        all_boxes[box[4]].append(box)  # bucket by class index so NMS groups match the labels list
    return all_boxes


def convert_labels(label_list):
    if isinstance(label_list, np.ndarray):
        label_list = label_list.tolist()
    label_names = [labels[int(index)] for index in label_list]
    return label_names


def post_process(infer_output, origin_img):
    """Decode all three feature maps, run NMS, and package the detections."""
    print("post process")
    result_return = dict()
    img_h = origin_img.size[1]
    img_w = origin_img.size[0]
    scale = min(float(MODEL_WIDTH) / float(img_w), float(MODEL_HEIGHT) / float(img_h))
    new_w = int(img_w * scale)
    new_h = int(img_h * scale)
    shift_x_ratio = (MODEL_WIDTH - new_w) / 2.0 / MODEL_WIDTH
    shift_y_ratio = (MODEL_HEIGHT - new_h) / 2.0 / MODEL_HEIGHT
    class_number = len(labels)
    num_channel = 3 * (class_number + 5)
    x_scale = MODEL_WIDTH / float(new_w)
    y_scale = MODEL_HEIGHT / float(new_h)
    all_boxes = [[] for ix in range(class_number)]
    for ix in range(3):
        pred = infer_output[ix]
        print('pred.shape', pred.shape)
        anchors = anchor_list[ix]
        boxes = decode_bbox(pred, anchors, img_w, img_h, x_scale, y_scale, shift_x_ratio, shift_y_ratio)
        all_boxes = [all_boxes[iy] + boxes[iy] for iy in range(class_number)]
    print("all_box:", all_boxes)
    res = apply_nms(all_boxes, iou_threshold)
    print("res:", res)
    if not res:
        result_return['detection_classes'] = []
        result_return['detection_boxes'] = []
        result_return['detection_scores'] = []
        return result_return
    else:
        new_res = np.array(res)
        picked_boxes = new_res[:, 0:4]
        picked_boxes = picked_boxes[:, [1, 0, 3, 2]]  # reorder to [ymin, xmin, ymax, xmax]
        picked_classes = convert_labels(new_res[:, 4])
        picked_score = new_res[:, 5]
        result_return['detection_classes'] = picked_classes
        result_return['detection_boxes'] = picked_boxes.tolist()
        result_return['detection_scores'] = picked_score.tolist()
        return result_return


def preprocess_frame(bgr_img):
    """Run the LaneFinder pipeline on a BGR frame and return a BGR frame with the lane drawn."""
    bgr_img = bgr_img[:, :, ::-1]
    image = bgr_img
    image = LaneFinder.Image.fromarray(image.astype('uint8'), 'RGB')
    fframe = np.array(image)
    fframe = lf.process_image(fframe, False)
    frame = LaneFinder.Image.fromarray(fframe)
    framecv = cv.cvtColor(np.asarray(frame), cv.COLOR_RGB2BGR)
    return framecv


def calculate_position(bbox, transform_matrix, warped_size, pix_per_meter):
    """Project the bottom-center of a bbox into the bird's-eye view and return its distance in meters."""
    if len(bbox) == 0:
        print('Nothing')
    else:
        point = np.array((bbox[1] / 2 + bbox[3] / 2, bbox[2])).reshape(1, 1, -1)
        pos = cv.perspectiveTransform(point, transform_matrix).reshape(-1, 1)
        return np.array((warped_size[1] - pos[1]) / pix_per_meter[1])


def main():
    if (len(sys.argv) != 2):
        print("Please input video path")
        exit(1)
    frame_count = 0
    sess = rt.InferenceSession('../model/yolov4_bs.onnx')
    # open video
    video_path = sys.argv[1]
    print("open video ", video_path)
    cap = cv.VideoCapture(video_path)
    fps = cap.get(cv.CAP_PROP_FPS)
    Width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    Height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    lf.set_img_size((Width, Height))
    # create output directory
    if not os.path.exists(OUTPUT_DIR):
        os.mkdir(OUTPUT_DIR)
    output_Video = os.path.basename(video_path)
    output_Video = os.path.join(OUTPUT_DIR, output_Video)
    fourcc = cv.VideoWriter_fourcc(*'mp4v')  # DIVX, XVID, MJPG, X264, WMV1, WMV2
    outVideo = cv.VideoWriter(output_Video, fourcc, fps, (Width, Height))
    # input/output node names of the model; you can inspect them with netron
    input_name = 'input'
    outputs_name = ['feature_map_1', 'feature_map_2', 'feature_map_3']
    # Read until video is completed
    while (cap.isOpened()):
        ret, frame = cap.read()
        if ret == True:
            # preprocess
            data, orig = preprocess(frame)
            result_list = sess.run(outputs_name, {input_name: data})
            result_return = post_process(result_list, orig)
            frame_with_lane = preprocess_frame(frame)
            distance = np.zeros(shape=(len(result_return['detection_classes']), 1))
            for i in range(len(result_return['detection_classes'])):
                box = result_return['detection_boxes'][i]
                class_name = result_return['detection_classes'][i]
                # confidence = result_return['detection_scores'][i]
                distance[i] = calculate_position(bbox=box, transform_matrix=perspective_transform,
                                                 warped_size=WARPED_SIZE, pix_per_meter=pixels_per_meter)
                label_dis = '{} {:.2f}m'.format('dis:', distance[i][0])
                cv.putText(frame_with_lane, label_dis, (int(box[1]) + 10, int(box[2]) + 15),
                           cv.FONT_ITALIC, 0.6, colors[i % 6], 1)
                cv.rectangle(frame_with_lane, (int(box[1]), int(box[0])), (int(box[3]), int(box[2])), colors[i % 6])
                p3 = (max(int(box[1]), 15), max(int(box[0]), 15))
                out_label = class_name
                cv.putText(frame_with_lane, out_label, p3, cv.FONT_ITALIC, 0.6, colors[i % 6], 1)
            outVideo.write(frame_with_lane)
            print("FINISH PROCESSING FRAME: ", frame_count)
            frame_count += 1
        else:
            break
    cap.release()
    outVideo.release()
    print("Execute end")


if __name__ == '__main__':
    path = './configure.json'
    config_file = open(path, "rb")
    fileJson = json.load(config_file)
    cam_matrix = fileJson[0]["cam_matrix"]
    dist_coeffs = fileJson[0]["dist_coeffs"]
    perspective_transform = fileJson[0]["perspective_transform"]
    pixels_per_meter = fileJson[0]["pixels_per_meter"]
    WARPED_SIZE = fileJson[0]["WARPED_SIZE"]
    ORIGINAL_SIZE = fileJson[0]["ORIGINAL_SIZE"]
    cam_matrix = np.array(cam_matrix)
    dist_coeffs = np.array(dist_coeffs)
    perspective_transform = np.array(perspective_transform)
    pixels_per_meter = tuple(pixels_per_meter)
    WARPED_SIZE = tuple(WARPED_SIZE)
    ORIGINAL_SIZE = tuple(ORIGINAL_SIZE)
    lf = LaneFinder.LaneFinder(ORIGINAL_SIZE, WARPED_SIZE, cam_matrix, dist_coeffs,
                               perspective_transform, pixels_per_meter)
    main()
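
To try it, pass the input video path as the only command-line argument. Assuming the script above is saved as main.py (a name chosen here for illustration) next to configure.json, with the model at ../model/yolov4_bs.onnx:

python main.py ./test_video.mp4

The annotated video is written to ../out/ under the same file name.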

3. Result Video

4. Closing

The code is available in the resource downloads. That's all for now.
