Import the required packages
- import os
- from matplotlib.pyplot import imshow
- import numpy as np
- import imageio
- import tensorflow as tf
- from keras import backend as K
- from keras.models import load_model
- from yolo_utils import read_classes, read_anchors, generate_colors, preprocess_image, draw_boxes, scale_boxes
- from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners
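You are working on a self-driving car. As a critical component of this project, you want to build a car detection system. To collect data, you have mounted a camera on the front of the car, which takes a picture of the road ahead every few seconds.
You have now collected the data and labeled it, marking each car with a bounding box and its coordinates, as shown in the figure below:
If you have 80 classes for YOLO to recognize, you can represent the class label c either as an integer from 1 to 80, or as an 80-dimensional vector in which the component of the detected class is 1 and all other components are 0.
The course lectures used the latter, the vector representation. This assignment uses both, depending on which is more convenient for the step at hand.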
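The two label representations can be converted into each other. The following is a minimal sketch of that conversion; the function names are hypothetical, and it assumes labels are 1-based so that label c sits at index c - 1 of the vector.

```python
NUM_CLASSES = 80  # number of classes, as stated in the text


def label_to_one_hot(c, num_classes=NUM_CLASSES):
    """Convert an integer label c in 1..num_classes to a one-hot list."""
    if not 1 <= c <= num_classes:
        raise ValueError("label out of range")
    vec = [0] * num_classes
    vec[c - 1] = 1  # assumption: 1-based label c is stored at index c - 1
    return vec


def one_hot_to_label(vec):
    """Convert a one-hot list back to the integer label in 1..len(vec)."""
    return vec.index(1) + 1


vec = label_to_one_hot(3)
print(len(vec), vec[:5], one_hot_to_label(vec))  # 80 [0, 0, 1, 0, 0] 3
```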
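YOLO ("you only look once") is a popular algorithm because it achieves high accuracy while being fast enough to run in real time. The algorithm needs only one forward pass through the network to make its predictions. After non-max suppression, it outputs the recognized objects together with their bounding boxes.
Input: a batch of images of shape (m, 608, 608, 3).
Output: a list of bounding boxes for the recognized objects. Each box is described by 6 numbers (pc, bx, by, bh, bw, c), where c is an integer from 1 to 80. If you expand c into an 80-dimensional vector, each box is described by 85 numbers.
We will use 5 anchor boxes, so the YOLO architecture can be viewed as: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85)
The figure below shows this encoding in more detail.
If the center of an object falls into a grid cell, that grid cell is responsible for detecting the object.
Since there are 5 anchor boxes, each of the 19x19 cells encodes information about 5 boxes. Anchor boxes are defined only by their width and height.
For simplicity, we flatten the last two dimensions of the (19, 19, 5, 85) encoding, so the output becomes (19, 19, 425).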
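The flattening step can be illustrated with NumPy. This is only a sketch on a placeholder array filled with zeros (not real network output); it assumes row-major order, so entry k of anchor a ends up at position a * 85 + k of the flattened cell.

```python
import numpy as np

# Placeholder encoding for a single image (zeros, not real predictions).
encoding = np.zeros((19, 19, 5, 85))
flat = encoding.reshape(19, 19, 5 * 85)
print(flat.shape)  # (19, 19, 425)

# Mark one value to see where it lands: anchor 2, entry 7 of cell (0, 0).
encoding[0, 0, 2, 7] = 1.0
flat = encoding.reshape(19, 19, -1)
print(int(np.argmax(flat[0, 0])))  # 2 * 85 + 7 = 177
```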
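Now, for each box (each anchor box of each cell), compute the element-wise product of pc and the class probabilities to get the score that the box contains each class.
Here is one way to visualize what the YOLO model predicts:
For each of the 19x19 grid cells, find the maximum score (taking the maximum over the 5 anchor boxes and over all classes).
Color each grid cell according to the class it considers most likely.
The result looks like the figure below:
Note: this coloring is not a core part of how YOLO makes predictions; it is just a convenient way to visualize an intermediate result of the algorithm.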
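The per-cell maximum used for this visualization can be sketched in NumPy as follows. The input arrays are random placeholders rather than real model output, and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed, illustrative data only
pc = rng.random((19, 19, 5, 1))             # placeholder objectness scores
class_probs = rng.random((19, 19, 5, 80))   # placeholder class probabilities

scores = pc * class_probs                   # broadcasts to (19, 19, 5, 80)

# Best score of each cell: maximum over the 5 anchors and the 80 classes.
cell_best_score = scores.max(axis=(2, 3))   # shape (19, 19)

# Class reaching that maximum: max over anchors first, then argmax over classes.
cell_best_class = scores.max(axis=2).argmax(axis=-1)   # shape (19, 19)

print(cell_best_score.shape, cell_best_class.shape)
```

The array cell_best_class is what a plot would map to colors, one color per class index.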
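Another way to visualize YOLO's output is to plot the bounding boxes it predicts, using different colors for different classes and different shapes for different anchor boxes.
The figure above shows only the boxes with relatively high scores; there are still many more boxes. Non-max suppression is used to filter the output down to the best boxes, in two steps:
Discard boxes with a low score (the box is not very confident about detecting any class).
Among boxes that overlap each other and detect the same object, keep only the one with the highest score.
Filtering out boxes whose score is below a threshold
The model gives you 19x19x5x85 numbers (assuming 80 numbers are used for the 80 classes), which are convenient to split into the following variables:
box_confidence: shape (19×19, 5, 1), containing pc, the confidence that each anchor box contains some object
boxes: shape (19×19, 5, 4), containing the box coordinates (bx, by, bh, bw)
box_class_probs: shape (19×19, 5, 80), containing the class probabilities (c1, c2, … c80)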
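As a sketch of this split, the snippet below slices a placeholder array along its last axis. The layout [pc, bx, by, bh, bw, c1..c80] is an assumption taken from the ordering given in the text; the raw output of the real network may be ordered differently, and in the code further below yolo_head performs the actual conversion.

```python
import numpy as np

# Placeholder encoding with distinct values, shape (19*19, 5, 85).
encoding = np.arange(19 * 19 * 5 * 85, dtype=np.float32).reshape(19 * 19, 5, 85)

# Assumed layout of the last axis: [pc, bx, by, bh, bw, c1..c80].
box_confidence = encoding[..., 0:1]   # (361, 5, 1)
boxes = encoding[..., 1:5]            # (361, 5, 4)
box_class_probs = encoding[..., 5:]   # (361, 5, 80)

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
```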
Exercise: implement yolo_filter_boxes()
Compute the box scores as the element-wise product of pc and the class probabilities:
- a = np.random.randn(19*19, 5, 1)
- b = np.random.randn(19*19, 5, 80)
- c = a * b # shape of c will be (19*19, 5, 80)
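1. For each box:
   a. find the index of the class with the highest score (1 of 80)
   b. find the corresponding score
2. Create a mask using the threshold. For example, ([0.9, 0.3, 0.4, 0.5, 0.1] < 0.4) returns [False, True, False, False, True]. The mask should be True for the boxes you want to keep.
3. Use TensorFlow to apply the mask to box_class_scores, boxes and box_classes, filtering out the boxes you do not want.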
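The mask logic of steps 2 and 3 can be tried in NumPy before moving to TensorFlow. This is a sketch only: the class indices are made-up values, and NumPy boolean indexing stands in for tf.boolean_mask.

```python
import numpy as np

box_class_scores = np.array([0.9, 0.3, 0.4, 0.5, 0.1])
box_classes = np.array([2, 7, 7, 0, 5])      # hypothetical class indices
threshold = 0.4

drop_mask = box_class_scores < threshold     # the comparison from the text
keep_mask = box_class_scores >= threshold    # True for the boxes to keep

print(drop_mask.tolist())   # [False, True, False, False, True]
print(keep_mask.tolist())   # [True, False, True, True, False]
print(box_class_scores[keep_mask].tolist(), box_classes[keep_mask].tolist())
```

Note that the comparison in the text marks the boxes to drop, so the mask that is actually applied uses >= instead.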
- # GRADED FUNCTION: yolo_filter_boxes
-
- def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = .6):
-     """Filters YOLO boxes by thresholding on object and class confidence.
-     Arguments:
-     box_confidence -- tensor of shape (19, 19, 5, 1)
-     boxes -- tensor of shape (19, 19, 5, 4)
-     box_class_probs -- tensor of shape (19, 19, 5, 80)
-     threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
-     Returns:
-     scores -- tensor of shape (None,), containing the class probability score for selected boxes
-     boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
-     classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes
-     Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold.
-     For example, the actual output size of scores would be (10,) if there are 10 boxes.
-     """
-
-     # Step 1: Compute box scores
-     ### START CODE HERE ### (≈ 1 line)
-     box_scores = box_confidence * box_class_probs
-     ### END CODE HERE ###
-
-     # Step 2: Find the box_classes thanks to the max box_scores, keep track of the corresponding score
-     ### START CODE HERE ### (≈ 2 lines)
-     box_classes = K.argmax(box_scores, axis=-1)
-     box_class_scores = K.max(box_scores, axis=-1, keepdims=False)
-     ### END CODE HERE ###
-
-     # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the
-     # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
-     ### START CODE HERE ### (≈ 1 line)
-     filtering_mask = box_class_scores >= threshold
-     ### END CODE HERE ###
-
-     # Step 4: Apply the mask to scores, boxes and classes
-     ### START CODE HERE ### (≈ 3 lines)
-     scores = tf.boolean_mask(box_class_scores, filtering_mask)
-     boxes = tf.boolean_mask(boxes, filtering_mask)
-     classes = tf.boolean_mask(box_classes, filtering_mask)
-     ### END CODE HERE ###
-
-     return scores, boxes, classes
- # Test yolo_filter_boxes
- with tf.Session() as test_a:
-     box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1)
-     boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed = 1)
-     box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1)
-     scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold = 0.5)
-     print("scores[2] = " + str(scores[2].eval()))
-     print("boxes[2] = " + str(boxes[2].eval()))
-     print("classes[2] = " + str(classes[2].eval()))
-     print("scores.shape = " + str(scores.shape))
-     print("boxes.shape = " + str(boxes.shape))
-     print("classes.shape = " + str(classes.shape))
Expected output
scores[2] = 10.7506
boxes[2] = [ 8.42653275 3.27136683 -0.5313437 -4.94137383]
classes[2] = 7
scores.shape = (?,)
boxes.shape = (?, 4)
classes.shape = (?,)
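Even after filtering by thresholding on the scores, you still end up with many overlapping boxes. A second filter, called non-max suppression (NMS), selects the right box among the overlapping ones.
Non-max suppression relies on a very important function: Intersection over Union (IoU).
Exercise: implement iou()
In this exercise (and only here), a box is defined by its two corners (upper left and lower right) rather than by its center and width/height.
The area of a box is its height (y2 - y1) multiplied by its width (x2 - x1).
You also need the coordinates (xi1, yi1, xi2, yi2) of the intersection of the two boxes:
xi1 = maximum of the x1 coordinates of the two boxes
yi1 = maximum of the y1 coordinates of the two boxes
xi2 = minimum of the x2 coordinates of the two boxes
yi2 = minimum of the y2 coordinates of the two boxes
If xi2 < xi1 or yi2 < yi1, the boxes do not overlap and the intersection area is 0. In the code below, we use the convention that (0,0) is the top-left corner of the image and (1,1) is the bottom-right corner.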
- # GRADED FUNCTION: iou
-
- def iou(box1, box2):
-     """Implement the intersection over union (IoU) between box1 and box2
-     Arguments:
-     box1 -- first box, list object with coordinates (x1, y1, x2, y2)
-     box2 -- second box, list object with coordinates (x1, y1, x2, y2)
-     """
-
-     # Calculate the (x1, y1, x2, y2) coordinates of the intersection of box1 and box2. Calculate its Area.
-     ### START CODE HERE ### (≈ 5 lines)
-     xi1 = max(box1[0], box2[0])
-     yi1 = max(box1[1], box2[1])
-     xi2 = min(box1[2], box2[2])
-     yi2 = min(box1[3], box2[3])
-     # Clamp each side at 0 so that boxes that do not overlap get an intersection area of 0
-     inter_area = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
-     ### END CODE HERE ###
-
-     # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
-     ### START CODE HERE ### (≈ 3 lines)
-     box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
-     box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
-     union_area = box1_area + box2_area - inter_area
-     ### END CODE HERE ###
-
-     # compute the IoU
-     ### START CODE HERE ### (≈ 1 line)
-     iou = inter_area / union_area
-     ### END CODE HERE ###
-
-     return iou
- # Test iou
- box1 = (2, 1, 4, 3)
- box2 = (1, 2, 3, 4)
- print("iou = " + str(iou(box1, box2)))
Expected output
iou = 0.14285714285714285
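To see where this value comes from, the arithmetic can be traced by hand: the two test boxes intersect in the unit square from (2, 2) to (3, 3), each box has area 4, so IoU = 1 / (4 + 4 - 1) = 1/7. The sketch below reproduces that and also checks two disjoint boxes, where a product of raw coordinate differences would give a positive "intersection" and clamping each side at zero avoids it. The function name iou_clamped is hypothetical and used only for this illustration.

```python
def iou_clamped(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2); 0.0 when they do not overlap."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    # A negative width or height means no overlap, so clamp each side at 0.
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)


# Test boxes: intersection area 1, union area 4 + 4 - 1 = 7.
print(iou_clamped((2, 1, 4, 3), (1, 2, 3, 4)))   # 0.14285714285714285

# Disjoint boxes: raw differences are (-2) and (-2), whose product would be 4.
print(iou_clamped((0, 0, 1, 1), (3, 3, 4, 4)))   # 0.0
```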
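You are now ready to implement non-max suppression. The key steps are:
1. Select the box that has the highest score.
2. Compute its IoU with all other boxes, and remove the boxes whose IoU with it is greater than iou_threshold.
3. Repeat steps 1 and 2 until there are no more boxes left to process.
This removes all boxes that have a large overlap with a selected box, so only the best boxes remain.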
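The three steps can be sketched in plain Python as a greedy loop. This is an illustrative version under simple assumptions (boxes as (x1, y1, x2, y2) tuples, hypothetical function names and sample data); it is not the implementation used by tf.image.non_max_suppression.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, 0.0 if they do not overlap."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_nms(boxes, scores, iou_threshold=0.5, max_boxes=10):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    # Candidate indices sorted by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)                 # step 1: highest remaining score
        keep.append(best)
        # Step 2: drop candidates that overlap the selected box too much.
        order = [i for i in order
                 if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep                             # step 3 is the while loop


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 29, 29)]
scores = [0.9, 0.8, 0.7, 0.95]
print(greedy_nms(boxes, scores))  # [3, 0]
```

In this sample, box 3 suppresses box 2 (IoU 0.64) and box 0 suppresses box 1 (IoU of about 0.68), so two boxes remain.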
Exercise: implement yolo_non_max_suppression() using TensorFlow
Useful TensorFlow functions:
tf.image.non_max_suppression() # so you do not need your own iou() implementation
K.gather()
- # GRADED FUNCTION: yolo_non_max_suppression
-
- def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
-     """
-     Applies Non-max suppression (NMS) to set of boxes
-     Arguments:
-     scores -- tensor of shape (None,), output of yolo_filter_boxes()
-     boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later)
-     classes -- tensor of shape (None,), output of yolo_filter_boxes()
-     max_boxes -- integer, maximum number of predicted boxes you'd like
-     iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
-     Returns:
-     scores -- tensor of shape (, None), predicted score for each box
-     boxes -- tensor of shape (4, None), predicted box coordinates
-     classes -- tensor of shape (, None), predicted class for each box
-     Note: The "None" dimension of the output tensors has obviously to be less than max_boxes. Note also that this
-     function will transpose the shapes of scores, boxes, classes. This is made for convenience.
-     """
-
-     max_boxes_tensor = K.variable(max_boxes, dtype='int32') # tensor to be used in tf.image.non_max_suppression()
-     K.get_session().run(tf.variables_initializer([max_boxes_tensor])) # initialize variable max_boxes_tensor
-
-     # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
-     ### START CODE HERE ### (≈ 1 line)
-     nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes_tensor, iou_threshold)
-     ### END CODE HERE ###
-
-     # Use K.gather() to select only nms_indices from scores, boxes and classes
-     ### START CODE HERE ### (≈ 3 lines)
-     scores = K.gather(scores, nms_indices)
-     boxes = K.gather(boxes, nms_indices)
-     classes = K.gather(classes, nms_indices)
-     ### END CODE HERE ###
-
-     return scores, boxes, classes
- # Test yolo_non_max_suppression
-
- with tf.Session() as test_b:
-     scores = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
-     boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed = 1)
-     classes = tf.random_normal([54,], mean=1, stddev=4, seed = 1)
-     scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
-     print("scores[2] = " + str(scores[2].eval()))
-     print("boxes[2] = " + str(boxes[2].eval()))
-     print("classes[2] = " + str(classes[2].eval()))
-     print("scores.shape = " + str(scores.eval().shape))
-     print("boxes.shape = " + str(boxes.eval().shape))
-     print("classes.shape = " + str(classes.eval().shape))
Expected output
scores[2] = 6.9384
boxes[2] = [-5.299932 3.13798141 4.45036697 0.95942086]
classes[2] = -2.24527
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)
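Next, implement a function that takes the output of the deep CNN (the 19x19x5x85 encoding) and filters all of its boxes using the functions you have just implemented.
Exercise: implement yolo_eval()
yolo_eval() takes the output of the YOLO encoding and filters the boxes using score thresholding and non-max suppression.
There are several ways to represent a box, such as by its upper-left and lower-right corners, or by its center and width/height. YOLO converts between these formats at different points, using the following helper functions:
# (x, y, w, h) --> (x1, y1, x2, y2)
# so that the boxes match the input expected by yolo_filter_boxes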
boxes = yolo_boxes_to_corners(box_xy, box_wh)
# rescale the boxes according to the size of the image
boxes = scale_boxes(boxes, image_shape)
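The idea behind these two helpers can be sketched on a single box in plain Python. The function names below are hypothetical, the (x1, y1, x2, y2) output ordering is an assumption made for readability, and box coordinates are assumed to be normalized to [0, 1]; the real yolo_boxes_to_corners and scale_boxes operate on tensors and may order the coordinates differently.

```python
def center_to_corners(x, y, w, h):
    """(center x, center y, width, height) -> (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def scale_to_image(box, image_height, image_width):
    """Scale a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * image_width, y1 * image_height,
            x2 * image_width, y2 * image_height)


# A box centered in the image, a quarter of its width and half of its height.
corners = center_to_corners(0.5, 0.5, 0.25, 0.5)
print(corners)                                 # (0.375, 0.25, 0.625, 0.75)
print(scale_to_image(corners, 720.0, 1280.0))  # (480.0, 180.0, 800.0, 540.0)
```

Rescaling is what lets boxes predicted on the 608x608 network input be drawn on top of the original 720x1280 image.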
- # GRADED FUNCTION: yolo_eval
-
- def yolo_eval(yolo_outputs, image_shape = (720., 1280.), max_boxes=10, score_threshold=.6, iou_threshold=.5):
-     """
-     Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
-     Arguments:
-     yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
-     box_confidence: tensor of shape (None, 19, 19, 5, 1)
-     box_xy: tensor of shape (None, 19, 19, 5, 2)
-     box_wh: tensor of shape (None, 19, 19, 5, 2)
-     box_class_probs: tensor of shape (None, 19, 19, 5, 80)
-     image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
-     max_boxes -- integer, maximum number of predicted boxes you'd like
-     score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
-     iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
-     Returns:
-     scores -- tensor of shape (None, ), predicted score for each box
-     boxes -- tensor of shape (None, 4), predicted box coordinates
-     classes -- tensor of shape (None,), predicted class for each box
-     """
-
-     ### START CODE HERE ###
-
-     # Retrieve outputs of the YOLO model (≈1 line)
-     box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs
-
-     # Convert boxes to be ready for filtering functions
-     boxes = yolo_boxes_to_corners(box_xy, box_wh)
-
-     # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
-     scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)
-
-     # Scale boxes back to original image shape.
-     boxes = scale_boxes(boxes, image_shape)
-
-     # Use one of the functions you've implemented to perform Non-max suppression with a threshold of iou_threshold (≈1 line)
-     scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
-
-     ### END CODE HERE ###
-
-     return scores, boxes, classes
- # Test yolo_eval
-
- with tf.Session() as test_b:
-     yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed = 1),
-                     tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
-                     tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed = 1),
-                     tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed = 1))
-     scores, boxes, classes = yolo_eval(yolo_outputs)
-     print("scores[2] = " + str(scores[2].eval()))
-     print("boxes[2] = " + str(boxes[2].eval()))
-     print("classes[2] = " + str(classes[2].eval()))
-     print("scores.shape = " + str(scores.eval().shape))
-     print("boxes.shape = " + str(boxes.eval().shape))
-     print("classes.shape = " + str(classes.eval().shape))
Expected output
scores[2] = 138.791
boxes[2] = [ 1292.32971191 -278.52166748 3876.98925781 -835.56494141]
classes[2] = 54
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)
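Summary of YOLO
The input image has shape (608, 608, 3).
The input image goes through a CNN, which produces an output of shape (19, 19, 5, 85).
After flattening the last two dimensions of that output, you get a volume of shape (19, 19, 425):
Each cell of the 19x19 grid over the input image gives 425 numbers.
425 = 5 x 85, because each cell contains predictions for 5 boxes, corresponding to the 5 anchor boxes.
85 = 5 + 80, where 5 stands for (pc, bx, by, bh, bw) and 80 is the number of classes to detect.
You then select only a few boxes, based on:
Score thresholding: throw away boxes whose score is below the threshold.
Non-max suppression: compute the IoU and avoid keeping overlapping boxes that detect the same object.
This gives you YOLO's final output.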
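The dimension arithmetic in this summary can be checked with a few lines of Python. The constant names are illustrative; the values are the ones stated above.

```python
GRID, ANCHORS, CLASSES = 19, 5, 80      # values stated in the summary

per_box = 5 + CLASSES                   # (pc, bx, by, bh, bw) + class scores
per_cell = ANCHORS * per_box            # numbers stored in each grid cell
total_boxes = GRID * GRID * ANCHORS     # candidate boxes per image
total_numbers = GRID * GRID * per_cell  # size of the whole encoding

print(per_box, per_cell, total_boxes, total_numbers)  # 85 425 1805 153425
```

So score thresholding and non-max suppression reduce 1805 candidate boxes per image to at most max_boxes detections.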
Create a session
sess = K.get_session()
The class names and the anchors are stored in two separate files. The original images are (720, 1280); they are preprocessed into (608, 608) images.
- class_names = read_classes("model_data/coco_classes.txt")
- anchors = read_anchors("model_data/yolo_anchors.txt")
- image_shape = (720., 1280.)
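The pretrained model comes from the official YOLO website and is stored as a Keras .h5 file (yolo.h5; the code below loads it from model_data/yolov2.h5).
Note that this model converts a preprocessed batch of input images of shape (m, 608, 608, 3) into a tensor of shape (m, 19, 19, 5, 85), as described earlier.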
- yolo_model = load_model("model_data/yolov2.h5")
- yolo_model.summary()
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
Next, pass yolo_outputs to yolo_eval.
yolo_outputs already has the right format, so call the yolo_eval function implemented earlier to select the best boxes:
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
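Steps:
1. Create a session.
2. yolo_model.input is fed to yolo_model, which computes the output yolo_model.output.
3. yolo_model.output is processed by yolo_head, which gives you yolo_outputs.
4. yolo_outputs goes through the filtering function yolo_eval, which outputs the predictions: scores, boxes, classes.
Exercise: implement the prediction function predict()
Hint: use the following call to preprocess the image: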
image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))
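This call returns:
image: a PIL representation of the image, used for drawing the boxes on it; you will not need to use it here
image_data: a numpy array representing the image, which will be the input to the CNN
When a model uses BatchNorm, you need to pass an additional placeholder in the feed_dict: `{K.learning_phase(): 0}`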
- def predict(sess, image_file):
-     """
-     Runs the graph stored in "sess" to predict boxes for "image_file". Prints and plots the predictions.
-     Arguments:
-     sess -- your tensorflow/Keras session containing the YOLO graph
-     image_file -- name of an image stored in the "images" folder.
-     Returns:
-     out_scores -- tensor of shape (None, ), scores of the predicted boxes
-     out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
-     out_classes -- tensor of shape (None, ), class index of the predicted boxes
-     Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes.
-     """
-
-     # Preprocess your image
-     image, image_data = preprocess_image("./images/" + image_file, model_image_size = (608, 608))
-
-     # Run the session with the correct tensors and choose the correct placeholders in the feed_dict.
-     # You'll need to use feed_dict={yolo_model.input: ... , K.learning_phase(): 0})
-     ### START CODE HERE ### (≈ 1 line)
-     out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict = {yolo_model.input:image_data, K.learning_phase(): 0})
-     ### END CODE HERE ###
-
-     # Print predictions info
-     print('Found {} boxes for {}'.format(len(out_boxes), image_file))
-     # Generate colors for drawing bounding boxes.
-     colors = generate_colors(class_names)
-     # Draw bounding boxes on the image file
-     draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
-     # Save the predicted bounding box on the image
-     image.save(os.path.join("./out", image_file), quality=90)
-     # Display the results in the notebook
-     #output_image = scipy.misc.imread(os.path.join("./out", image_file))
-     output_image = imageio.imread(os.path.join("./out", image_file))
-     imshow(output_image)
-
-     return out_scores, out_boxes, out_classes
- # Test on test.jpg
- out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
Expected output
Found 7 boxes for test.jpg
car 0.60 (925, 285) (1045, 374)
car 0.66 (706, 279) (786, 350)
bus 0.67 (5, 266) (220, 407)
car 0.70 (947, 324) (1280, 705)
car 0.74 (159, 303) (346, 440)
car 0.80 (761, 282) (942, 412)
car 0.89 (367, 300) (745, 648)
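The model you have just run can detect the 80 classes listed in coco_classes.txt. You can try it on your own images.
What you should remember
YOLO is a state-of-the-art object detection model that is both fast and accurate.
It runs an input image through a CNN, which outputs a volume of dimension 19x19x5x85.
The encoding can be seen as a grid in which each of the 19x19 cells contains information about 5 boxes.
You filter through all the boxes using score thresholding and non-max suppression:
Score thresholding discards the low-scoring detections and keeps only the high-scoring ones.
An IoU threshold is used to eliminate overlapping boxes.
Environment: Python 3.6 + TensorFlow 1.15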
Author: 空午 (Data Science and Big Data Technology, henu)