This example demonstrates monocular depth estimation from a single input image using MidasNet in OpenVINO. Model information can be found here.
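Even with one eye closed, we can still judge roughly how far away the objects in front of us are. The goal here is to use deep learning to give a machine a similar ability: estimating distance information from a single 2D image.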
In this demo we use a neural network model called MiDaS. The original paper is:
R. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, “Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019967.
```python
from pathlib import Path

import cv2
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import Core

DEVICE = "CPU"
MODEL_FILE = "model/MiDaS_small.xml"
model_xml_path = Path(MODEL_FILE)


def normalize_minmax(data):
    """Normalizes the values in `data` between 0 and 1"""
    return (data - data.min()) / (data.max() - data.min())


def convert_result_to_image(result, colormap="viridis"):
    """
    Convert network result of floating point numbers to an RGB image with
    integer values from 0-255 by applying a colormap.

    `result` is expected to be a single network result in 1,H,W shape
    `colormap` is a matplotlib colormap.
    See https://matplotlib.org/stable/tutorials/colors/colormaps.html
    """
    # matplotlib.colormaps replaces matplotlib.cm.get_cmap (removed in Matplotlib 3.9)
    cmap = matplotlib.colormaps[colormap]
    result = result.squeeze(0)
    result = normalize_minmax(result)
    result = cmap(result)[:, :, :3] * 255
    result = result.astype(np.uint8)
    return result


def to_rgb(image_data) -> np.ndarray:
    """Convert image_data from BGR to RGB"""
    return cv2.cvtColor(image_data, cv2.COLOR_BGR2RGB)


print("1 - Load Model")
ie = Core()
model = ie.read_model(model=model_xml_path, weights=model_xml_path.with_suffix(".bin"))
compiled_model = ie.compile_model(model=model, device_name=DEVICE)
input_key = compiled_model.input(0)
output_key = compiled_model.output(0)
print("- Input layer info: {}".format(input_key))
print("- Output layer info: {}".format(output_key))
network_input_shape = list(input_key.shape)
network_image_height, network_image_width = network_input_shape[2:]

print("2 - Load Image")
IMAGE_FILE = "data/coco_bike.jpg"
image = cv2.imread(IMAGE_FILE)
print("- Input image size: {}".format(image.shape))
# resize to the network input shape; cv2.resize expects dsize as (width, height)
resized_image = cv2.resize(src=image, dsize=(network_image_width, network_image_height))
# reshape image to network input shape NCHW
input_image = np.expand_dims(np.transpose(resized_image, (2, 0, 1)), 0)
print("- Image resize into: {}".format(input_image.shape))

print("3 - Model Inference")
result = compiled_model([input_image])[output_key]
print("- Inference result shape: {}".format(result.shape))
print("- convert network result of disparity map to an image that shows distance as colors.")
result_image = convert_result_to_image(result)
# resize back to original image shape. cv2.resize expects shape
# in (width, height), [::-1] reverses the (height, width) shape to match this
result_image = cv2.resize(result_image, image.shape[:2][::-1])
print("- resize back to original image shape from (width, height) to (height, width) based on cv2.resize requirement with final image shape {}".format(result_image.shape))

print("- final results visualization.")
fig, ax = plt.subplots(1, 2, figsize=(20, 15))
ax[0].imshow(to_rgb(image))
ax[1].imshow(result_image)
plt.show()
```
```text
1 - Load Model
- Input layer info: <ConstOutput: names[input.1] shape{1,3,256,256} type: f32>
- Output layer info: <ConstOutput: names[1349] shape{1,256,256} type: f32>
2 - Load Image
- Input image size: (600, 800, 3)
- Image resize into: (1, 3, 256, 256)
3 - Model Inference
- Inference result shape: (1, 256, 256)
- convert network result of disparity map to an image that shows distance as colors.
- resize back to original image shape from (width, height) to (height, width) based on cv2.resize requirement with final image shape (600, 800, 3)
- final results visualization.
```
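The output above traces how array shapes change through the pipeline. A 600×800 BGR image becomes a 1×3×256×256 input, the network returns a 1×256×256 disparity map, and that map ends up as a 600×800×3 color image. You can reproduce these steps without OpenVINO, OpenCV or Matplotlib. The NumPy-only sketch below makes three assumptions:

- `resize_nearest` is a hypothetical nearest-neighbour stand-in for `cv2.resize`. It takes `dsize` in the same (width, height) order.
- `linear_colormap` is a hypothetical two-color stand-in for Matplotlib's viridis. Its endpoint colors were chosen to resemble viridis's dark purple and yellow.
- Random arrays take the place of the real image and the real network output.

```python
import numpy as np


def normalize_minmax(data: np.ndarray) -> np.ndarray:
    """Scale values linearly into [0, 1] (same formula as the demo)."""
    return (data - data.min()) / (data.max() - data.min())


def hwc_to_nchw(image: np.ndarray) -> np.ndarray:
    """(H, W, C) image -> (1, C, H, W) batch, as done before inference."""
    return np.expand_dims(np.transpose(image, (2, 0, 1)), 0)


def resize_nearest(image: np.ndarray, dsize: tuple) -> np.ndarray:
    """Hypothetical nearest-neighbour resize; dsize is (width, height) like cv2.resize."""
    width, height = dsize
    rows = (np.arange(height) * image.shape[0] / height).astype(int)
    cols = (np.arange(width) * image.shape[1] / width).astype(int)
    return image[rows[:, None], cols[None, :]]


def linear_colormap(values, low=(68, 1, 84), high=(253, 231, 37)):
    """Hypothetical two-color colormap: map [0, 1] to uint8 RGB by linear interpolation."""
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    rgb = low + values[..., None] * (high - low)
    return np.round(rgb).astype(np.uint8)


rng = np.random.default_rng(0)
# Fake 600x800 BGR frame, same size as the image in the demo output
image = rng.integers(0, 256, size=(600, 800, 3), dtype=np.uint8)

# Pre-processing: resize to the 256x256 network input, then HWC -> NCHW
resized = resize_nearest(image, (256, 256))
input_image = hwc_to_nchw(resized)
print(input_image.shape)  # (1, 3, 256, 256)

# Post-processing: a fake (1, H, W) disparity map stands in for compiled_model(...)
result = rng.random((1, 256, 256)).astype(np.float32) * 10
disparity = normalize_minmax(result.squeeze(0))
color = linear_colormap(disparity)
# image.shape[:2] is (height, width); reverse it because dsize is (width, height)
result_image = resize_nearest(color, image.shape[:2][::-1])
print(result_image.shape)  # (600, 800, 3)
```

The map is min-max normalized separately for each image. As a result, the colors show only relative depth within one picture, not metric distance. In a disparity map, larger values correspond to nearer surfaces, so nearby objects fall at the bright (yellow) end of the colormap.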